Farewell Speech (Tech Issues)

Below is the soft copy of my farewell speech on tech issues.

Good evening everyone. Carrying forward the legacy left by one of our seniors, I am going to narrate a short story.

There was a boy, by name OMI, who came to IIIT without any background in computers. He used to spend all his time in the workspace exploring Linux, usually at the cost of missing his lectures and even meals. Inspired by the work done by Nirnimesh, SMR etc., over a period of two years, after struggling as only he could have done, he climbed to the post he had always wanted: the sysadmin of IIIT.

Today I would like to speak about a few topics. The first one among them is my experiences as a sysadmin.

Our network is not ready to take the ever-increasing load. The old wifi infrastructure has already crossed its End Of Life period. Also, the network is not properly planned, which increases the time required to solve issues. For example, it took several days to fix the network loop problems in different research centers last year.
Though we use open source to cut down our IT budget, we don’t make the best use of it.
The process of getting things done is slow. For example, we have not been able to settle on which firewall to buy for the last six months.
Now coming to the people part of it, the server room staff is generally co-operative but the problem lies with the administration mostly because of the slow decision making.
Within the administration, there is a lot of interference by people who don’t have an adequate background in the related fields.
Most of the time the server room staff do work they should not have to do, like asset management. Why should a person who has spent all his life exploring technology count the number of computers in your lab?

Now, I’ll talk about the changes I tried to bring about and problems I faced.
There has always been an invisible gap between students and the server room administration. It wasn’t very easy to convince people to stop treating the other end as the enemy.
It was my top priority to restore the administration’s faith in students so that more students can be brought into the administration process.
Different research centers manage their own servers, which makes it very difficult for the server room staff to assist them. Everybody having their own network eventually increases the security loopholes as well. We tried to reduce the decentralization. Research centers are usually not ready to give up their rights easily, but slowly things are changing and we have already moved a few servers back to the server room.
Since our network is growing, the policies that have been in place for the past few years are becoming obsolete. We are in the process of revisiting the different policies.

I’ll conclude by mentioning a few key points. We urgently need to renovate the existing infrastructure, otherwise there will be chaos. There is a need to invest more funds in IT infrastructure, as it’s the backbone of the institute and should not be seen as a burden. Speed up the approval process. We should be more transparent with students while forming new policies.

I would like to thank IIIT for giving the students an opportunity to be involved in system administration and I hope the future batches will take this up as a serious job.

Thank You!

 

The best new year gift

We just received a mail from IIIT-H help desk stating

New year special proxy server has been created for all users as gift from
IIIT Server Room staff. There are no moderations done on that proxy so all
websites are accessible. Even better there is no connection limit per IP
or per user on the new proxy.

Proxy is accessible at IP _SNIP_ on port _SNIP_. Please feel free to
download as much as you want from this new proxy server.

Merry Christmas and Happy New Year!

Regards,
IIIT Systems Help Desk

Probably the best gift I have ever received.

 

Condition for getting a degree

So far the day has been really frustrating for me. I WONDER how people really manage to get a graduate/postgraduate degree when they don’t even know how to use a browser, how to look up the IP address of a domain name, or how to write an email ??? The institute should consider putting up a course in the final sem which checks the following:

  • The student knows how to use a browser.
  • He/she knows how to bypass the proxy server for an IP range or domain names.
  • He/she knows how to find out the IP address of a domain name.
  • He/she knows what https is.
  • He/she knows that if ssh is blocked, any attempt to access the server via ssh will be denied. And the server can only display text; it will not cry/shout “Access denied”.
  • He/she can find out the IP address of his/her own machine.
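
For anyone who wants to tick off a few items on this list, here is a minimal Python sketch. It is only an illustration, not anything the institute uses: the function names and the no-proxy entries are made up, and the "own IP" trick is a best-effort guess that falls back to loopback when the machine has no route.

```python
import ipaddress
import socket


def resolve(hostname):
    """Return the IPv4 address the system resolver gives for hostname."""
    return socket.gethostbyname(hostname)


def own_ip():
    """Best-effort guess at this machine's outgoing IPv4 address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only asks the
        # kernel which local address it would use to reach the target.
        sock.connect(("192.0.2.1", 9))  # 192.0.2.0/24 is reserved for documentation
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route, fall back to loopback
    finally:
        sock.close()


def bypasses_proxy(host, no_proxy):
    """True if host matches an IP range or a domain suffix listed in no_proxy."""
    for entry in no_proxy:
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            # Not an IP range, so treat the entry as a domain name or suffix.
            name = entry.lstrip(".").lower()
            if host.lower() == name or host.lower().endswith("." + name):
                return True
            continue
        try:
            if ipaddress.ip_address(host) in network:
                return True
        except ValueError:
            pass  # host is a name, not an IP, so it cannot match a range
    return False
```

Something like bypasses_proxy("10.4.3.2", ["10.0.0.0/8", ".intranet.example"]) answers the "IP range or domain names" question from the list; the entries here are placeholders, not our real ranges.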

This post was written after all the frustration caused by the reason you know. Never mind 😐

 

A visit to Dr. Reddy’s Labs

Yesterday, a team of five (four people from administration and me) visited Dr. Reddy’s Labs in Hyderabad for a live demo of two security products, Linkproof and DefensePro, from Radware. DefensePro is a firewall kind of thing (for the layman) which protects your critical servers and Intranet from internal as well as external attacks. Linkproof is a load balancer for outbound as well as inbound traffic in case of multiple (<= 3) ISPs. Linkproof also manages traffic in case one of the ISPs fails.

We reached their office at around 2:40PM. The office looked just like another office of some IT company. Fully loaded with sensors. You need not push/pull the door when you enter the office. And everything that an enterprise office should have.

A few minutes later, one of the people from Radware accompanied us to their technical wing called “Information Systems”. There we had a demo of both the products on consoles. The demo went on for almost half an hour. After the demo we asked if we could see the actual hardware they have deployed. Then the admin in DRL (Dr. Reddy’s Labs) accompanied us to their DMZ (DeMilitarized Zone, it’s a common term in the security industry. Denotes a place where all your critical servers are placed. A highly secured zone basically). I was amazed by the hardware that I saw. I just can’t express my feelings. Though we were not allowed to enter the rooms, everything was visible through the glass. Rack mountable servers everywhere. Racks, racks, racks and more racks. Awesome experience. I wanted to stay there forever and keep looking at the beasts but …

We headed back to IIIT after that. And last night, in my dreams, I saw that I owned a server farm which had uncountable racks of servers. Racks everywhere 🙂

PS : The above two products will be deployed soon in IIIT for testing purpose 😀

 

Three days with Fluctuating Internet

We had three days with totally fluctuating internet. The fluctuation was almost like a sine wave. Nobody could really figure out what went wrong and where the problem was.

NOTE : This post is not just another ‘masalla’ post. I am writing down the actual experience I had.

DAY 1 : August 26

All this started on August 26th sometime in the early morning hours, when browsing speeds and bandwidth usage touched their lowest levels in the last month. As I keep monitoring the bandwidth usage (bandwidth monitoring and download progress bars appeal to me for some unknown reason. I keep looking at progress bars when I download something. I just get lost in a kind of dreamworld while looking at them.), I was surprised to see the low usage, because everyone was on campus and usage should have been at peak levels. It returned to normal after a short period of time and browsing was normal. But this pattern kept repeating itself. I went to attend the class. I returned at 11:30AM and rushed to the server room to check out what was going on. By that time the server room was swamped by phone calls from different research centers.

Nobody was actually able to figure out what was going on. All that we knew was that there was heavy broadcast from a segment on the network. We suspected it was the same problem we had faced the previous week. But isolating the problematic area is a heck of a job, and nobody was ready to check the network devices at the leaf level for the following reasons: (1) it would take almost a day to check the individual NICs in all the labs, (2) there was no guarantee that the problem would be resolved.

We took the tough decision of shutting down the network in the entire problematic segment. This worked and the network was fine. No fluctuations. But it turned out to be a wrong decision. We didn’t inform the people in the affected network (which unfortunately consisted of major research centers at IIIT i.e. CVIT, CDE, CVEST, LTRC (temp) etc.) and immediately we had to face phone calls from the HODs. One thing that I learnt from this situation is that Internet connectivity is equally important for everyone at IIIT, including faculty members, though we keep blaming students for being addicted to the internet. Internet here is not an addiction, it’s a need. We had to bring the network back up. And the rest of the network started fluctuating again. Everybody left for lunch.

As time passed, the frustration among the users grew and everybody was almost shouting. Everybody wanted to know why it was taking so long to solve this problem. After lunch one of the admins went to the problematic area and started debugging at the individual switch level. But he had a really tough time, as most of the switches at the leaf level are unmanaged (you can’t see any error reports unless you plug into the individual switch). And we have a lot of switches (by a lot I mean a real lot of switches). And the switches are cascaded in such a dangerous manner that isolating a problem becomes way more difficult. By evening that day we could isolate two research labs and three other segments which were generating heavy broadcast. We shut them off and everybody left for the day. There was a kind of blackout in those segments. No internet, no LAN.
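
To give an idea of what "generating heavy broadcast" means in practice, here is a small Python sketch. It is not what the admins ran. It assumes you already have a capture reduced to (source MAC, destination MAC) pairs, and the function names and the threshold are invented for illustration.

```python
from collections import Counter

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def broadcast_counts(frames):
    """Count broadcast frames per source MAC.

    frames is an iterable of (src_mac, dst_mac) string pairs.
    """
    return Counter(src for src, dst in frames if dst.lower() == BROADCAST_MAC)


def heavy_broadcasters(frames, threshold):
    """Return source MACs that sent at least `threshold` broadcast frames,
    noisiest first."""
    counts = broadcast_counts(frames)
    return [src for src, count in counts.most_common() if count >= threshold]
```

Feeding it captures taken at different switches, one segment at a time, would show which sources (and hence which segment) dominate the broadcast traffic. With unmanaged switches you still have to walk to each one to take the capture.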

During the night, I kept monitoring the network. A lot of people pinged and complained about the DNS resolution problem. Web pages were loading at a high speed but the name resolution was taking a lot of time. I tried looking at the logs and the traffic. Everything was fine except that the nameserver was swamped by the mail servers for name resolution. I tried a few hacks but nothing worked.
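
A quick way to confirm the "pages load fast but names resolve slowly" symptom is to time the lookups themselves. The Python sketch below is only an assumption about how one might do it (I did not use it that night), and the function name is made up. Note that OS or resolver caching can make repeated lookups look faster than the first one.

```python
import socket
import time


def time_lookups(hostname, attempts=3):
    """Return how long each of `attempts` name lookups took, in seconds."""
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            socket.getaddrinfo(hostname, None)
        except socket.gaierror:
            pass  # a failed lookup still tells us how long the resolver took
        timings.append(time.perf_counter() - start)
    return timings
```

If the numbers for a hostname are in seconds while fetching the same site by IP is quick, the nameserver is the bottleneck and not the link.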

DAY 2 : August 27

I didn’t have any class that day. Admin XYZ called me at around 10:30AM and asked me to come to the server room if possible. I was sleeping, and I am hardly ever awake at that time. But I didn’t want to miss the opportunity. Got up quickly and rushed to the server room, wasting as little time as possible. I was in the server room at 11:00AM.

The admins suspected some problem with the proxy, as the fluctuation persisted even after cutting off the problematic areas. By the time I reached the server room, the admins had switched over to the standby proxy machine. And to start from zero, the entire network except the main building was shut down. We waited for almost half an hour. Everything worked absolutely fine. No fluctuations at all. So, the main building is fine.

At around 11:40AM, the network was restored in all the hostels. We waited for another half an hour. No fluctuation yet. But a hell of a lot of phone calls sensitizing the situation. Everybody, including senior members, rushing to the server room. We suspected some attacks from the hostels on the servers in the labs. But we were wrong. The problem is in the library building. But where?

Till lunch time, no network in any area except the main building and hostels. As time passed, the issue became more and more serious. It became difficult to answer phone calls from senior members, as the words “Heavy Broadcast” had now become irritating for them. They had been hearing this for the last two days. But nobody actually knew the exact answer. The origin of the broadcast was still not known.

Admin XYZ rushed to the library switch. Now XYZ was in live contact with admin PQR in the server room, restoring the network in the research centers one by one. Restore the network in one research center, wait for half an hour. If there is no fluctuation, proceed; otherwise revert. Using this technique (this was the only solution), we restored the network in all the centers except two. Connections to these centers also cascade to other areas. Complete outage in the two research centers. Everybody left for the day, leaving the two research centers in the dark.
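
The restore-wait-check routine above can be written down as a tiny Python sketch. This is just the logic, not a tool we had: enable, disable and is_stable are hypothetical callbacks (in real life they were an admin at the switch, a phone call, and half an hour of watching the graphs).

```python
def restore_one_by_one(segments, enable, disable, is_stable):
    """Bring segments up one at a time and keep only the harmless ones.

    enable(segment) and disable(segment) switch a segment on or off;
    is_stable() reports whether the network is behaving after a wait.
    Returns (restored, suspects).
    """
    restored, suspects = [], []
    for segment in segments:
        enable(segment)
        if is_stable():
            restored.append(segment)
        else:
            disable(segment)  # revert and move on to the next segment
            suspects.append(segment)
    return restored, suspects
```

The cost is one full wait per segment, which is why it took the whole afternoon. It also assumes each segment can be switched on independently, which is not quite true when connections cascade through one center to other areas.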

Network stabilized a bit. And fluctuation was not frequent (almost none). I monitored the network up to 2AM. Didn’t sleep because had a class at 8:30AM.

DAY 3 : August 28

I had a class up to 10AM. Rushed directly to the server room after the class. We had already narrowed it down to a smaller region. Now the problem was smaller and there were fewer people after us. Admin ABC with a student was sent down to inspect individual switches. That’s the problem with unmanaged switches. You have to go and check each and every switch for any error messages. Anyway, we kept narrowing down the problematic area till lunch. I left for lunch and returned to my room, as I hadn’t slept the previous night. I don’t know what happened in the afternoon. I missed that 🙁 At 6:30PM, I called admin XYZ and asked about the status. He informed me that the problem had been isolated. Only two very small labs were left.

Three days and problem was still there. People were really out of control. Anyway network worked perfectly in other areas except those two labs. The good thing was that these labs were at the leaf level and they were not cascading connections further.

DAY 4 : August 29

I had a lab from 10AM-11AM. But it went up to 11:45AM. By the time I reached the server room, the problem was already resolved. Everyone was connected and there were no more complaints. Rawat sir updated me with a few decisions which are beyond the scope of this post. The problem was the routing queries from one of the ISPs connected to those labs at leaf level.

It really took almost four days to debug this problem. Debugging a network, especially one which is randomly cascaded, has more than one entry point, has no perimeter and has a lot of unmanaged switches, is a real challenge.

Anyway, it was again a learning experience for me. I used to blame people for not being able to solve network problems quickly. I just realized that it’s very easy to blame.

PS : Longest post on the occasion of bloggers’ day 🙂

 

Freaking 50 hours

All this started after returning from dinner at around 8:00PM on August 19.

August 19 – 08:30PM – 10:30PM : Slept.

I woke up at 10:30PM and started browsing random stuff. Gave a finishing touch to GSoC project – IntelligentMirror and announced the release for testing.

August 19 – 10:30PM – 12:30AM : Browsing + Blogging + Browsing.

After that it’s time for some refreshments.

August 20 – 12:30AM – 02:00AM : Snacks + Toasts.

Then started the same old thing. Browsing. Why the heck do we have to browse all the stuff in this world which has no meaning 😐 Also, I had to prepare for my presentation about the SMTP protocol in the Topics of Information Security class. So, downloaded all the RFCs (that’s what we do all the time. download all the academic stuff and feel good about it. Who has time to open and read them 😛 ). Read a bit of stuff from the RFCs.

August 20 – 02:00AM – 07:00AM : Browsing + Reading RFCs + Browsing.

If you are up till 7AM, breakfast is worth a try. Had breakfast, which turned out to be good.

August 20 – 07:30AM – 08:00AM : Breakfast.

After that, I thought I would go to bed and sleep for some time. But then the Internet is a real devil which will not let you sleep. Started browsing again and checking everyone’s status messages. Reading blogs and some more blah blah. For god’s sake, don’t put links like this in your status messages.

August 20 – 08:30AM – 12:30PM : Browsing + Fantastic Contraption + Browsing

If you are up till 12:30PM, it’s good to have lunch before you go to bed 😀

August 20 – 12:30PM – 01:30PM : Lunch.

After lunch, I was desperate to sleep. But then I had this class called “Music Appreciation” at 5PM. I was afraid of losing attendance and didn’t sleep. I read RFCs in the meantime. No browsing this time 😀

August 20 – 02:00PM – 04:30PM : Reading RFCs. No bc.

At around 4:45PM we rushed to coffee shop and then to class. After reaching the class, we realized that prof will not turn up and class stands canceled.

August 20 – 04:30PM – 05:20PM : Coffee shop + Class.

While returning from class, I just wanted to visit the server room to see what’s going on (actually the network was fluctuating really badly and I just wanted to know what exactly was going on. I am a student sysadmin, so kinda concerned about these issues). But I came to know that some serious problem had occurred due to real heavy broadcast from a few research labs. The sysadmins were zeroing in on the problem. I suddenly forgot about sleeping and all. There may not be a better opportunity to learn how to configure a switch and how to debug a network problem. I was lost somewhere in the switches and servers, and I realized it was 7:00 by the time the admins sorted out the problem.

August 20 – 05:30PM – 07:00PM : Server room (real good experience. Learnt a bit about how to configure switches 😀 )

By the time I left the server room it was almost 7:30PM. So just rushed to Yuktahaar to have food. Had some good food. Yuktahaar is probably the only mess in the campus where you can actually eat something.

August 20 – 07:30PM – 08:00PM : Dinner.

As I had to give the presentation on SMTP the next morning, I thought of reading some more stuff quickly. Read up to 9:00PM. At this time, I was *REAL* desperate to go to bed. But at about 9:19PM, Himank called up and told me about the bloggers meet. I was like WTF. Well, I wanted to attend the meeting and rushed to the main building. The meeting went up to 10:30PM. The meeting was real fun and it was a good experience to meet all the bloggers from my batch. We decided on a few things to promote blogging in IIIT. Find the details in the link in next line.

August 20 – 08:30PM – 10:30PM : Reading RFCs + Bloggers Meet.

After returning from the meeting, I thought of finishing the RFCs and preparing the presentation. But the RFCs were a lot more than I assumed, so reading went up to 1:30AM. Prepared the presentation.

August 20 – 10:30PM – 01:30AM : Reading RFCs + Preparing presentation.

I was about to sleep and suddenly Rishabh appeared and now there is something with the proxy 🙁 We discussed the proxy and other misconfigurations for almost 1 hour.

August 21 – 01:30AM – 02:30AM : Discussion about proxy.

I went to bed at 2:30AM. Felt a bit relaxed. But couldn’t sleep. Why?? I was feeling hungry 🙁 Got up at 2:45AM and rushed to Deepak’s room for some snacks.

August 21 – 02:30AM – 03:00AM : Snacks + Toasts.

I was in my room at 3:00AM trying to analyze the situation. “If I go to bed now, I’ll not be able to wake up for the 8:30AM class and the whole idea of preparing the presentation will be wasted. Should I sleep or not??” Went to bed but got up again in 15 minutes thinking that it’s impossible to wake up for the class (lack of confidence??). Browsed stuff for some time and then had a look at the servers. And this time mailman was not working on the students server. Mails to the students mailing list were not being delivered. Tried to debug that for almost 2 hours, but all in vain. Monitored servers for some time and did some more browsing. Please forgive me for sending all the test mails 🙂

August 21 – 03:00AM – 06:30AM : Mailman debugging + Server monitoring + Browsing.

By 6:30AM, I was like “I’ll die if I don’t sleep. But if I sleep, I’ll miss the class. WTF :(( ”. I finally decided to sleep. But scheduled alarms at full volume on my computer. Scheduled high beat songs (shell scripting rocks 😀 ). And decided to sleep on the chair itself, as it’s difficult to get out of bed.

August 21 – 06:30AM – 07:45AM : Sleeping (on chair 🙁 )

Woke up at 7:45AM and rushed to mess at around 8:00AM.

August 21 – 08:00AM – 08:20AM : Breakfast. OBH mess serves the worst breakfast you can have. They serve chow mein for breakfast. I AM NOT KIDDING.

Now starts the actual hectic session. Continuous classes up to 1:00PM. whoaaaa!!!!!

August 21 – 08:30AM – 10:00AM : Topics of Information Security Class.

August 21 – 10:00AM – 11:30AM : Systems and Network Security Class.

August 21 – 11:30AM – 01:00PM : Numerical Analysis. This class may be suicidal if you haven’t slept for almost 36 hours. Caution next time.

Time for lunch.

August 21 – 01:00PM – 01:30PM : Lunch @ Yuktahaar.

I would have gone to my room to sleep now. But mailman is still not working and students are missing their mails 🙁 Went back to the server room. Did some ad-hoc management to bypass mailman for temporary mail delivery. Fixed a few other things.

August 21 – 01:30PM – 03:30PM : Server room.

Now we had an Infrastructure team meeting at 3:30PM. Was good. Discussed a lot of issues and how we replace the old infrastructure in a systematic manner.

August 21 – 03:30PM – 04:30PM : Infrastructure Team Meeting.

As it’s almost 5PM again, time for the three-hour-long “Music Appreciation” class.

August 21 – 04:30PM – 07:00PM : Coffee shop + Music Appreciation class.

Enough is enough. I *MUST* sleep now. Returned to my room quickly, skipping dinner, and went to bed thinking that I’ll sleep for at least 14-16 hours (normally I sleep for 12-14 hours at a stretch). But who knew that it’s not possible. I woke up at 12:30.

August 21 – 07:30PM – 12:30AM : Sleeping (in bed 😛 )

Woke up at 12:30AM. And Internet is here again. Browsing. Browsing. Browsing. Browsing. Browsing. And blogging. BTW, a newer version of youtube cache is available now.

August 22 – 12:30AM – 01:30AM : Browsing + Blogging.

Was feeling real hungry. Rushed to Deepak’s room for snacks and toasts.

August 22 – 01:30AM – 02:30AM : Snacks + Toasts.

Browsing again. And I have been writing this god damn post for half an hour. It’s almost 3:30AM now and I am thinking of going to bed again. Hopefully I’ll sleep well this time 😀

August 22 – 02:30AM – 03:30AM : Blogging.

The previous two days were really hectic. No? Well, I can’t really take more than that. I think my biological clocks are out of sync. Need a break.

PS1 : Change your status messages frequently. Don’t bore me.

PS2 : Need to blog more frequently. Have a lot of stuff to blog about. This sem is real happening.

PS3 : Have two courses on security this sem. Wish me luck 😀

 

vacations and few learnings

Just a quick wrap up of happenings in last month before I leave for home.

1. Do what you want and never care what others have to say about it. example -> http://fedora.co.in/

2. Don’t ever think that you are perfect in your field. Nobody is perfect.

3. No machine (computer) in this world is secure. Only the switched off machines are 100% secure.

4. Talk less, do more.

5. Don’t underestimate anyone. You can learn from everyone if you are willing to.

6. Be a rebel.

7. Go home when you get the f**kin vacations.

8. Try not to get flamed and refrain from flaming others as well.

9. Don’t reply to PMs for assistance when you are not the concerned person.

10. Switch on the vacation responder when you go for holidays 🙂

I think that’s enough. Leaving for home. Will be away from computers/internet for almost 20 days. See you if I manage to survive 🙂