Thursday, December 17, 2009

[Agency] is Hosed and Everybody is a Victim

So I have this problem at work.  I am an IT consultant (of some sort) and operate in the interest of my organization.  IT for the entire state has been privatized, all IT personnel have been taken on by the contracting companies, and I am left as a state employee - the guy on the inside.  I now find myself in a defensive position - we fight constantly with the contractor about service levels, response times, help desk attitude, consistency in practices, communication, and worst of all - IT best practices.

I have this thing now that I might become known widely for.  Let me explain:

I handle issues for about 20 sites.  19 of those sites have their own server, which the workers log in to every morning.  It handles directory services and network storage.  Without this, well - I won't discuss client/server architecture vs. peer-to-peer networking here.  I'll just say that the server has to be up all the time.  Wouldn't you agree?  There is also a bunch of networking equipment, namely the Integrated Services Router (ISR) that handles DHCP and provides a gateway to the Internet and the virtual tunnel to the centrally-located state-operated services.  I can safely assume we'd want that to stay on all the time too.

These items get their power through an Uninterruptible Power Supply (UPS) so if the power blinks for three seconds or is out for five minutes, the server and network systems do not shut down or have to reset.  Zero network interruption - everything goes back to normal when the power comes back on.  Computers boot up and everything is available.

Our UPS devices are over five years old.  We all know from our iPod experiences that batteries don't last forever, and at least three of the 19 that are in place have gone bad.  The technician from the multi-billion-dollar company came out to look at one of them and said "Your UPS is bad."  The conversation went like this:

Me: "Okay.  When are you going to bring a new one?"
Tech: "Uh...  I don't know.  We don't know who's going to pay for it."
Me: "I'm sorry?"
Tech: "Yeah... There's some confusion on who is going to replace the UPS."
Me: "Why?"
Tech: "I don't know.  We plugged the server into a surge protector and it's back up.  Bye."

That was three months ago.  After asking several people and not getting a straight answer, only "We're looking into it..." and "[this person] agrees that [the contract] should cover it", I got a simple reply from above: "[contracting agency] flat-out states that UPS devices are not covered by [project]."

Okay.  I will publicly make the assumption, given that statement, that our contracting agency does not think UPS devices are necessary.  They now own the infrastructure - the servers, LAN, WAN, client systems.  If they don't cover UPSs, then they don't feel that power protection is important to their systems.  I am told that my sites, with their limited budgets, should buy the UPS for their server.  Problem is, it's not their server.  It's [contracting agency]'s.  Sure, it serves that office, but responsibility for uptime, service level agreements, security, and data backups of that server falls on the big (stupid) guys.  Yes, I said stupid.  They know not what they do.  When the server goes down, and it does so weekly, no one cares that the entire office is impacted.

From what I hear, one of the services on one of the servers requires human interaction to finish the boot/startup process.  I don't know this for sure because I'm never there when the power blinks, but if that is true, we have to have someone on-site to press the F1 key every time.  That brings up another good point - we have storms at night.  What else runs at night, you ask?  The backup.  What happens if the backup process is interrupted?  What happens if that power failure causes hardware failure?  What happens if no one knows it?  There is no monitoring service - I get calls from the office saying "There's a message on the server about backup somethingoranother..."  Shouldn't IT already know there's a problem?  Why does our infrastructure depend so heavily on our users?
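
For what it's worth, the kind of check I have in mind is not complicated.  Here is a minimal sketch in Python, assuming each site's server answers on TCP port 445 (the usual SMB file-sharing port; that is my assumption, not something confirmed for these servers).  The site names, addresses, and function names are made up for illustration, and a real monitoring service would do far more than this.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def find_down_sites(sites, port=445, check=is_reachable):
    """Return the names of sites whose server did not answer on the given port.

    sites maps a site name to its server address.  port 445 is an assumed
    default; check can be swapped out, for example when testing.
    """
    return [name for name, host in sites.items() if not check(host, port)]


# Example use, with hypothetical site names and documentation-range addresses:
#
#   sites = {"site-01": "192.0.2.10", "site-02": "192.0.2.20"}
#   for name in find_down_sites(sites):
#       print(f"ALERT: server at {name} is not answering")
```

Run something like that every few minutes from a machine that is itself on a working UPS, and IT hears about a dead server before the office does.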

Service is interrupted too often.  Sites are short-handed because there's a hiring freeze.  More people need benefits these days, so enrollment in our programs is up.  Efficiency in our work has never been more important, and [stupid agency] wants to bicker over a $500 device to keep the server up.  Every time the power goes out (keep in mind we just went through hurricane season) the office is interrupted for at least an hour.  I could go on and on about how we could virtualize the desktops and maintain our environment so much more easily and operate more efficiently (and probably more cheaply), but all I want today is the fucking server to stay on.  Is that too much to ask?

I'll keep asking, and I'll keep writing, and maybe one day I'll get what I want.  Then we might actually get what we need.
                
