Raiders of the Lost Arcade

Dr Bob's blog about modern videogames, retrogaming and Irish gaming in general.

Friday, May 13, 2005

Defective work

One of the things you might not realise about working as a general-IT-in-house-helper-monkey-and-all-round-tech-handyman (official title "onsite engineer") is that about 50% of the job involves some form of detective work.

These skills usually have very little to do with anything you'd read in an MCSE brain dump and more to do with something you'd read by Sir Arthur Conan Doyle. This kind of work is something I wasn't too bad at to start with, and also something I learnt at Dell, who, as much as people may slag them off, did train us very well. One of the things we were taught was how to solve problems logically: going through the evidence, ruling out the impossible, running through in our minds how the problem occurred, and, once we had that sussed, working out a fix based on what had happened. It's a lot like the way your average police procedural novel works... (and I bet you believed your mate who said we just had a big flowchart of answers onscreen telling us what to say... well, a big ptttp!! to you and to him as well :) )

In fairness, most of it is straightforward enough:
For example, a user's machine keeps rebooting at random. You notice she's swinging her legs on the chair as she tells you about the problem; looking under the desk, you see a stray power cable plugged into a scuffed-looking power socket. She's been knocking against it now and again, and the PC reboots itself as it loses power.
Sometimes it's not a user problem. A machine keeps flaking out and blue-screening (among other fun activities). You look back through the Event Viewer and notice a recurring, arcane error; by tracking the description and ID number through a few online bulletin boards, you find a deciphered explanation pointing to memory errors. You check the jobs list for roughly when the errors started, and sure enough, a memory upgrade was ordered and installed around then. Swap the memory out and bingo, the problem's gone (and the dodgy memory goes back to the supplier).
Unfortunately, sometimes the problems stop being satisfying "Scooby Doo"-style, A-to-B-to-C jobs and start to resemble MENSA-entry-test, locked-room mysteries.
For example: machines X, Y and Z have intermittent lockups, traced back to an Excel problem. X and Z run one OS, Y runs another. Z was created from the same Ghost image as X, but so was another machine, let's call it W, and W runs fine. X and Y are on one floor, Z is on another. All three connect to the same switch, but so does W, which, as I said, has never had a problem.
So you search for a common factor for X, Y and Z, one that W doesn't share, and when you find it you base your fix on it. Some days you end up with 4-5 pages of densely packed notes filled with Venn diagrams and flowcharts. When you find yourself noting whether the user is right- or left-handed, it's time to take a step back.
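The elimination above is really just set arithmetic: take what every broken machine has in common, then cross off anything the healthy machine also has. Here's a minimal Python sketch of that idea, assuming you've jotted each machine's attributes down as short strings. The attribute names, and the printer driver version that turns out to be the culprit, are entirely hypothetical, made up for illustration rather than taken from a real job.

```python
# Hypothetical notes: each machine's attributes as a set of strings.
# X and Z share an OS and a Ghost image, Y runs a different OS,
# X and Y are on floor 1, Z is on floor 2, everything is on switch S1.
# "driver:v1"/"driver:v2" is an invented attribute for this example.
failing = {
    "X": {"os:A", "image:gold", "floor:1", "switch:S1", "excel", "driver:v2"},
    "Y": {"os:B", "floor:1", "switch:S1", "excel", "driver:v2"},
    "Z": {"os:A", "image:gold", "floor:2", "switch:S1", "excel", "driver:v2"},
}
working = {
    # W: same Ghost image and switch as the others, but never had a problem.
    "W": {"os:A", "image:gold", "switch:S1", "excel", "driver:v1"},
}


def suspects(failing, working):
    """Return attributes shared by every failing machine and by no working one.

    Assumes at least one failing machine (set.intersection needs an argument).
    """
    common = set.intersection(*failing.values())  # the middle of the Venn diagram
    for attrs in working.values():
        common -= attrs  # anything a healthy machine has can't be the cause
    return common


print(sorted(suspects(failing, working)))
```

With these made-up notes the intersection is the switch, Excel and the v2 driver; W rules out the switch and Excel, leaving only the driver as a suspect. The hard part, of course, is knowing which attributes to write down in the first place.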
Sometimes in these situations you find the common thread, give it a pull, and things click into place for you.

Sometimes this doesn't happen, but you have a gut feeling about what's going on, and an educated guess based on that leads to a fix.

Sometimes none of the above happens and you end up reinstalling, explaining to the user that between the thousands of real parts of the hardware itself and the thousands of "virtual" parts of the operating system and software (all those system files), there's almost an ecosystem of resources relying on each other. If even one of those fails, it can trigger an unfollowable chain of events that eventually leads to a program falling over, and in those circumstances finding a link from the problem back to its cause is impossible. In plain English:

shit happens :)

(and by the way, on some very rare days I wonder if all those demons exorcised in Biblical times have stopped possessing the local village idiot and started working their way into Dell OptiPlexes, and I think about breaking out the holy water. Thankfully, for my sanity, those days are few and far between.)

All joking aside, there are circumstances where you genuinely do have to call a halt to tracking down the problem and start looking at reinstalling the software, rebuilding the machine, etc.
In a perfect world a technician could keep going through every possible resource guide, swap out every piece of kit and check every file that could possibly have contributed to the fault, but by that stage the user may well have died of old age, all your other users (and, by now, their descendants) are miffed that you can't look at their problems, and you're using SETI levels of resources. If at the end of the day all the guy wants to do is print his documents, that's not doing you or him any favours.

Still, I reckon I know how the hard-bitten detective feels in the cop show when the sergeant shouts at him "you're off the case!"

So when I tell a user that we need to do a format and reinstall, it's the last possible course of action. When >your< IT guy tells you that, it may just mean it's Monday, his team lost at the weekend and he's hungover.
I don't know, I can only speak for myself here...



(Oh, the Lego Bobs are made courtesy of the now-legendary Reasonably Clever site.)