5 Traits of an Excellent Troubleshooter

One of my favorite shows on TV was House. The show was about a doctor (a diagnostician) who was the go-to guy for any problem no one else could solve. He was an asshole, but everyone put up with it because he was a genius. He was somehow able to see patterns and connect the dots with every crazy medical problem his patients had. He didn’t care about the patient. He just cared about solving puzzles. He would have been just as happy being a physicist, chemist or engineer, I bet. It wasn’t about the vertical he was in; it was about diagnosing the problem and fixing it.

I found myself relating to House a lot. I love solving problems. I get great satisfaction from being given some obscure puzzle and solving it. I’m sure a lot of you relate somewhat to House as well. Let’s hope it’s the love of puzzles and not necessarily the asshole part. 🙂 Perhaps it’s my logical, left-brained mentality that’s gotten me called Spock one too many times. I don’t know. All I know is that I think of myself as a pretty good troubleshooter. I seem to have a knack for breaking down big problems into manageable chunks and eventually putting the pieces together. Can you relate?

I’ve seen a lot of different people troubleshoot problems in a myriad of different ways. There’s no one way to troubleshoot even the exact same scenario because troubleshooting is more of an art than a science. In my experience, some people just seem to naturally “get it”. These people go from knowing nothing all the way to fully understanding the problem and implementing a solution that not only fixes the problem now but ensures that it never happens again. These are the type of people you can throw any problem at and know that it’ll be fixed, regardless of what it is. They seem to have a gift for calling on previous experience, have enormous amounts of patience and can easily see how the individual pieces fit together. It’s amazing and inspiring to work with these kinds of people.

On the flip side, I’ve also worked with people who, no matter how much experience they have, couldn’t troubleshoot their way out of a wet paper bag. They may have a particular expertise, yet even when faced with a common problem in their own field they struggle to figure out what went wrong. They are impatient and just want to get it fixed so they can move on to the next task, which only makes the problem worse. The more impatient they get, the sloppier they get, and the longer the problem takes to solve. Additionally, the same problems keep cropping up again and again, and they seem to be stuck on a giant hamster wheel doing the same thing over and over.

I believe what separates these two kinds of people can be broken down into 5 distinct traits.

    1. Experience

Wearing your senior IT architect hat, it’s easy to ridicule the junior admin when he can’t fix a simple connectivity issue. Even if this junior admin is the brightest young admin you’ve ever seen, at a minimum, he’s going to take a whole lot longer to properly diagnose and fix a problem. The reason is obvious. He just doesn’t have the long-term knowledge to call upon when faced with a situation that’s never personally happened to him before. Not only does he lack first-hand experience troubleshooting this problem, he may never have experienced any problem remotely similar to it. He can’t even call upon his knowledge of a situation that looked like this before. The more experience you have, whether or not you consciously recall the event, the more likely you are to either remember the exact problem you’ve troubleshot before or at least call upon a memory of something that looked familiar in the past. Ease up on Junior, OK?

    2. Ability to Recognize Patterns

This trait can really get deep. I don’t know if this is a learned trait, genetic or what. It’s a distinction I see over and over again in great troubleshooters. It’s hard to put my finger on, so I’ll give you an example. Let’s say the help desk is lighting up because application XYZ is down. No one knows a thing about this application and your entire team is at pretty much the same experience level.

Bill (the IT rockstar) figures out what server this application is on, ensures the server is up, talks to the users and narrows down a timeframe when they started noticing the problem, checks the logs, and sees an entry saying something about loss of network connectivity during that time. He then goes over to the network engineer, remembering that he saw some kind of email a while back saying the engineer was going to be upgrading something on the network. Bill explains the problem, and the actions he’s taken to decipher it, as best he can to the network engineer. Bill discovers that the upgrade did go on during that time, that it blocked a certain TCP port and that it would, in fact, cause this. The network engineer reverts his change and all is well.

Bob, on the other hand, also knowing nothing about the application, immediately wants to call the application’s support line. He figures he knows nothing about the application and that the vendor would. After hours of troubleshooting, the vendor finally sees that the server lost connectivity, that this was what was causing the problem, and hands the ball back to Bob. Bob then pings the server and sees that it’s responding. Bob is stumped. It’s up, so why would the application not be working? It’s because the application isn’t using ICMP to communicate with the other server. It’s using the TCP port the network engineer accidentally blocked.

Do you notice the multiple places where pattern matching came into play? Bob just can’t put 2 and 2 together, while Bill is somehow able to connect the dots and call upon previous experience and previous communication.

    3. Long Term Vision

A good troubleshooter can diagnose and solve the immediate problem. The truly great troubleshooter not only solves the immediate problem but solves it in a way that keeps it from happening again. For example, a server periodically stops receiving connections for some reason. Everyone knows that rebooting it resolves the immediate issue and that the server will then work just fine for a few months until it happens again. Bob is the rebooter. When Bob finds out this server is having problems again he immediately reboots it to get things working again and goes back to his daily duties. Bill, on the other hand, faces the wrath of the users and possibly management and doesn’t immediately reboot it. Bill wants to investigate while the server is still experiencing the problem, gathering logs and running various tests before the evidence disappears. Bill could “fix” it by rebooting it immediately but knows that a little more downtime now will pay off in the long run.

A good troubleshooter with a long-term vision will find an immediate fix but won’t pull the trigger just yet. He will be patient and attempt to figure out what the root cause of the problem is and fix that rather than fixing the symptoms.

    4. Does not Assume They Know Anything

One of the biggest traps I’ve fallen into personally is assuming I know what the problem is ahead of time. I’ve got experience. I’ve seen everything, right? I immediately see the problem and think to myself I know exactly what’s wrong. It’s just up to me now to prove it to myself. This is the wrong way to approach troubleshooting a problem. If you think you already know what the problem is, you’re going to be blind to any other potential cause. You might be right but you might be wrong. If you’re wrong and you think you’re right, you’re going to waste tons of time trying to prove a wrong theory when all you had to do was go in without any assumptions and just follow the trail.

    5. Patience

While working on that latest support ticket, do you find yourself thinking about other things? What about just having an anxious feeling when you’re an hour into troubleshooting? Are you constantly looking at the clock or even feeling guilty that you’re spending too much time on this? Barring ADHD, this is probably due to your lack of patience. Lack of this trait can be worse than a lack of just about any other I’ve discussed. Why? Even if you’re a troubleshooting rockstar, if you don’t have the patience to stick with it you’re either going to quit before you figure out the problem or you’re going to implement some hack that’ll never last.
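
Going back to Bob’s ping test under trait 2 for a second: a server answering ping only tells you ICMP gets through, not that the port the application actually uses does. Here’s a rough Python sketch of the kind of check that would have pointed Bob in the right direction. The function name `tcp_port_open`, the host name and the port number are all made up for illustration; the post never says which port was blocked.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs a full TCP handshake, so it exercises
        # the same kind of path the application uses, unlike an ICMP ping.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Hypothetical usage: the host name and port below are placeholders.
# if not tcp_port_open("app-server.example.com", 8443):
#     print("Server may answer ping, but the application port is unreachable")
```

A `False` here doesn’t tell you why the port is unreachable (firewall, stopped service, routing), only that “it pings” isn’t the whole story and that it’s time to keep following the trail.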

I hope that the next time you’re neck-deep in some obscure problem you’ve never seen before, you think about some of these traits. If you’re Bob, realize it and do something about it. If you’re Bill, kudos to you and keep on doin’ what you’re doin’!

2 comments

  • I would also add:

    6. Switching it up.

    I’ve often found that if I am not getting any traction by normal troubleshooting means (debug tools, logfiles, or if those fail, trying permutations until you can reproduce the issue on demand), then sometimes you need better tools.

    Sometimes it’s just a matter of trying a different debugging tool. Sometimes, it’s looking at a different type of logfile.

    Sometimes you have to build a better debug tool, or better logfile yourself.

    Learning when to call it quits on existing methods, and develop new methods is a great trait for troubleshooters to have.

    • Yep, I agree. That is another good one. I’ve even just walked away for a couple of hours, come back and immediately thought of something else to try.
