SDLC's Dirty Little Secret

Posted by: Daniel Nelson on

If you have never been in a datacenter in the wee hours of the morning, trying to figure out what's wrong with a server, stop reading this. If you have never been in an all-hands-on-deck meeting where everyone is trying to figure out what went wrong with a deployment, stop right now and go do something more useful with your time. If you haven't lived through that, the rest of this post won't mean much to you.

I've been through plenty of those times. And the truth is that no matter what company I worked at (big, small, or something in between), we always had them. They are just a fact of life for people whose job it is to keep the servers up and running.

Which brings me to the point of this entry (which is naughty blogging of me: you are supposed to make your point in the first paragraph, but oh well. It's my blog, so I get to do what I want). Anyhoo, the point is that so much is wrong with how we think about the "Software Development Lifecycle." All the flow charts and diagrams I've seen seem to ignore the very basic fact that things go wrong. They have few built-in mechanisms to help you through that phase of the lifecycle. It's a black box in the flow chart that says "insert troubleshooting here."

Let me back up a bit. If nothing changes in a server environment, chances are you aren't going to have a lot of problems. Maybe a power supply goes out. Maybe the network drops. But for the most part, you only have problems when things change. And that's the Catch-22: things are supposed to change. New code is supposed to go out to the servers. New features are supposed to make it into the hands of eager customers. And those changes are supposed to happen more often, not less. That's how our companies compete with each other; it's called time to market.

And that's the fundamental conflict between two goals: keeping things up and running, and getting new code out faster. As diplomat Henry Kissinger used to say, "Real tragedy doesn't happen when right faces wrong, but rather when two rights face each other." Keeping the servers up is a good thing. Getting code out faster is a good thing. How do you balance the two?

To me, any real Software Development Lifecycle has to fully embrace the fundamental belief that things are going to go wrong during any process. Phurnace Software sells a product that helps you do deployments better, but I can't claim, and I wouldn't claim, that if you use Phurnace for your deployments you will never have a problem again. That's ludicrous, and simply not credible to people who have done this as long as I have.

But one thing that is very different about Phurnace, and the way we do things, is that we have built functionality specifically designed to help you when things do go wrong. We know it's going to happen at some point, and we work hard to make it as infrequent as humanly possible. But it will happen. And when it does, we have built in the features you need to figure it out. Heck, we even have a product named Troubleshoot.

So, when you are looking at how to balance "time to market" with "keep the servers up," keep in mind that you need robust tools and processes in place to fix problems when they happen. A black box just doesn't cut it.
