Yesterday I was reminded, again, of one of my very old and very important rules: any company involved with development must maintain a wall between its development and production environments. A related rule is just as important: only releases that have been sufficiently tested should be used to update production environments.
SAY IT AIN’T SO, DEAN!
How did an ‘old development management sea dog’ like me get stung by not complying with these rules? Well, truth be told, I succumbed to two of the seven deadly sins!
Isn’t it funny how most of our human errors can be traced back to that simple list?
FEATURE GREED
My first sin this week was GREED, specifically feature greed. Talking only about development-related sins, of course! I wanted to show the potential client ‘the latest and greatest’, ‘the best that we could be’, ‘the goodies’, you name it! My rationale was that there had not been many changes to the application executables in the prior week, so after carefully testing the changed functions and passing them, I thought I was good to go with the new stuff.
In the haze of feature greed, I forgot about our old friends ripple effect and fault feedback ratio.
TINY LITTLE WAVES MAKING BIG PROBLEMS
Ripple effect, for those not in the business, is when you change something ‘over here’ in your code and it breaks something ‘over there’. It can be the fault of the developer for not following a change through to all affected areas. It can also be the fault of the project manager for not identifying every module a change will affect and not assigning the developers who own the potentially affected code to deal with its effects.
In this particular case, I took a delivery from one developer that resolved some errors in one module, all of which tested fine. Then I took a delivery of fixes from another developer in a different module, all of which also tested fine. Combining the deliveries was the problem: the fixed functionality in one module made code in the other module (changed weeks ago and tested OK at the time) blow up when it again found data where it expected it, but in a now unexpected format.
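To make that failure mode concrete, here is a minimal Python sketch. It is not code from our application: the module names, the record fields and the formats are all invented for illustration. Each delivery passes its own point test, and only a test that feeds one module's real output into the other exposes the mismatch.

```python
# Hypothetical stand-ins for the two modules in the story; none of these
# names or data formats come from the real application.

def module_b_broken():
    """Producer while it was broken: it returned no data at all."""
    return []

def module_b_fixed():
    """Producer after the fix: data is back, delivered as dicts."""
    return [{"id": 1, "load": 12.5}, {"id": 2, "load": 7.0}]

def module_a_total_load(records):
    """Consumer, changed weeks earlier: expects (id, load) tuples."""
    return sum(load for _id, load in records)

def point_test_a():
    # A tested against the (empty) data it was receiving at the time.
    return module_a_total_load(module_b_broken()) == 0

def point_test_b():
    # B tested on its own: the fix works and data is returned.
    return len(module_b_fixed()) == 2

def system_test():
    # Combined test: feed B's real output into A.
    try:
        module_a_total_load(module_b_fixed())
        return True
    except (TypeError, ValueError):
        return False

print("point test A passes:", point_test_a())
print("point test B passes:", point_test_b())
print("system test passes: ", system_test())
```

Both point tests pass while the system test fails, which is the situation I walked into the demo with.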
NOBODY IS PERFECT
The fault feedback ratio demonstrates the essence of humanity: none of us is perfect. Nobody can fix issue after issue without ever breaking anything else. Very careful developers will have an extremely low fault feedback ratio, but even the best will sometimes break something else when they go in to make a fix.
This was the root of the issue that caused the error during the demo. Code in module A was once synchronized with the output of module B, but a change made in A by developer Z left it unprepared for all possible variants of that output. This went undiscovered for weeks because, before Z made the change in A, developer X had broken module B so that it produced no output whatsoever. Only when the now broken module A was put together with the now fixed module B did the new error show up.
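For anyone who wants to put a number on it, here is a rough Python sketch. It assumes that fault feedback ratio means the fraction of delivered fixes that introduce a new fault, which is one common reading of the term. The function name and the figures in the usage line are made up for illustration, not measurements from our team.

```python
def fault_feedback_ratio(fixes_delivered, fixes_that_broke_something):
    """Fraction of delivered fixes that introduced a new fault.

    Assumed definition for illustration; other teams may count differently.
    """
    if fixes_delivered <= 0:
        raise ValueError("need at least one delivered fix")
    if not 0 <= fixes_that_broke_something <= fixes_delivered:
        raise ValueError("breaking fixes must be between 0 and fixes delivered")
    return fixes_that_broke_something / fixes_delivered

# Illustrative numbers only: 40 fixes delivered, 3 of them broke something else.
ratio = fault_feedback_ratio(40, 3)
print(f"fault feedback ratio: {ratio:.1%}")
```

Even a low ratio is not zero, which is why every batch of fixes deserves a full regression pass before it goes anywhere near a client.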
SUCK IT UP, PRINCESS
The net cost of my sins is that I endangered a sale that should have been a much easier close.
If you are not completely willing to bear the worst possible outcome of a blown demonstration or production rollout, don’t release to production until the release is completely ready and tested! Given the vast cost of ‘broken arrow’ releases once you multiply by the number of users you will take down for a while, premature production releases are simply not worth it.
FIGHT THE POWERS THAT BE
In my case, the decision was mine–no one was pressuring me for a premature release.
In many other cases, possibly yours, there is plenty of external pressure to release before you know you are really ready. You must resist; do not join the dark side! For stakeholders who persist, quantify the worst case scenario in terms of user downtime and lost sales (or whatever set of effects would result), and the overeager stakeholder should relent from pressuring you…at least a little bit.
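A back-of-the-envelope version of that worst case calculation might look like the Python sketch below. The function, its parameters and every figure in the example are hypothetical placeholders; substitute whatever costs actually apply to your users and your sale.

```python
def worst_case_cost(users_affected, hours_down, cost_per_user_hour, lost_sales=0.0):
    """Rough worst case cost of a bad release: downtime cost plus lost sales.

    All inputs are estimates supplied by whoever is making the case.
    """
    downtime_cost = users_affected * hours_down * cost_per_user_hour
    return downtime_cost + lost_sales

# Made-up example: 200 users down for 4 hours at 50 per user-hour,
# plus one endangered sale worth 25,000.
cost = worst_case_cost(users_affected=200, hours_down=4,
                       cost_per_user_hour=50, lost_sales=25_000)
print(f"worst case cost of releasing early: {cost:,.0f}")
```

Putting a single number in front of an overeager stakeholder tends to land better than ‘it might break’.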
AN UGLY WORD UNLESS YOU ARE TALKING ABOUT THE SLOW TREE DWELLER
The second sin of the week was SLOTH, or in today’s parlance, ‘laziness’. I like to make it sound less bound-for-hellish by saying ‘lack of time’. Allocating a proper amount of time for proper tests would have exposed the issue and ensured that I stayed on the last release for the demonstration instead of the new stuff.
If one cannot make the time to get the proper tests done, one should not use the new release. My failing was that, in a week busy with other matters, I simply did not have the time available, yet I still ran the point tests and thought that was enough. It wasn’t, of course. Complete system tests need to be done before releasing any code from development to production environments.
Licking my wounds,
Dean Whitford, B.Comm.
Chief Operating Officer
DraftLogic