Epic software bugs - major business losses and heavy user impact

In the 1990s I used to work for one of the UK’s largest intruder alarm system manufacturers. We sold our products to distributors, who sold them to individual installers, sometimes through other intermediaries. The result was that we had pretty much no idea who our end clients were.

One of our control panel designs was 8 years old and very reliable, but a few months after a software revision we noticed an increase in returns. Investigation showed that the non-volatile memory (NVM) chips had worn out from excessive write cycles. It turned out the new software was writing to the NVM every minute instead of once a day.

Unfortunately for us, the NVM chips were far more robust than their specification stated. They should have failed within a week or so, in which case we would have found the issue during design and testing. But even the worst ones lasted months.
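
To put rough numbers on that, here is a back-of-envelope sketch. The 10,000-cycle endurance rating is purely an assumption for illustration (the real figure for our chips isn't given here), and `days_until_worn_out` is just a made-up helper name. With that assumed rating, one write per minute uses up the rated cycles in about a week, while one write per day would take decades.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 writes per day at one write per minute


def days_until_worn_out(rated_cycles: int, writes_per_day: float) -> float:
    """Days until one NVM location reaches its rated write endurance."""
    if writes_per_day <= 0:
        raise ValueError("writes_per_day must be positive")
    return rated_cycles / writes_per_day


# Assumed endurance rating, for illustration only; not the real chip's figure.
ASSUMED_RATED_CYCLES = 10_000

per_minute = days_until_worn_out(ASSUMED_RATED_CYCLES, MINUTES_PER_DAY)
once_a_day = days_until_worn_out(ASSUMED_RATED_CYCLES, 1)

print(f"one write per minute: {per_minute:.1f} days")
print(f"one write per day: {once_a_day / 365:.1f} years")
```

This only models the rated figure. As described above, the real chips outlasted their rating by a wide margin, which is exactly why the problem stayed hidden.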

What started as a trickle of returns turned into a deluge over the next year, and we knew that every product we had shipped would fail soon. We had shipped about 30,000 before we knew there was a problem, and we didn’t know who had bought them or where they had been installed. We notified the distributors, but many had walk-in sales outlets and had no idea who they had sold to.

Every product had to be fixed by going to site and replacing the NVM and the ROM chip - there were no over-the-air updates back then. Since the NVM had contained the system configuration parameters, the alarm system then needed to be reprogrammed and tested. It cost the company a staggering amount, financially and reputationally.

We were left to consider how we could have found that bug, or bugs of that type, and concluded that there was no realistic scenario in which we could have done so. And then lightning struck again, but that’s another story.
