What was your biggest staging environment issue?

Greetings one and all! It’s new article Tuesday once again, and today we have a great article from @qamy on the impact of flakey staging environments on our work:

I have memories of painful times working with flakey staging environments that constantly fell over or had bad data in them. This got me thinking:

What staging environment bug caused the biggest production issue for your team? What did you learn?

I’d love to hear how you all handled staging issues in the past.


The link does not work

Sorry about that. Discourse had a moment… Link has been fixed.

I don’t know that I’d call it the biggest issue, but it was one of the nastier ones, and needs some context.

The software I work with requires multiple services and other connections to work properly. Its first iteration used SQL Server 2000 as the database. It was on SQL Server 2008 when I started working at the company, and has since been upgraded twice - but retains a number of quirks due to its original database version.

We have test and staging environments which are kept as close to the production environment as possible, but these environments started with whatever the current version of SQL Server was when they were created. At one point, a new feature was added whose database code worked just fine in test and staging - but didn’t work at all on production.

Despite all three databases being nominally the same version, the compatibility and upgrade path of the production database meant that some newer T-SQL features didn’t work with it, where they did work with test and staging.

Cue immediate rollback of those changes and some frantic rework to fix the problem.

Is it any wonder we’re paranoid about database upgrades?
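For anyone wanting to guard against this kind of drift, here is a minimal Python sketch of a pre-release check. It assumes the difference shows up in each database's compatibility level, which is one plausible cause of what is described above but not something confirmed here. The environment names and level values are hypothetical, and actually running the query against each server is left out.

```python
# Query each environment would run; sys.databases and DB_NAME() are standard
# SQL Server features. Executing it (e.g. via a database driver) is omitted.
COMPAT_QUERY = (
    "SELECT compatibility_level FROM sys.databases WHERE name = DB_NAME()"
)


def find_compat_mismatches(levels, reference="production"):
    """Return the environments whose level differs from the reference one."""
    expected = levels[reference]
    return {
        env: level
        for env, level in levels.items()
        if env != reference and level != expected
    }


if __name__ == "__main__":
    # Hypothetical results of running COMPAT_QUERY in each environment.
    levels = {"production": 100, "staging": 150, "test": 150}
    mismatches = find_compat_mismatches(levels)
    if mismatches:
        print("Compatibility level differs from production:", mismatches)
```

Running something like this as part of the deployment pipeline would at least flag that staging accepts syntax production might reject.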

I’m not sure if it was the biggest, but non-representative data has probably caused the biggest headaches overall. This is usually due to some level of legal paranoia - some legitimate, some possibly not.

The first time this happened was at the BBC about 13 years ago, where they had a pretty extreme level of anxiety about us seeing production data due to some law to do with children’s data. It’s also happening at my current company due to the sensitivity of client data.

In some companies you can simply do a production database dump and restore it to staging and you’re done. This is an incredible and underrated boon to productivity when you can do it - not just for staging but also for dev environments.

In most of these other places I would have liked to have built a demo data pipeline that engaged in some degree of sanitization, deletion and anonymization, and spat out a minified but still representative database dump every morning. It almost always fell by the wayside, as most developer experience projects tend to.
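To make the idea concrete, here is a rough Python sketch of the sanitization step of such a pipeline. Everything in it is an assumption for illustration: the column names, the salt handling and the "keep every Nth row" minification are all hypothetical. Salted hashing is also only pseudonymization, so it may not satisfy a strict legal definition of anonymization.

```python
import hashlib

# Hypothetical column names; a real schema would need its own lists.
DROP_COLUMNS = {"date_of_birth", "address"}   # deleted outright
PSEUDONYMIZE_COLUMNS = {"email", "name"}      # replaced with a stable token


def pseudonymize(value, salt):
    """Map a value to a short, stable token so joins across tables still work."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:12]


def sanitize_row(row, salt):
    """Drop sensitive columns and pseudonymize identifying ones."""
    clean = {}
    for column, value in row.items():
        if column in DROP_COLUMNS:
            continue
        if column in PSEUDONYMIZE_COLUMNS and value is not None:
            clean[column] = pseudonymize(value, salt)
        else:
            clean[column] = value
    return clean


def build_demo_rows(rows, salt, keep_every=10):
    """Keep every Nth row (a crude form of minification), then sanitize it."""
    return [sanitize_row(row, salt) for row in rows[::keep_every]]
```

A nightly job could feed each table's rows through `build_demo_rows` before writing the dump. Plain slicing will not keep foreign keys consistent between tables, so a real version would need a smarter sampling strategy.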