Purpose: Analysing your release cycle and identifying bottlenecks is an essential first step towards improvement. Once bottlenecks are identified we can then begin to explore how we can mitigate them. This exercise helps you to appreciate how you go about finding bottlenecks.
Introduction:
In the lesson, we discussed the theory of constraints and my interpretation of it, both of which can be used to identify and mitigate bottlenecks. Using those approaches you’re going to look at your release cycle and work out how you can improve the process.
Activity:
Think about your release cycle, perhaps model it or write notes on how the release cycle currently works.
Analyse the release cycle and ask yourself, what is the biggest bottleneck in your release cycle?
Once you have found a bottleneck, think about how you would mitigate it.
As an additional step, share your identified bottleneck with others here on this thread. You can also share how you are going to mitigate it, or ask others for suggestions.
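If you want something more concrete than notes, the modelling step can be sketched in a few lines of code. This is only a rough illustration: the stage names and hours below are hypothetical, and total elapsed time (work plus waiting) is used as a simple proxy for the constraint rather than a full theory-of-constraints analysis.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    work_hours: float  # time someone is actively working on the change
    wait_hours: float  # time the change sits idle before the next stage


def find_bottleneck(stages):
    """Return the stage with the largest total elapsed time (work + wait)."""
    return max(stages, key=lambda s: s.work_hours + s.wait_hours)


# Hypothetical numbers for illustration only; replace with your own measurements.
release_cycle = [
    Stage("development", work_hours=16, wait_hours=2),
    Stage("code review", work_hours=1, wait_hours=8),
    Stage("testing on staging", work_hours=4, wait_hours=24),
    Stage("waiting for release window", work_hours=0, wait_hours=120),
    Stage("deployment", work_hours=1, wait_hours=0),
]

bottleneck = find_bottleneck(release_cycle)
print(f"Biggest bottleneck: {bottleneck.name}")
```

Separating work time from wait time is a deliberate choice: in many release cycles the largest delays come from changes sitting idle, not from the work itself.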
As a preliminary note: we have two-week sprints and use a staging environment. When I talk about a “release” I mean the deployment to production. Our release cycle works like this: the sprint starts and developers start their coding work. Once they are done with a story/bug fix, they push it to the staging environment, where our testers can test it. After two weeks, everything that was pushed to staging is deployed to production, meaning we have a release.
If we have a bug that has to be fixed immediately (yes, we have a nicely sorted, priority-ranked bug backlog), the fix is usually moved to staging, tested there, and then cherry-picked by a developer onto the master branch. There is one exception. If we have no new stories on staging since the last release and it only contains new bug fixes, no matter their priority, we make a small release where we deploy the whole staging branch to production, so that those fixes are deployed without any cherry-picking. But the second we have our first story on staging, this small release option is blocked until the sprint is done and the new stories have been accepted by the stakeholders in the review.
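To make these rules easier to discuss, they can be written down as a small decision function. This is only a sketch of the process as described here: the function name, the `"story"`/`"bugfix"` labels, and the return values are hypothetical and not part of any real tooling.

```python
def release_option(staging_changes, fix_is_urgent):
    """Pick a release path based on what is on staging since the last release.

    staging_changes: list of "story" / "bugfix" labels (hypothetical representation).
    fix_is_urgent: True if a bug fix on staging has to reach production immediately.
    """
    has_story = any(kind == "story" for kind in staging_changes)
    if staging_changes and not has_story:
        # Only bug fixes on staging: deploy the whole staging branch.
        return "small release"
    if fix_is_urgent:
        # A story blocks the small release: cherry-pick the tested fix.
        return "cherry-pick"
    # Everything else waits for the end of the sprint and the review.
    return "wait for sprint end"


print(release_option(["bugfix", "bugfix"], fix_is_urgent=True))
print(release_option(["story", "bugfix"], fix_is_urgent=True))
```

Written this way, it becomes visible that a single story on staging changes the path for every bug fix that follows it.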
And oh boy, I have not even started on code reversibility. Today we had a case where a story’s requirements were flawed and we cannot release it. Of course this story was already closed and new merge requests had been made that contained the code for this story. One of our developers had to do a rollback, but since other developers had already deployed their new code for their tasks, the rollback seems to have had no effect. I write “seems” because we do not know whether the other 8 stories/bug fixes that were added later on have been negatively affected by this rollback or not. And the best thing: it happened less than 24 hours before the sprint was supposed to end.
For now, I am trying to focus on improving our deployment frequency to production. I do not like that we have to wait two weeks (sometimes even more) between regular releases. I would very much like to release directly every time a story/bug fix has been closed, instead of having it simply wait for the review and only then be deployed. And what about code reversibility? Shouldn’t we make sure that we improve there at the same time?
I am open to any suggestions, tips, articles, etc. on how we can improve here.
I often tell people new to CI/CD that I’d rather release 20 times a day than once a week. Small changes are less risky, and when something inevitably goes wrong, it’s trivial to find the offending change.
Of course, this requires testing and monitoring and a lot of culture change.
Another insightful reply - thanks for taking the time to share.
The biggest bottleneck in our release cycle? Well, I hate to piggyback on the response from Alan above, but from my perspective our reliance on suite-wide code deployment is a definite bottleneck. As stated, this just buckets a whole load of risk together and ships it all at once, rather than allowing for small, contained changes. I’d like to consider ways to break this down and have individual teams ship change to individual services, though there are some challenges in doing so:
Availability of support for release activities - we do not have infrastructural, customer support or testing support available all the time, so a single coordinated release event makes facilitating these releases tougher (not a blocker by any means, but an organisational change - I think of this as an external dependency to the broad team that is our development organisation).
Solution: One approach to this would be to secure organisational buy-in to trial a sprint of more frequent shipping - perhaps we identify a couple of windows in a sprint to ship and do that first, with some limited support and a clear roll-back plan. Having the courage to try would be a big step forward! This could even be something we trial in one or a couple of services, rather than suite-wide. Small steps.
A lack of understanding from our teams of how to get to continuous deployment, what good looks like in that context, and a distinct lack of experience in deploying this way. Moving from an 8-week release cycle to a sprintly release cycle was really just a contraction of the existing process, whereas CD feels like more of a fundamental mindset change in how we release software. Although, again, I am keenly aware that I don’t know what I don’t know here.
Solution: get educated! Do some research and make the unknowns, knowns. Make a plan for what “ready” looks like and systematically work at getting at least 50% of it down before we start.
A lack of understanding of/for our customers of what shipping change at such a regular cadence will involve. We do have the ability to separate deployed and released code, but lack a robust process for exercising it: typically we ship new features “on”. Whilst we can change this, that is again an organisational/process change we’ll need to tackle to make CD a reality.
Solution: Work with our implementation and support teams to figure out how to deliver the value of continuous deployment without the inherent risk of “just shipping everything on”. Work on plans for recoverability as a good alternative - if we’re just shipping small changes, this should be simpler than we are used to.
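For readers who have not seen it in practice, separating deployed code from released code is often done with a feature flag. The sketch below is a minimal, assumed setup and not a description of our actual mechanism: the `FEATURE_` environment variable naming, `flag_enabled`, and the `checkout` example are all hypothetical.

```python
import os


def flag_enabled(name, default=False):
    """Read a flag from an environment variable such as FEATURE_NEW_CHECKOUT=1."""
    value = os.environ.get(f"FEATURE_{name.upper()}")
    if value is None:
        return default  # flags ship "off" unless explicitly switched on
    return value.strip().lower() in {"1", "true", "on", "yes"}


def checkout(cart_total):
    # The new code path is deployed but stays dark until the flag is switched on.
    if flag_enabled("new_checkout"):
        return f"new checkout: {cart_total:.2f}"
    return f"old checkout: {cart_total:.2f}"


print(checkout(10))
```

Defaulting to "off" is the point of the exercise: the code can ship continuously while the decision to release it to customers stays a separate, reversible step.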
Our “hardening cycle” is definitely our biggest bottleneck, and the core issue here is that we lack the confidence to release. Typically it has been a 6-week period of extensive regression testing and bug fixing. We are experimenting with shorter 3-week periods, which is still really long in my opinion.
One mitigation would be automated tests to build confidence in our solution and that we haven’t “broken” anything. However, that isn’t a magic bullet (or easily achievable), and I believe the real mitigation is to constantly test and strive for quality throughout the SDLC, not “test in quality” at the end. Risk-based testing at the end of a release cycle, targeting the areas that we perceive as the highest risk, would complement this.
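As a rough illustration of risk-based testing, the areas to test can be ranked by a simple risk score and only the top few covered in the final pass. Everything here is an assumption made for the example: the likelihood x impact scoring on a 1-5 scale, the `prioritise` helper, and the areas and their scores.

```python
def prioritise(areas, budget):
    """Return the names of the `budget` highest-risk areas, highest first.

    Risk is scored as likelihood x impact, each on a 1-5 scale (an assumed scheme).
    """
    ranked = sorted(areas, key=lambda a: a["likelihood"] * a["impact"], reverse=True)
    return [a["name"] for a in ranked[:budget]]


# Hypothetical areas and scores, for illustration only.
areas = [
    {"name": "payments", "likelihood": 3, "impact": 5},
    {"name": "reporting", "likelihood": 2, "impact": 2},
    {"name": "login", "likelihood": 4, "impact": 5},
    {"name": "settings page", "likelihood": 1, "impact": 1},
]

print(prioritise(areas, budget=2))
```

The scores themselves are a judgment call, so the value is less in the arithmetic and more in forcing an explicit conversation about where the risk sits.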