So I did some shallow searches and here’s what I could find about DORA metrics. I’ve listed each one, how it’s measured, and how it’s apparently meant to be used, then some quick thoughts on each.
Deployment Frequency
How many deployments (releases) per unit time
More = better.
I’m not sure. If a release takes more time it might have good reason to, and I don’t know to what degree we could compare one release with another. The suggestion seems to be that it’s better to release value to customers more frequently, but of course if we use toggles to release partial code and functionality then we’re not actually delivering something they can use; we’re just pushing change risk into the code base and adding another row and column to the test matrix every time a toggle combination changes.
I think it heavily depends on how your company releases and what kind of product you’re making, among other things.
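To make concrete what’s actually being counted, here’s a minimal sketch in Python - invented timestamps, not any official DORA tooling; real measurement would pull these from a CI/CD or deployment log:

```python
from datetime import datetime, timedelta

# Invented deployment timestamps; in practice these would come from a
# deployment or CI/CD audit log.
deployments = [
    datetime(2024, 5, 1, 9, 30),
    datetime(2024, 5, 2, 14, 0),
    datetime(2024, 5, 4, 11, 15),
    datetime(2024, 5, 8, 16, 45),
]

def deployments_per_week(timestamps):
    """Deployments divided by the number of weeks the data spans."""
    if len(timestamps) < 2:
        return float(len(timestamps))
    span_weeks = (max(timestamps) - min(timestamps)) / timedelta(weeks=1)
    return len(timestamps) / max(span_weeks, 1.0)  # guard against tiny spans

print(f"{deployments_per_week(deployments):.1f} deployments/week")
```

Note that the number says nothing about what any of those deployments contained, which is exactly the comparability problem above.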
Mean Lead Time for Changes
Time between a commit and that code running in production
Shorter = better.
I feel like this is more useful if you use it to show what’s holding up one team’s flow. You cannot use this to reliably compare across teams. Even for one team, the idea should be more that a deviation away from normality should be investigated, rather than “shorter is better”. The writing I found seems to sell it as the “efficiency” of the devops chain, which I think is irresponsible wording. A reasonable problem-finding measurement, though.
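If you did want it as a problem-finding measurement for a single team, a sketch like this one (made-up commit/deploy pairs, arbitrary 3x threshold) flags deviations from that team’s own normality instead of chasing “shorter is better”:

```python
from datetime import datetime
from statistics import median

# Invented (commit_time, deployed_time) pairs for one team.
changes = [
    (datetime(2024, 5, 1, 9, 0),  datetime(2024, 5, 1, 15, 0)),  # 6 hours
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 2, 18, 0)),  # 8 hours
    (datetime(2024, 5, 3, 11, 0), datetime(2024, 5, 5, 11, 0)),  # 48 hours
]

lead_times_h = [(deploy - commit).total_seconds() / 3600
                for commit, deploy in changes]
typical = median(lead_times_h)

for hours in lead_times_h:
    # The 3x threshold is arbitrary; the point is deviation from this
    # team's own normality, not a cross-team league table.
    if hours > 3 * typical:
        print(f"outlier: {hours:.0f}h against a typical {typical:.0f}h - ask why")
```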
Change Failure Rate
Number of failures in production divided by total number of deployments
This is heavily context-dependent. What do we consider a failure? The suggestion seems to be anything that needs a rollback or hotfix. Now, I can see the value in detecting changes in that number - if it gets higher when you’re not expecting it to, you can try to identify why - but one writer claims that “for elite engineering teams, no more than 15% of their deployments result in degraded services” and another writes “Most DevOps teams can achieve a failure rate between 0% and 15%”.
Each failure is different. Also, each failure now has to be assigned blame, or how do we know whose change failure rate we are counting? If someone changes the database, and someone else changes how they access the database, and production breaks, who is at fault? It feels like a lot of process and procedure to assign a numeric value to how crap a team is, when the actual outcome really should be about process improvement. See how each thing can have value, but you don’t have to walk very far for it to become stupid? It’s a good idea to keep an eye on failures and their cost and to adjust with appropriate amounts of resources, but “elite teams” keeping it under that 15% is deaf to context and doesn’t survive contact with reality when it comes to implementation.
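The arithmetic itself is trivial, which is rather the point - everything difficult happens before the code runs, in deciding what counts as a failure and whose it is. A sketch, with that contested judgement reduced to a flag somebody had to assign:

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    id: str
    failed: bool  # the contested part: rollback? hotfix? degraded service?

# Invented deployments; one needed a rollback.
deployments = [
    Deployment("d1", False),
    Deployment("d2", True),
    Deployment("d3", False),
    Deployment("d4", False),
]

failures = sum(d.failed for d in deployments)
rate = failures / len(deployments)
print(f"change failure rate: {rate:.0%}")  # 25% here - over the quoted 15%
```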
Mean Time to Recovery
Time to restore a system to its usual functionality
Less = better.
I don’t have much of a problem with measuring this. It helps to evaluate ideas like rollback systems and backups. I think it’s also important that we’re essentially talking about downtime, so we need to examine the cost and to whom, because one outage does not cost the same as another. Also, a system with a parallel backup can swap over, and then we have to count the downtime for the user and the downtime of the borked system separately. There will be plenty more concerns and limitations.
Then it turns out that “elite teams can recover in under an hour”, which, again, is a waste of letters. If the team takes longer one day because someone is ill, is that taken into account? Whose fault is it really if Vodafone put a shovel through a cable? Does it actually matter that it’s under an hour, or is it actually fine? Would it cost more to reduce the number than to let it sit in downtime? Maybe it goes down overnight and nobody’s using it. There’s probably not even a night-time ops team for that, so because recovery takes 12 hours they’re not elite enough?
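To illustrate the failover point from earlier, here’s a sketch (invented timestamps) that counts the user-facing outage and the borked system’s downtime separately, because a single “time to recovery” hides the difference:

```python
from datetime import datetime

# Invented incident: the primary breaks at 02:00, a standby takes over
# at 02:05, and the primary is fully repaired at 09:00.
failure_start    = datetime(2024, 5, 10, 2, 0)
failover_done    = datetime(2024, 5, 10, 2, 5)
primary_repaired = datetime(2024, 5, 10, 9, 0)

user_downtime_min   = (failover_done - failure_start).total_seconds() / 60
system_downtime_min = (primary_repaired - failure_start).total_seconds() / 60

print(f"user-facing downtime: {user_downtime_min:.0f} minutes")   # 5
print(f"system downtime:      {system_downtime_min:.0f} minutes")  # 420
```

By the “under an hour” yardstick the same incident makes this team elite for its users and hopeless for its hardware.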
Final Notes
The DORA stuff is littered with sales points like “With live DORA dashboards in place, engineering organizations can start to see where they stand relative to other engineering organizations, and what the scope for improvement is in their software delivery processes”, which to me reads as “harder, faster, prove your love for me”. That displaces the reality of development - a team of people working together and for each other toward common goals - with older ideas of factory floors, or newer ones of call centre hell boxes.
Secondly, there’s a common thread through all formalised metrics that appears just the same here: fungibility. Something is fungible if it’s interchangeable and can be pragmatically treated like any other of its kind - money is fungible because a dollar bill is worth any other dollar bill. Houses are non-fungible because they have differing sizes, numbers of rooms, facilities, neighbours, access, materials and so on. Mathematics relies heavily on the fungibility of numerical values, and benefits from the reliability and consistency that provides. It feels like it has integrity and certainty. I believe that formalised, cookie-cutter metrics systems use the comfort of that certainty like a security blanket or teddy bear to fight the fear of the complexities of reality. The quantity of money and effort spent, and the cost and misery inflicted, in this world in an attempt to make things seem simpler than they are is arrogant and obscene.
Thirdly, there’s a theme of blame in the writing about these metrics that disturbs me. For the numbers to be used in a comparative, competitive or class-sorted way, like getting a gold star in primary school, we have to assign blame. I can imagine it feeling like working for an insurance company. No, ma’am, we’re not liable for acts of God. Do you have the police report?
I believe there is a way to use measurements in a sane, humane and logical way that also shows humility toward the messy nature of reality. But I’ve never seen one of these systems come with information on how each one can be misused, abused, poorly implemented or poorly interpreted, which suggests to me that, much like the psychic medium, the aim isn’t the betterment of a business or the people in it, whether through ignorance or deception.