I’m thinking through ways to improve a quality assessment framework I’ve used in the past to help teams understand quality in context and more holistically than only through an eng lens.
I’d love to learn what y’all measure to evaluate product quality - knowing of course that it’ll be defined differently for each team. Doesn’t need to be formal assessments, just curious what signals you actually trust.
Hello Susanne,
We recently started to collect metrics on services (SLIs, Service Level Indicators) that drive SLOs (Service Level Objectives).
We’ve asked all teams to provide their top-10 user journeys, i.e. the most popular scenarios our customers are using. It wasn’t easy, since many teams are quite disconnected from customers. But by going through our POs, some sales channels, and some partners, or by pulling navigation history from the services, they eventually managed to get some data.
We now have a dashboard, public to the whole company, showing each team’s SLOs over the past 30 days, updated weekly.
Is it working (quality improvement)? We just started… we’ll see in a couple of months.
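To make that SLI → SLO rollup concrete, here’s a minimal Python sketch of how a per-journey availability SLI over a 30-day window could be checked against an SLO target. The journey names, request counts, and the 99.9% target are illustrative assumptions, not the actual setup described above.

```python
# Minimal sketch: roll per-journey request counts up into a 30-day availability SLI
# and compare it against an SLO target. Journey names, counts, and the 99.9% target
# are illustrative assumptions, not the real data from the post.

from dataclasses import dataclass

@dataclass
class JourneyStats:
    name: str
    good_requests: int   # requests that met the journey's success criteria (30 days)
    total_requests: int  # all requests for the journey (30 days)

    @property
    def availability(self) -> float:
        # SLI: fraction of requests that were "good" over the window
        return self.good_requests / self.total_requests if self.total_requests else 1.0

SLO_TARGET = 0.999  # 99.9% availability over 30 days (illustrative)

journeys = [
    JourneyStats("login", 998_500, 1_000_000),
    JourneyStats("checkout", 499_900, 500_000),
]

for j in journeys:
    status = "MET" if j.availability >= SLO_TARGET else "MISSED"
    print(f"{j.name}: SLI={j.availability:.4%} vs SLO {SLO_TARGET:.1%} -> {status}")
```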
Relevant people, like the product manager and the team, are concerned about what I found (or didn’t find) out about the product. Testing needing a lot of time to tell you anything about the product can be a problem too.
Love this approach, especially the user journey mapping. We did something similar at Dropbox and used these not just to define test coverage and metrics, but also as input to decisions we were making about risk.
Curious how you’re handling the ‘we’ll see in a couple months’ part - are you tracking any leading indicators or just waiting to see if the numbers improve after a set period of time?
Thank you so much - I’m really happy to hear that it resonated! Your assessment is super comprehensive too. Love love love the callout about things like skills gaps and other ‘softer’ team-dynamic type things! How do you typically assess/keep track of those?
Can you help me understand what you mean about testing taking too much time to find anything about the product? I want to make sure I’m following your point.
I’ve mentioned in a number of posts that we have a quality gate before we release to production. The quality gate should be a 15-minute discussion between stakeholders, driven by a brief checklist, to make sure we all agree that the software is “Good to Go”.
Some of the checks are evidenced and some are just verbal checks. This is a summary of the questions the process asks:
Are the devs happy with the coverage of their unit testing?
Are we happy there aren’t any critical vulnerabilities in the code? (Cyber Essentials+ check)
Do we cover all the tickets in the release in the testing?
Are we happy with coverage of the regression testing?
Are the release notes accurate and understandable by a customer?
Is the deployment process understood, including any new configuration options?
Are there any checks the customer success team need to do once in production?
Are there any customer change control processes we need to follow?
Are there any outstanding backlog bugs that we should have addressed?
Sounds like a lot, but it’s now a routine; with evidence and templates where needed, it is usually pretty quick. Outcomes are Go, Conditional Go (a couple of minor actions need to be completed), and No Go.
None of the things in the checklist should be a surprise or left until we actually have the quality gate. It should just be a validation of what we already know.
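For anyone who wants to see the shape of such a gate in code, here’s a rough Python sketch of the checklist questions feeding a Go / Conditional Go / No Go outcome. The question wording comes from the list above; the pass/fail flags and the blocking-vs-minor rule are assumptions for illustration, not the actual process.

```python
# Rough sketch: represent the quality-gate checklist and derive a
# Go / Conditional Go / No Go outcome. Questions paraphrase the post;
# the pass flags and the blocking-vs-minor rule are illustrative assumptions.

from enum import Enum

class Outcome(Enum):
    GO = "Go"
    CONDITIONAL_GO = "Conditional Go"
    NO_GO = "No Go"

# Each check: (question, passed, blocking_if_failed)
checks = [
    ("Are the devs happy with unit test coverage?", True, True),
    ("Are we happy there are no critical vulnerabilities?", True, True),
    ("Do we cover all release tickets in the testing?", True, True),
    ("Are we happy with regression test coverage?", True, True),
    ("Are the release notes accurate and customer-readable?", False, False),
    ("Is the deployment process (incl. new config) understood?", True, True),
    ("Any post-deploy checks for customer success?", True, False),
    ("Any customer change control processes to follow?", True, False),
    ("Any outstanding backlog bugs we should have addressed?", True, False),
]

def gate_outcome(checks) -> Outcome:
    blocking = [q for q, ok, must in checks if not ok and must]
    minor = [q for q, ok, must in checks if not ok and not must]
    if blocking:
        return Outcome.NO_GO
    if minor:                      # a couple of minor actions outstanding
        return Outcome.CONDITIONAL_GO
    return Outcome.GO

print(gate_outcome(checks))  # -> Outcome.CONDITIONAL_GO in this example
```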
We had many, many signals (bug tickets, external support tickets, internal support tickets, product roadmap achievement, teams’ velocity, users’ satisfaction, testing coverage, sales demo incidents, etc.) and I mainly failed to aggregate them all into one single relevant indicator.
So what I did, in the end, was take a pragmatic approach: first choose the worst indicator (number of bugs in prod seen by clients) and work on it. Only once it is low enough to be at an acceptable level will I move on to the next one. And so on.
After all, why work on “weak signals” when you should prioritize the worst problems first?
The advantage of this approach is that it is pragmatic and easy to justify. The drawback is that you mostly focus on lowering a problem (instead of increasing value). It’s kind of a double-negative viewpoint.
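A tiny sketch of that “worst problem first” ranking, assuming each signal has an agreed acceptable level to compare against. The signal names, numbers, and the relative-distance ranking are illustrative, not the real data.

```python
# Sketch of "worst problem first": given several quality signals with agreed
# acceptable levels, pick the one furthest over its target and focus on it.
# Names, numbers, and the severity formula are illustrative assumptions.

signals = {
    # name: (current value, acceptable level) -- lower is better for all of these
    "bugs_in_prod_seen_by_clients": (42, 10),
    "external_support_tickets": (120, 100),
    "failed_sales_demos": (3, 2),
}

def severity(current: float, acceptable: float) -> float:
    # How far above the acceptable level we are, relative to that level
    return max(0.0, (current - acceptable) / acceptable)

worst = max(signals, key=lambda name: severity(*signals[name]))
print(f"Work on '{worst}' first ({severity(*signals[worst]):.0%} over target)")
```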
It’s a bit informal and relies on judgement. I don’t have any type of spreadsheet or anything like that. I look at what the business needs and hire accordingly, or have one-to-one discussions with my direct reports on continuous improvement and learning new things.
If you’re looking for objective items to track, you could look into some of these:
Lead Times - How long does it take for code to go from being committed to being deployed in production? Faster lead times generally mean you have the right quality checks in place to streamline the deployment process.
Change Failure Rate - How many bugs are generated per release, from the end users’ perspective?
Failed Deployments - How many times does your pipeline fail or need to be rolled back?
Error Budgets - How often are you breaking your agreed-upon error budget (e.g. an error budget of 48 mins of downtime per month)?
Automation vs Manual Testing - How much of your testing is done manually compared to automated?
Client Satisfaction - Surveys or feedback.
SLOs - Have you met or exceeded your Service Level Objectives?
Mean Time To Detection (MTTD) / Mean Time To Resolution (MTTR) - How long does it take for you to notice a problem in your system, and how long does it take to resolve it? This lets you know that you have the right checks in place and can quickly respond to issues in your system.
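To show how lightweight a couple of these can be to compute, here’s a small Python sketch of change failure rate and MTTR derived from simple deployment and incident records. The record shapes and sample data are assumptions for illustration, not any particular tool’s output.

```python
# Sketch: compute change failure rate and MTTR from simple records.
# The record shapes and sample values are illustrative assumptions.

from datetime import datetime, timedelta

deployments = [
    {"id": "d1", "caused_incident": False},
    {"id": "d2", "caused_incident": True},
    {"id": "d3", "caused_incident": False},
    {"id": "d4", "caused_incident": False},
]

incidents = [
    {"detected": datetime(2024, 5, 1, 10, 0), "resolved": datetime(2024, 5, 1, 11, 30)},
    {"detected": datetime(2024, 5, 7, 22, 15), "resolved": datetime(2024, 5, 8, 0, 15)},
]

# Fraction of deployments that led to an incident
change_failure_rate = sum(d["caused_incident"] for d in deployments) / len(deployments)

# Average time from detection to resolution
mttr = sum(((i["resolved"] - i["detected"]) for i in incidents), timedelta()) / len(incidents)

print(f"Change failure rate: {change_failure_rate:.0%}")
print(f"MTTR: {mttr}")
```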
“None of the things in the checklist should be a surprise” - yes, these are exactly the kinds of practices that stick. A couple of things I’m curious about:
1: if it’s really that routine and a validation of things you already know, does the practice ever tend to get ‘stale’ (like folks are in the room just ticking the boxes), and
2: is there a split of ownership of the questions in the checklist across the stakeholders? (Who is responsible for gathering the data and making the more subjective calls?)
This resonates so much. I think a lot of teams struggle with aggregating lots of lower-level quality indicators. Yes, maybe a “failure” to aggregate, but attacking the biggest problem first makes the most sense, so still a win.
As you worked on each indicator, how did you decide what the acceptable level was for each?
Great list! I’ve used several of these as part of different quality frameworks at Dropbox (there were multiple).
One thing that I’ve noticed and always found interesting is that teams with good ‘soft’ quality practices (like clear ownership and good communication) usually have better technical metrics too. But I haven’t really seen the opposite be true.
The metrics are shown in 3 colors:
Green if availability > 99.9%
Yellow if availability is between 99.5% and 99.9%
Red if availability < 99.5%
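As a rough sketch of that traffic-light mapping (thresholds from above; the 30-day downtime conversion is just arithmetic):

```python
# Tiny sketch of the traffic-light mapping above, plus the back-of-the-envelope
# conversion of 99.9% over 30 days into allowed downtime minutes.

def slo_colour(availability: float) -> str:
    if availability > 0.999:
        return "green"
    if availability > 0.995:
        return "yellow"
    return "red"

minutes_in_30_days = 30 * 24 * 60
print(slo_colour(0.9992))                                                  # green
print(f"99.9% allows ~{minutes_in_30_days * 0.001:.0f} min downtime / 30 days")  # ~43 min
```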
Some of these metrics are visible on a public “Incident” page on our Web site (for our main Online services).
The impact is currently very positive. For sure, many developers are questioning what and how we measure availability, but overall we see good feedback. One Director was not happy when he saw his product’s overall availability was below 99%, which was mainly caused by the Mobile service being at 67% availability in the last month. That automatically triggered a reaction (and actions) to bring this number up.
So true. I think those teams with better collaboration and communication are much more in sync with each other. That would also lead to a better subjective understanding of quality on a product.
“How easy is it to test”
“Do we understand the nuances”
“Do we understand how our end-users utilize our product/service?”
Something else that could be looked at as well (and I get this is less quality and more team health) is how psychologically safe the environment is… if it’s not safe, the amount of pushback or questioning can be diminished, and that will stop people from bringing up concerns/risks.
I did not decide. QA team did not decide. Oh, gosh, no.
We asked stakeholders the question, again and again, until they gave us their answer on what is acceptable or not. We provided them with the metric(s) and the past and current numbers, asking them where they wanted to be in 6 to 12 months.
That way, when we report bad results, people are more involved, because it is “their” indicator that is not being honored. The tough part is when they ask us (QA) what we think it would take to get to their acceptable level. We have to help drive the discussions and decisions between these stakeholders.
For example, after hearing some complaints about performance, we measured it (well, people helped us to) and then asked Product, Tech, and Delivery people if they were happy with the current situation. If not, what would their targets be, and how would they proceed to get there? Then we followed tasks, projects, and measurements about performance. And we all followed the metrics’ evolution together.
There is a definite danger of that, but the process continuously evolves. If there is a pattern of that across a number of quality gates, where certain areas are no longer adding value to the quality discussion, I take a step back, work with the Customer Success, Dev and Product teams, and ask “Why was this check here? Do we still need it? Should it be replaced with a more valuable check?”. But these conversations are very much driven by me, which leads to your 2nd question…
There is a split of ownership of questions and deliverables between Product, QA, Customer Success and Dev. In the QG I play the role, if you like, of an independent chair (even though I’m not that independent, as I’m responsible for QA).
But I do have to chase those owners and make sure that we are in a fit state to do a QG with all stakeholders - especially the customer success teams. So there is a motivation from me to make sure engineering and product teams are confident before the QG.
Also, in regard to ownership, customer success sees it as “receiving information”, so I could do more work to try and bring CS in earlier.
So there are definitely areas to improve around ownership so that it’s far more up-front and collaborative.
We use the same metrics; I think we got them from either the book Accelerate or from one of the State of DevOps reports, I can’t remember exactly.
On top of these metrics, we also track the number of tests added or replaced for each change and the duration of test execution. If a run deviates from the expected duration, it is often an indicator of a hidden problem or a gap in our quality harness.
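A small sketch of that duration-deviation signal, assuming a baseline built from recent runs and a tolerance threshold; the 20% tolerance and the sample durations are illustrative, not the actual policy.

```python
# Sketch: flag a test run whose duration deviates too far from a baseline of
# recent runs. Tolerance and sample durations are illustrative assumptions.

from statistics import mean

recent_durations_min = [12.1, 11.8, 12.5, 12.0, 11.9]  # last few runs (minutes)
latest_duration_min = 15.4
TOLERANCE = 0.20  # flag if more than 20% away from the baseline

baseline = mean(recent_durations_min)
deviation = abs(latest_duration_min - baseline) / baseline

if deviation > TOLERANCE:
    print(f"Test run deviated {deviation:.0%} from baseline ({baseline:.1f} min) - worth a look")
```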
I mean anything that affects the testing itself. Anything that keeps you from interacting with the product, like slow performance, long build or deployment times, no access to environments or tools, etc.
Better?