Which parameters should we consider when coming up with a Quality Score for teams in the company?

At my company we are looking to introduce the notion of a Quality Score, which should rate each team on how well it is doing from a quality point of view. Apart from the score itself, teams should also be able to drill down into the details that are pulling their score down (a dashboard kind of thing). Some parameters could be:

  1. Unit test coverage (server side + client side)
  2. Integration test coverage (server side + client side)
  3. Active Incidents per month etc.

I would like to know which other parameters we could consider, and whether other mature companies have already adopted something like this; that would be very helpful. If you can link any material related to this topic, that would be great as well!

Thank you


Sorry if I missed it, but which quality and what property are you looking to measure?
Why? And what do you intend to do with it?

Some examples of reasonable properties I’ve seen measured:

  • Monthly distinct client reports (ideally categorized) coming in through the call center/support. Every few months, take the top one for a given feature and try to tackle it (adapt, rewrite, add). Goal: decrease costs in the support department and increase client support capacity (in certain months they were only able to handle 60% of demands).
  • Product/service availability. Ideally the site is up 99.99(9)% of the time and no sales revenue is impacted. This meant adding dashboards for servers, resources, and API logs with response times and errors, etc., and reviewing them regularly to increase stability and availability.
  • ISO 9001:2015: a nice-to-have label for some brands and a good sales/confidence point.
  • User engagement with features/screens, through something like Google Analytics/GTM or MouseFlow. This leads the business to reevaluate certain designs, flows, and information provided, to streamline the process from landing to checkout.
  • A technical one: the volume of API calls, which can sometimes end up costing a lot; many chained calls can increase response time as well. This is coupled with an analysis of what makes sense and what can be redesigned, technically or in the user flow.
  • Another technical one: what is being logged, when, and how much. Some systems log synchronously. Of course, for a new product it’s useful to have plenty of logs to work with and debug. But under high traffic this can slow the entire product down, or even crash it.
  • For desktop/mobile apps: crash logs. These reflect app stability and the number of users who encounter a crash. The goal would be to reduce crashes to under X% chance per user.
  • A softer criterion: the number of complaints received from C-level regarding the quality of the product.
  • A rating for the product obtained from a third-party institution or other types of reviewers. Of course, you’d want to do well there to have good marketing.
  • Compliance with specific legislation and directives that apply to the type of product sold or the business.
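To make two of the targets above concrete (availability and crash rate), here is a minimal arithmetic sketch. The function names and the 30-day period are my own assumptions for illustration, not something taken from a specific tool.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Downtime budget, in minutes, for an availability target over a period.

    Assumption: the period is a fixed number of 24-hour days.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def crash_rate_pct(users_with_crash: int, active_users: int) -> float:
    """Percentage of active users who hit at least one crash in the period."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return 100 * users_with_crash / active_users


if __name__ == "__main__":
    # 99.99% availability leaves only a few minutes of downtime per 30 days.
    print(round(allowed_downtime_minutes(99.99), 2))
    # 25 of 1000 active users saw a crash.
    print(crash_rate_pct(25, 1000))
```

Turning the percentage into minutes per month usually makes the discussion with the business easier than the raw number of nines.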

This does not seem like a good idea, at least not from the description. I don’t get why you need this score, how you are going to use it, what problem it is supposed to solve, or how it can be beneficial… The points you mentioned might be some sort of KPI in your company, or some of your best practices or acceptance/quality criteria, etc., but I don’t see them as part of “the notion of Quality Score which should give a score for each team in terms of how well they are doing from the Quality point of view”. Generally speaking, “how well they are doing from the Quality point of view” is a very subjective basis for a “Quality Score” across different teams/products, and I can only see destructive consequences for the work process, the environment, and the real productivity in your company.


@ipstefan and @shad0wpuppet thanks for your responses; I understand why you say this might be a bad idea. Let me explain why we are exploring it: we want to give leadership a high-level metric showing how teams are doing on the quality front. For this we could weigh in several factors, such as unit test coverage, integration coverage, e2e coverage, a test flakiness score, and some other parameters, which is why I wanted to ask what other organisations might be using.

Answering your questions @ipstefan -

but which quality and what property are you looking to measure?

We want to measure quality for the software maintained by the team. This can include the parameters I mentioned above plus some other metrics like active bugs, hot-fixes required, system downtime, etc.

And what do you intend to do with it?

Possibly to have a high-level metric that one can look at to judge the Reliability, Quality, and Availability of the features a squad owns. This data can help us take action to improve on these fronts.

I thought this might be a common notion but I guess it is not. If this is not the way, what other methods do we use to rate team Quality practices in an organisation?
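For what it’s worth, the weighting idea I have in mind could be sketched roughly like this. It is only an illustration: the metric names, the weights, and the assumption that every metric is already normalized to a 0..1 range (1 being best) are all hypothetical, not something we have settled on.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Metric:
    name: str
    value: float   # assumed to be normalized: 0.0 (worst) .. 1.0 (best)
    weight: float  # hypothetical relative importance


def quality_score(metrics: List[Metric]) -> Tuple[float, Dict[str, float]]:
    """Return a 0..100 weighted score and the points lost per metric.

    The "points lost" breakdown is what a dashboard could show so a team
    can see which metric is pulling its score down.
    """
    total_weight = sum(m.weight for m in metrics)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    score = sum(m.weight * m.value for m in metrics) / total_weight * 100
    lost = {m.name: m.weight * (1 - m.value) / total_weight * 100 for m in metrics}
    return score, lost


if __name__ == "__main__":
    score, lost = quality_score([
        Metric("unit_coverage", 0.8, weight=2),
        Metric("integration_coverage", 0.5, weight=1),
        Metric("non_flaky_test_ratio", 0.9, weight=1),
    ])
    print(score)
    # Largest contributors to the lost points come first.
    print(sorted(lost.items(), key=lambda kv: kv[1], reverse=True))
```

The score and the lost points always add up to 100, so the breakdown explains the whole gap rather than just listing raw metric values.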

Have you perhaps looked into the DORA metrics?

Through six years of research, the DevOps Research and Assessment (DORA) team has identified four key metrics that indicate the performance of a software development team:

  • Deployment Frequency—How often an organization successfully releases to production
  • Lead Time for Changes—The amount of time it takes a commit to get into production
  • Change Failure Rate—The percentage of deployments causing a failure in production
  • Time to Restore Service—How long it takes an organization to recover from a failure in production

Btw: these cover the whole delivery process, not only testing.
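As a rough illustration of how the four metrics above could be derived from deployment records, here is a minimal sketch. The `Deployment` record and its field names are assumptions for the example; real data would come from your CI/CD and incident tooling, and DORA itself does not prescribe this exact calculation (the use of medians is my own choice).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Summarize the four key metrics for one team over a period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [_hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        _hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        # None when there were no restored failures in the period.
        "median_time_to_restore_hours": median(restore_times) if restore_times else None,
    }
```

Feeding it a list of `Deployment` records per team and period gives one small dictionary per team, which is easy to chart over time.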


Hi @annie2131
Out of curiosity, I’m interested in understanding how you measure test coverage. While it seems feasible at the unit test level, I imagine the challenge increases significantly with more complex tests such as component and integration tests. Could you share your approach?

Thanks for sharing and clarifying why you’d like advice on this topic, Anshit.

I’m reminded of Michael Kutz (@mkutz)'s talk: How (Not) to Measure Quality

I like the Outer/Inner/Process model and its relationship to measuring a quality score. Lots of great advice in the talk. Plus a handy Q&A at the end, particularly Michael’s answer to: “How would you counsel management to use metrics in such a way that they will not undermine the psychological safety of a team?”

Source: How (Not) to Measure Quality | Ministry of Testing


This is very helpful @simon_tomes ! Thank you so much.

Thanks @simon_tomes.

Indeed, I think it is very important to be aware of the perspective you take.

Unit/Integration Test Coverage IMHO is a pure developer metric (and not a good one).
The question most people think this metric answers is basically “How well is this stuff covered by tests?”. What it really answers is the question “Which/how much code is not executed by unit tests?”.
Note that executed is far from tested. A test that has no assertions is not really a test, yet it raises coverage.
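A small made-up example of the difference between executed and tested. The function and its bug are hypothetical, and the two check functions stand in for what would be test cases in a real suite.

```python
def apply_discount(price: float, percent: float) -> float:
    # Deliberate bug for illustration: the discount is added, not subtracted.
    return price + price * percent / 100


def call_without_assertion() -> None:
    # Executes every line of apply_discount, so a line-coverage tool would
    # count the function as fully covered. Nothing is checked, so the bug
    # goes unnoticed and this "test" passes.
    apply_discount(100, 10)


def call_with_assertion() -> None:
    # Same coverage as above, but this one actually fails because of the bug.
    assert apply_discount(100, 10) == 90
```

Both checks produce the same coverage number for `apply_discount`, yet only the second one can ever catch the bug.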

To interpret this metric correctly, one needs an overview of the code base at hand. From a bird’s eye view, low numbers are alarming, but high numbers shouldn’t be calming. That’s a problem, because people can’t help but read high numbers as reassuring, and then make bad decisions.

Active incidents per month is not a bad one. In my diagram it would be pretty much in the middle, as it is influenced by development, product, and process. Development, because that’s where the bugs are being added. Product, because pressuring for new features can cause hasty development. Process, because insufficient staffing and other strategic decisions do the same in the longer run.

If you’re more interested in these (IMHO rather management-relevant) metrics, I would recommend looking at the DORA metrics (as @jesper already said above) and considering reading the book Accelerate. It helped me a lot.

By the way, I wrote this article on the topic, in which I go deeper into the potentially fatal side effects of measuring quality.
I think it is important to know that these side effects exist in order to choose proper metrics.

Hope this helps.