I don’t think I have a good grasp of the problem, which may indicate that nobody has a good grasp of the problem.
You say that results are not consistently improving over time, and that might be more about how you’re evaluating what quality is and how it improves. For example, if you use simple metrics to measure improvement that don’t actually represent quality, or represent it in some overly simplistic way that gets swallowed up in the error bars of reality, then you are trying to improve something you can’t measure (incidentally this is probably true for all metrics in most contexts). It could be that improvements are not supposed to be consistent. A product might get a lot of work done in one complex area, and that suddenly introduces change risk in a high-risk part of the product. If staff changes, or communication changes, or customer need changes, or the marketplace changes, these can reflect on what “good quality” then means, and affect the results of any attempt to measure improvements. A good tester can probably test the means of measurement as well as the thing being measured, and determine where that process fails. So it may have nothing to do with the actual quality of the product, and improvements may actually be showing success. It depends on whose perspective on success we use, and what “success” actually means, for one thing.
With that murky swamp mapped out with a large question mark, the problem then, for me, is to find out who thinks there is a problem and what they think it is. Then I need to find out if it actually is a problem of any kind. The phrase “releases are generally stable” sounds like it’s going okay, and “there are periodic challenges” tells me that reality sometimes happens to your project. It may be that you can tie each challenge back to a possible cause, like adding a new person to a team, or needing to learn a new technology, or a rushed design document, or any of the many little things that can affect projects in a large way, especially large projects spread across regions.
Each periodic challenge could be affected by a particular systemic issue across the company (or companies, or whatever), such as poor management, poor communication, lack of critical skills, lack of flow, horrible deadlines, insufficient planning, premature formalisation, and so on. Or it could be that each one comes from its own source, and is the result of one particular team project not going well, or some process improvement that hasn’t had time to settle down (or is not working), or a myriad of other things. Without investigating each problem you can’t push back on the “why” of it and try to find a source. If there is one. Or two. Or twelve.
So it’s very hard to answer your question, but hopefully I’ve given some inspiration for how to go about breaking it down into smaller questions. I’m sure others will have other insights. I’d definitely say not to rely on metrics to track improvements, especially if those involved don’t have some understanding of metrology, statistical analysis and scientific humility. Metrics can still be used to trigger questions that might find problems that are stopping you from ascertaining quality properly, or indeed from producing higher quality… whatever you’re producing. What you look at may not tell the story someone assumes it will, but it can be worthwhile if treated properly, and, unfortunately, its worth depends heavily on what you’re trying to achieve. Numbers going up and down mostly only reliably indicate numbers going up and down.
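To make the “error bars” point a bit more concrete, here is a minimal Python sketch of one way to ask whether a new number is even distinguishable from the usual wobble before anyone calls it an improvement or a problem. Everything in it is an assumption for illustration: the function name `outside_normal_variation`, the defect counts, and the three-standard-deviation threshold are all made up, and a crude check like this says nothing about whether the metric represents quality in the first place.

```python
from statistics import mean, stdev


def outside_normal_variation(history, latest, k=3.0):
    """Return True if `latest` is more than k sample standard deviations
    away from the mean of `history`.

    This is a rough noise check, not a verdict on quality: it only says
    whether the new number stands out from the numbers seen so far.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    centre = mean(history)
    spread = stdev(history)  # sample standard deviation
    return abs(latest - centre) > k * spread


# Hypothetical defect counts for past releases (made-up numbers).
defects_per_release = [12, 9, 14, 11, 10, 13, 12, 8]

# 15 is higher than any past release but still within the usual spread.
print(outside_normal_variation(defects_per_release, 15))  # False

# 30 is far outside the usual spread, so it is worth asking questions about.
print(outside_normal_variation(defects_per_release, 30))  # True
```

A `True` here is only a prompt to go and investigate, and a `False` doesn’t mean nothing is wrong; it just means the number alone can’t tell you.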
So in short:
- Is there really a problem or is someone mistaken, overstating something, or pushing an agenda?
- Is there really a problem or is the measurement borked?
- Is there really a problem or is it within the bounds of the normal chaos of software projects?
- If there could be a problem can we describe it realistically and accurately, or are we just sad about the whole thing?
- If there is a problem is it actually lots of smaller problems?
- If there is a problem can we trace it to a cause, or is it too complex?
- If there is a problem can we turn it into a goal we can examine qualitatively?
Hope that is vaguely helpful in guiding the ideas, or at least in figuring out if a problem exists and what it might be to do with. If there are any specific problems feel free to update, and best of luck.