Green tests missed a silent API response change. Who should catch that?

I’m a backend engineer by background, not from QA, and I’m trying to calibrate whether a problem I care about is also a real testing concern.

I recently tried a small UI-driven API regression experiment: run the same UI scenario against two backend versions, record the API traffic, and diff the JSON responses. Not formal contract testing; more like using the user flow to expose what changed on the wire.

I pointed it at a Medusa upgrade from v2.13.6 to v2.14.0. UI suite green. Integration tests green. Nothing obvious in the changelog.

The traffic diff was noisy at first (IDs, timestamps, generated tokens), but one structural change stood out: GET /admin/orders/{id}/preview started returning an email field in v2.14.0 that wasn’t present in v2.13.6.
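
For anyone wondering what "structural" means here, below is a minimal Python sketch of the idea, not the actual tool. It reduces each response to a map of JSON path to type name, so volatile values (IDs, timestamps, tokens) drop out and only shape changes remain. The helper names `shape` and `diff_shapes` and the sample payloads are invented for illustration; the real response has more fields than this.

```python
from __future__ import annotations

from typing import Any


def shape(value: Any, path: str = "$") -> dict[str, str]:
    """Flatten a JSON value into {path: type_name}, discarding the values."""
    if isinstance(value, dict):
        out = {path: "object"}
        for key, child in value.items():
            out.update(shape(child, f"{path}.{key}"))
        return out
    if isinstance(value, list):
        out = {path: "array"}
        for item in value:
            # All elements share one path, so list length and order are ignored.
            # If element types conflict, the last element wins (sketch-level choice).
            out.update(shape(item, f"{path}[]"))
        return out
    if value is None:
        return {path: "null"}
    if isinstance(value, bool):  # must come before the int check: bool subclasses int
        return {path: "boolean"}
    if isinstance(value, (int, float)):
        return {path: "number"}
    return {path: "string"}


def diff_shapes(old: Any, new: Any) -> dict[str, list[str]]:
    """Compare two responses by shape only."""
    a, b = shape(old), shape(new)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "type_changed": sorted(p for p in a.keys() & b.keys() if a[p] != b[p]),
    }


# Invented payloads: every value differs, but only one path is new.
old_response = {"id": "order_01", "version": 1, "created_at": "2024-01-01T00:00:00Z"}
new_response = {
    "id": "order_02",
    "version": 3,
    "created_at": "2024-02-01T00:00:00Z",
    "email": "a@example.com",
}

print(diff_shapes(old_response, new_response))
```

With these payloads the differing IDs, versions and timestamps produce no output at all, and the only reported change is the added `$.email` path.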

I tracked it down to the actual Medusa source. The previewOrderChange method’s select array gained exactly one entry between the two tags:

-  select: ["id", "version", "items.detail", "summary", "total"],
+  select: ["id", "version", "items.detail", "summary", "total", "email"],

One token. Not in the release notes. Not in the migration guide. In this stack, the usual tests did not notice it because the UI did not bind that field and the integration tests asserted expected fields, not absence of extra fields.
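
To make the "expected fields, not absence of extra fields" point concrete, here is a small Python sketch of the two assertion styles. The field set is illustrative only (loosely based on the select entries above, not the real response), and both helper names are hypothetical.

```python
# Illustrative field set; an assumption, not the real response contract.
EXPECTED_FIELDS = {"id", "version", "items", "summary", "total"}


def assert_has_expected_fields(body: dict) -> None:
    """Typical integration-test style: every expected field must be present."""
    missing = EXPECTED_FIELDS - set(body)
    assert not missing, f"missing fields: {sorted(missing)}"


def unexpected_fields(body: dict) -> list[str]:
    """The complementary check: fields present that nobody listed."""
    return sorted(set(body) - EXPECTED_FIELDS)


# Invented payload standing in for the newer backend's response.
body = {
    "id": "order_01",
    "version": 2,
    "items": [],
    "summary": {},
    "total": 100,
    "email": "a@example.com",
}

assert_has_expected_fields(body)  # passes: nothing expected is missing
print(unexpected_fields(body))    # the extra field only shows up here
```

The first check stays green on both versions, which matches what I saw; only the second style would have flagged anything.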

So my question is less about the tool and more about ownership/practice:

1. How often do silent API response-shape changes actually hurt your team?
Examples: a field appears, a type changes, nullable becomes non-nullable, a default flips.

2. Who is normally expected to catch this kind of drift?
QA/test automation, backend engineers, contract-test owners, platform teams, downstream consumers, or nobody until something breaks?

I’m asking because I’m not sure whether this belongs naturally in QA practice, or whether I’m looking at it too much from a backend/integration perspective.

Nice example of mutation testing here, thank you!
I work in coaching, so I don’t have a team or any bug numbers. But this looks to me like a typical code change that was neither recorded nor tested. It would probably have been different if one of the existing fields had been removed. To me this is a code-review problem: developers should be required to review their changes, and the review results should be handed over to the testers (automation or manual) so the tests can be adapted.

Thanks. I see the connection to mutation testing, but could we also look at it as a regression-testing case?

A real upstream change introduced a small surprise in the API response, and the regression check made it visible.

The follow-up was not to fail forever, but to update the test knowledge: mark that email was added in 2.14.0, rerun, and treat it as the new expected behavior from that version onward.
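
As a rough illustration of "updating the test knowledge", here is a Python sketch of a version-aware expectation record. The structure (`KNOWN_CHANGES`, `expected_fields`) and the baseline field set are assumptions made for this example, not how any particular tool stores it.

```python
from __future__ import annotations

ENDPOINT = "GET /admin/orders/{id}/preview"

# Illustrative baseline, standing in for the fields observed on 2.13.6.
BASELINE = {"id", "version", "items", "summary", "total"}

# Observed changes, recorded once they have been reviewed and accepted.
# Entries are assumed to be listed in ascending order of "since".
KNOWN_CHANGES = [
    {"endpoint": ENDPOINT, "field": "email", "change": "added", "since": "2.14.0"},
]


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.14.0' into (2, 14, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def expected_fields(baseline: set[str], endpoint: str, version: str) -> set[str]:
    """Apply every recorded change that is in effect at the given version."""
    fields = set(baseline)
    for entry in KNOWN_CHANGES:
        if entry["endpoint"] != endpoint:
            continue
        if parse_version(version) < parse_version(entry["since"]):
            continue
        if entry["change"] == "added":
            fields.add(entry["field"])
        elif entry["change"] == "removed":
            fields.discard(entry["field"])
    return fields


print(sorted(expected_fields(BASELINE, ENDPOINT, "2.13.6")))
print(sorted(expected_fields(BASELINE, ENDPOINT, "2.14.0")))
```

The same check then passes against both versions: email is unexpected before 2.14.0 and expected from 2.14.0 onward.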

That’s the part I’m interested in from a QA perspective: even if developers or upstream maintainers do not document every response-shape change, QA may still need its own record of observed behavior across versions.

In the SDLC, we should aim for a process where people do not have to catch things.

Because when we say “catch”, it means something got missed. So ideally we should fix the process, or the place where the problem started.

If a new field is introduced in an API that is used by a large number of teams or orgs, then ideally a new API version should be introduced, e.g. /v1/ (2.13.6) to /v2/ (2.14.0), so that nothing breaks and no one has to catch anything.

Changes should not be made silently. They should be announced through proper channels, so the dev team can fix and test them and they are not passed on to testers or customers to catch.

If these changes arrive unannounced and the field is optional, there is a good chance it will not get caught.

Even if there are schema-related tests, if we don’t update the schema, automation will not catch such changes unless there is some way to make them cause a failure.

Gaurav, fair point on the process side — versioning and proactive comms are the cleaner answer when you can enforce them. The Medusa case is one where I can’t: third-party project, optional field added under a minor bump (which is correctly non-breaking under SemVer), so there’s no process lever available to me downstream.

But re-reading my OP I think I muddied the question. What I actually wanted feedback on isn’t “how to prevent silent changes” but “who finds catching this kind of drift most valuable?” QA, backend engineers worried about their own services drifting unintentionally, platform/SRE teams running dependency upgrades, contract-test owners: different roles will value this differently, and I’m trying to figure out where the strongest pull is. When you’ve seen schema drift bite teams, who tended to feel it most acutely?

Clarification on my original question

Re-reading my OP and the thoughtful replies it’s drawn so far, I realize I phrased the question in a way that pulled the discussion toward “how do we prevent silent API changes” — which is a worthwhile angle (versioning discipline, communication, contract testing) but isn’t quite what I was after.

What I actually wanted to ask is: given that silent drift keeps happening in practice (third-party services, SemVer-permissible field additions, microservices owned by other teams), who in a software org finds catching this kind of drift most valuable?

Candidate audiences I’m trying to compare:

  • Backend engineers — worried their own service is drifting unintentionally after refactors or framework upgrades
  • QA / test engineers — looking for coverage beyond what explicit assertions cover
  • Platform / SRE teams — running dependency upgrades and wanting confidence the upgrade is behavior-preserving
  • Contract-test owners — would prefer behavioral diffs over manually-maintained schemas
  • Tech leads / engineering managers — caring about cross-team coordination cost when APIs ripple

If you’ve worked somewhere where schema drift or silent response-shape changes caused real pain — who in that team felt it most acutely? Whose work would have been most reduced if there’d been an automated catch in place?

(The prevention angle from earlier replies absolutely still applies — I just also want to pull on the “who values catching” thread, since prevention is rarely 100% in practice.)

Hey, great topic, thanks for bringing it here.

My first thought is that your phrasing seems to amount to, “who was responsible for noticing this?” Maybe I’m being biased and you didn’t mean that, but my response is based on this impression.

I would say that everyone involved had the opportunity to spot this. The question for me is more like, “is this an actual problem?” In my context, it could be, since some of our consumers are not set up to handle unexpected fields. But whose responsibility is it to address that - should a producer be expected to know the requirements and logic of every consumer? Or should the consumer be prepared to deal with reasonable scenarios, such as a field being deliberately added so that another consumer could use it?

To get a bit more specific to your case, I would imagine that if it was an issue, an integration test with any negatively affected consumer could have picked this up. And if you did cross-team integration tests, and it still wasn’t spotted, maybe that’s because it wasn’t actually a problem for anyone.

No one can be expected to spot every issue, but we can prioritise tests based on the risk of a failure indicating a problem.

I hope that helps!

My scary thought here is that customers might consume the API and intentionally bail or error out when a new field is added, because that’s a good way of finding out that the API has changed when they would otherwise never have noticed. In practice, however, I have found that users of your API will code around what looks like a typo to them: for example, “email” in version 1 and “Email” in version 2, because version 3 might just go back to all-lowercase “email”. That’s why I negative test my APIs as often as possible. It’s often enough to just assert the number of fields, if that helps. If a customer does not notice an API upgrade and the devs never documented it, it can create support pain and cost. Or even a chance for the Sales Team to sell something that already exists!

But the original question is who is the responsible person?

At the end of the day, when it comes down to nuts and bolts, it’s the developer who forgot to write a unit test, or maybe they did, but they never wrote an integration test! Who should be checking is hard to pin down; it usually comes down to ownership of the integrations that the API impacts. APIs are not sexy, so ownership falls through the cracks.

Thanks Conrad, this is exactly the kind of ownership gap I was thinking about.

I agree that consumers often behave very differently from what API producers expect. Some ignore unknown fields, some fail on them intentionally, and some write defensive logic around things like email vs Email because they have been burned by undocumented drift before.
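
Those three consumer behaviours can be sketched in a few lines of Python. The `Contact` model and the three parser functions are hypothetical, written only to show how the same producer change lands differently depending on how the consumer reads the payload.

```python
from dataclasses import dataclass, fields


@dataclass
class Contact:
    """Hypothetical consumer-side model; not taken from any real client."""
    id: str
    email: str


def parse_strict(payload: dict) -> Contact:
    """Fails on purpose when the producer sends a field the model lacks."""
    return Contact(**payload)  # unknown keys raise TypeError


def parse_tolerant(payload: dict) -> Contact:
    """Ignores unknown fields, so additive changes go unnoticed."""
    known = {f.name for f in fields(Contact)}
    return Contact(**{k: v for k, v in payload.items() if k in known})


def parse_defensive(payload: dict) -> Contact:
    """Normalises key casing first, coding around 'email' vs 'Email' drift."""
    return parse_tolerant({k.lower(): v for k, v in payload.items()})


with_extra = {"id": "c1", "email": "a@example.com", "phone": "555-0100"}
print(parse_tolerant(with_extra))

try:
    parse_strict(with_extra)
except TypeError as exc:
    print("strict consumer rejected the payload:", exc)

print(parse_defensive({"id": "c1", "Email": "a@example.com"}))
```

The producer made one additive change, yet one consumer never sees it, one breaks loudly, and one has already absorbed a different kind of drift.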

That is why I hesitate to call every response change a bug, but I still think making the change visible has value.

A new field may be harmless. But if nobody notices it, nobody can decide whether it needs documentation, a consumer update, or regression coverage.

Your “API ownership falls through the cracks” point feels right. These changes sit between backend, frontend, QA, integrations, platform, and support. Everyone is near the risk, but the response surface often has no clear owner.

So maybe the practical goal is not to block every API diff, but to detect response shape changes, classify them, and add tests or documentation where the risk is real.

Thanks Cassandra, that’s a helpful reframing.

You’re right that “who should catch this?” can sound like a blame question. I didn’t mean it that way, but I see how it reads.

I agree the better question is probably: “Is this response drift actually a problem, and for whom?”

In this example, the added field may be harmless. I wouldn’t call it a bug by default. What interested me was that the usual green signals didn’t show that the response surface had changed at all.

I also agree producers can’t realistically know every consumer’s assumptions, and consumers should handle reasonable changes where possible.

So maybe the useful workflow is not “block every API diff,” but:

  • make the drift visible
  • classify whether it is expected or risky
  • document it if needed
  • add regression or integration coverage where there is a real consumer risk
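
A minimal Python sketch of the classify step in that workflow, under my own assumptions: the `Drift` type, the `RISK_BY_KIND` policy, and the sample paths (including `summary.tax`) are all made up, and a real policy would depend on what the consumers actually do with each field.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    path: str
    kind: str  # "added", "removed", or "type_changed"


# Hypothetical default policy: additions need a look, the rest is risky.
RISK_BY_KIND = {"added": "review", "removed": "risky", "type_changed": "risky"}


def triage(drifts: list[Drift], acknowledged: set[str]) -> dict[str, list[str]]:
    """Sort detected drift into expected / review / risky buckets."""
    report: dict[str, list[str]] = {"expected": [], "review": [], "risky": []}
    for drift in drifts:
        if drift.path in acknowledged:
            bucket = "expected"  # already reviewed and documented
        else:
            bucket = RISK_BY_KIND.get(drift.kind, "review")
        report[bucket].append(f"{drift.kind}: {drift.path}")
    return report


# Invented drift list; "email" has already been reviewed and accepted.
detected = [
    Drift("email", "added"),
    Drift("total", "type_changed"),
    Drift("summary.tax", "added"),
]
print(triage(detected, acknowledged={"email"}))
```

Nothing is blocked by default here; the output is a short list for a human to decide on, which is closer to what I have in mind than a hard gate.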

That’s the part I’m trying to think through.