Testing and technical debt sneak into teams for all sorts of reasons, such as tight deadlines, inherited systems, and forgotten test automation that no one wants to touch. But while the causes vary, the impact is often the same: things slow down, bugs creep in, and confidence in the product dips.
Here’s a practical challenge to help you think through how debt shows up in a team and prioritise what you would do about it.
The Scenario
You’ve just joined a team working on an e-commerce web application. The codebase is several years old. After a couple of weeks, here’s what you’ve noticed:
The checkout module takes 10 seconds to load
There are frequent bugs in the order processing logic
A massive utils.py file is full of functions nobody remembers writing
There are no automated tests for the payment gateway integration
Your Task
Think through the scenario and respond to these prompts:
Spot the debt
Which parts show signs of technical or testing debt?
Is this debt created (from short-term choices) or inherited?
Assess the impact
How do these issues affect the team’s ability to work quickly, reliably, or confidently?
What are the risks of leaving them unaddressed?
Propose a plan
What would you recommend doing to reduce or manage the debt?
What would you tackle first, and why?
How would you balance this with ongoing feature delivery?
Share your approach below! Whether you’re hands-on with debt right now, or have worked through something similar in the past, your perspective is interesting and could really help others. It’d be great to see what you prioritise and how teams balance fixing the old while still building the new.
Categorising debt signals (performance debt, functional debt, test coverage debt, code needing refactoring) helps us identify the high-risk, high-impact ones to tackle first. Maintaining a debt register or scrum board also helps us track them and pick them up every sprint, allocating 10-15% of velocity to it. So overall I’d say dealing with debt doesn’t require huge big-bang plans; instead, follow the Boy Scout Rule: leave the codebase better than you found it. Refactor opportunistically. Fix what you touch.
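A debt register like the one mentioned can start as something very lightweight. Here’s a minimal sketch of the idea; the item names, story points, priorities, and the 15% velocity share are all illustrative assumptions, not values from the scenario:

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    """One row in a lightweight debt register (hypothetical fields)."""
    name: str
    category: str   # e.g. "performance", "functional", "test coverage", "refactoring"
    points: int     # rough effort estimate in story points
    priority: int   # 1 = tackle first


def pick_for_sprint(register, velocity, share=0.15):
    """Greedily fill roughly `share` of sprint velocity with the highest-priority debt items."""
    budget = velocity * share
    chosen = []
    for item in sorted(register, key=lambda d: d.priority):
        if item.points <= budget:
            chosen.append(item)
            budget -= item.points
    return chosen


register = [
    DebtItem("Untested payment gateway", "test coverage", points=5, priority=1),
    DebtItem("Order processing bugs", "functional", points=8, priority=2),
    DebtItem("Checkout load time", "performance", points=5, priority=3),
    DebtItem("Opaque utils.py", "refactoring", points=3, priority=4),
]

# With a velocity of 60 points, ~9 points per sprint go to debt pay-down.
print([item.name for item in pick_for_sprint(register, velocity=60)])
```

The point isn’t the code itself but the habit it encodes: debt items get named, sized, and pulled into every sprint instead of waiting for a big-bang cleanup.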
I’d bring this information to a team retrospective, or to another similar meeting where I could raise it. I’d make the problems as visible as I can and make them a problem for the team to solve.
One team I worked on, we educated the business execs about technical debt. I’m not a huge fan of velocity as a measurement, but our velocity was consistent enough that we could show it was slowing down as we accumulated debt. We were able to convince the PO and other biz stakeholders to let us take a two- or three-week “technical debt sprint” every quarter. It didn’t always happen - sometimes business needs took priority - but it happened enough times that we could do things like upgrade the tools and frameworks in our tech stack. We could also refactor automated tests since we didn’t have to deliver business value that sprint.
Make it visible, make it a team problem to solve, and educate the biz folks.
checkout module → load times being too long, tech debt, inherited
order processing logic → frequent bugs, lacking test coverage at a minimum, testing debt, likely created + inherited
utils.py file → should likely be broken down, better understood, and better ‘owned’, tech debt, inherited
payment gateway integration → lacking integration test coverage at a minimum, testing debt, likely inherited
Checkout module
Impact on team: long load times slow down testing
Risks:
poor user experience
odds are high that the load times will increase over time
increases odds of flakiness for any automated tests that interact with this module
lost $$ (long load times increase abandonment)
Order processing logic
Impact on team: devs likely afraid to touch anything related to it, lost time to bug fixing
Risks:
poor user experience, at a minimum, likely eventually $ loss and impact to brand reputation
new feature dev needs more time than it would otherwise
utils.py file
Impact on team: devs likely afraid + don’t want to touch anything related to it
Risks:
when something breaks or needs updating, it won’t be clear 1. who knows enough to confidently work on this, and 2. who should do the work
Payment gateway integration
Impact on team: unable to trust the integration to behave as expected
Risks:
PII/payment info is leaked
delayed indication when gateway changes/fails
high chance of serious bugs resulting in major $$ and reputation loss, PII/data leaks
You didn’t ask for this but I also want to call out another kind of debt that isn’t usually categorized under tech debt and also falls outside of testing debt: quality debt.
Imo tech debt is stuff that’s typically (not always) caused by bad eng practices and owned by Eng. Testing debt is typically caused by bad testing practices and owned by QA. Quality debt tends to be caused by poor cross-functional quality processes/practices and is usually owned by QA but requires E/P/D/++ to work together to resolve (could write an entire post on this, but think things like better release or triage processes, or communication gaps between CX/CS and the EPD teams)
Likely quality debt (probably more but accidentally posted without this list so adding just a few for now):
lack of perf SLAs, maybe even lack of regular measurement of product quality metrics in general
probably not an incident review process, given the frequency of bugs
unclear ownership of important code likely points to other ownership issues (who owns test code maintenance, for ex?)
Recommendation: Build an understanding of the risks this debt creates, plus the impact + likelihood + effort to address each one (just enough to give direction to start), then use that as input to prioritise.
Priority: Flexible of course, but I’d address the highest-impact, highest-likelihood, lowest-effort fixes for the payment gateway risks first, followed by the order processing risks, then the checkout module, and lastly the utils.py file (assuming this mostly impacts internal users, not external ones)
How to balance: It depends, but some things I’ve seen work are:
setting aside some % of time per sprint for working on these issues
regular ‘debt pay-down’ sprints dedicated entirely to addressing tech/test/quality debt
including cleanup work as part of oncall rotations
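The impact + likelihood + effort framing above can be turned into a rough ranking heuristic. This is only a sketch; the 1-5 scales and the individual scores below are assumptions for illustration, not a standard formula:

```python
def debt_score(impact: int, likelihood: int, effort: int) -> float:
    """Rank debt items: high impact and likelihood push an item up the list,
    high effort pushes it down. All inputs on an assumed 1-5 scale."""
    return (impact * likelihood) / effort


# Hypothetical scores for the four issues in the scenario.
issues = {
    "payment gateway": debt_score(impact=5, likelihood=4, effort=3),
    "order processing": debt_score(impact=4, likelihood=4, effort=3),
    "checkout load time": debt_score(impact=4, likelihood=3, effort=4),
    "utils.py": debt_score(impact=2, likelihood=2, effort=5),
}

for name, score in sorted(issues.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:5.2f}  {name}")
```

With these example inputs the ranking comes out in the same order as the priority above (gateway, then order processing, then checkout, then utils.py), which is really the only job of a heuristic like this: make the team’s gut prioritisation explicit and debatable.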
Thanks for this Ady. One way I like to think about tech debt is as risk of future friction. Here, friction = something that slows you down. This could be hard to understand code, a 3rd party dependency that will go out of support and so you’ll need to spend time upgrading, etc.
Once you have it framed as risk, there are two dimensions to explore:
How likely is it to happen?
If it did happen, how bad would it be?
If there’s a hard-to-read file that has been untouched for the last 5 years, it’s unlikely that it will be worked on in the next 6 months, so the likelihood of actual friction due to that file (rather than potential friction) is low.
Tackling the risk can involve reducing the chance the bad thing will happen, reducing how bad it would be if it did happen, or both. So, if there’s a file that only one person understands and it would take ages (too long for now) to tidy it up, it’s worth that one person documenting it or educating others about it. That won’t make friction any less likely, but the friction will be smaller should it happen.
With my limited knowledge around utils.py files lol here’s how I’d approach it
Technical Debt: The checkout code may be outdated or inefficient. Additionally, performance testing for the checkout module appears to be lacking. Devs may also have been reluctant to touch utils.py due to time constraints or a lack of understanding of it.
Testing Debt: The utils.py file is large and poorly understood, with no automated tests in place. This has led to reliance on manual testing and consequently, missed bugs in the order processing logic. Timelines/deadlines could also factor in here.
Impact
Customers experience frustration due to the checkout module taking 10 seconds to load.
There is a significant business risk related to the reliability of the payment gateway, which could affect transactions and $$
Plan
Improve Checkout Performance: Start by conducting thorough performance testing to reduce load times.
Fix Bugs: Perform root cause analysis to address frequent bugs, supported by additional regression testing.
Enhance Payment Gateway Testing: Implement end-to-end testing for payment gateway integration. If time is limited, collaborate with developers to cover unit and integration tests.
Automate Payment Tests: Develop automation around the payment gateway to ensure stability and reduce manual effort.
Look into the utils.py file later: While this doesn’t directly impact customers, clean up and document the utils.py file to reduce future maintenance risks. There is risk, but it’s not the most pressing issue.
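For the payment gateway steps above, the first automated tests don’t need to hit a real gateway at all: you can stub the gateway client and test the app’s own charging logic around it. This is only a sketch; `PaymentService`, `create_charge`, and the response shape are all hypothetical stand-ins for whatever the real integration looks like:

```python
from unittest.mock import Mock


class PaymentService:
    """Hypothetical wrapper around the app's real payment gateway client."""

    def __init__(self, gateway):
        self.gateway = gateway

    def charge(self, amount_cents: int, token: str) -> dict:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        response = self.gateway.create_charge(amount=amount_cents, source=token)
        return {"ok": response["status"] == "succeeded", "id": response["id"]}


def test_successful_charge():
    gateway = Mock()
    gateway.create_charge.return_value = {"status": "succeeded", "id": "ch_123"}
    result = PaymentService(gateway).charge(1999, "tok_visa")
    assert result["ok"] and result["id"] == "ch_123"
    # The stub also lets us assert exactly what was sent to the gateway.
    gateway.create_charge.assert_called_once_with(amount=1999, source="tok_visa")


def test_rejects_non_positive_amount():
    try:
        PaymentService(Mock()).charge(0, "tok_visa")
        assert False, "expected ValueError"
    except ValueError:
        pass


if __name__ == "__main__":
    test_successful_charge()
    test_rejects_non_positive_amount()
    print("payment service checks passed")
```

Stubbed tests like these cover the logic cheaply and run on every commit; a small set of end-to-end tests against the gateway’s sandbox environment can then cover the integration itself.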
Based on the described scenario, the first observation, around the page load performance, is a defect in the application itself that needs to be raised. It may be testing debt that things like this haven’t been considered, due to a lack of knowledge of testing practices. This is likely to be a complex issue, so first I’d create a testing spike to investigate and get real numbers on performance in this area and others. If it is specific to this area, I’d use Lighthouse or other tooling to see if there’s an obvious root cause. Then I’d look for specifications on expected performance before raising the defect and advocating for it to be fixed.
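A spike like this can start even simpler than Lighthouse: time repeated fetches of the page and summarise the distribution, so “10 seconds” becomes a measured median and p95 rather than an anecdote. A minimal sketch; the `fetch` callable here is a stand-in (a real spike would do an HTTP GET of the checkout URL or drive a browser):

```python
import statistics
import time


def measure_load(fetch, samples: int = 20):
    """Time repeated invocations of `fetch` (a zero-arg callable that loads
    the page under test) and summarise the timings in seconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        fetch()
        timings.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(timings),
        "p95_s": statistics.quantiles(timings, n=20)[-1],  # 95th percentile
        "max_s": max(timings),
    }


# Stand-in for the real page request, so the sketch runs anywhere.
stats = measure_load(lambda: time.sleep(0.01), samples=20)
print(stats)
```

Numbers like these, compared against whatever performance specification exists (or prompting one to be written), give the defect report the evidence it needs.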
The second observation around frequent bugs does suggest that the code has technical debt that is impacting the developers’ ability to deliver correct & functional software. I’d probably look to do a couple of meaningful RCAs with the team to understand what exactly is hurting us here and then raise tech debt tickets if applicable.
The observation around utils.py may suggest that there is technical/testing debt in documentation (depending on whether the file is in prod or test code). One thing that I would say straight away is that people will join and leave companies therefore having functions that nobody remembers writing is perfectly normal. The implication of the observation to me was that they aren’t understood and this would be tech debt. An initial avenue to explore would be looking at the change history to understand when and why these functions were written.
Finally the issue about the lack of automated tests is clear testing debt. I would propose entering a tech debt item but before addressing I’d want to understand the frequency of changes here and whether we’re concerned about changes on our end or the payment gateway. Writing automated tests isn’t free therefore there needs to be a reason and value in adding them.