What’s one Quality Engineering activity you carried out recently, and what influenced the way you approached it?

Quality Engineering work changes a lot with context. What you do depends on the product, the team and the situation you are in. I would like to hear how this shows up in your work. Here is an activity to help you break down one example and explain how context shaped it.

Step 1: Choose an example activity

Pick one activity you carried out as a QE. It might be improving systems and automation, responding to questions, exploring uncertainty, handling interruptions, moving between domains, shaping quality across a team or helping others understand expected system behaviour.

Step 2: Describe the activity in your own words

Write a short description of what you did and why.
Think about the problem you were addressing, the steps you took and the purpose of the activity.

Step 3: Explain how context shaped the activity

Identify at least two contextual factors that influenced your choices.
These might include team goals, technical constraints, the stage of development, the product domain, team readiness and skills, or organisational expectations.
Explain how each factor shaped what you did.

Step 4: Reply to this post

Share your answers by replying below. Your example will help others see how context shapes the work of a QE.


Example answer

Step 1: Choose an activity
Investigating flaky tests and deciding whether they should target a different layer of the system.

Step 2: Describe the activity in your own words
A QE noticed that a cluster of end-to-end tests was failing intermittently without a clear root cause. They reproduced the failures locally, captured logs and timings, and traced the failures to a race condition in a shared cache combined with a brittle test setup.

They proposed moving the fragile checks into focused integration tests with controlled mocks, and adding a lightweight health check to the pipeline to reduce false negatives. The purpose was to reduce wasted investigation time, make feedback reliable and keep the team confident in their testing.
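
To make this concrete, a focused integration test with a controlled mock might look something like the sketch below, written in Python with pytest. The CacheClient and OrderService names are hypothetical stand-ins for whatever the real system uses; the point is that the cache becomes a controlled fake, so the check no longer depends on shared mutable state.

```python
from unittest.mock import MagicMock

import pytest

# Hypothetical application classes, standing in for the real system under test.
from myapp.cache import CacheClient
from myapp.orders import OrderService


@pytest.fixture
def controlled_cache():
    # Replace the shared cache with an in-memory fake so the test fully
    # controls the state it reads, removing the race condition that made
    # the end-to-end runs brittle.
    fake = MagicMock(spec=CacheClient)
    store = {}
    fake.get.side_effect = store.get
    fake.set.side_effect = store.__setitem__
    return fake


def test_order_total_uses_cached_prices(controlled_cache):
    # Arrange a known cache state, then exercise only the component under test.
    controlled_cache.set("price:sku-123", 250)
    service = OrderService(cache=controlled_cache)

    total = service.total_for(["sku-123", "sku-123"])

    assert total == 500
```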

The QE also raised the shared cache with the team, starting a discussion about risks that might extend beyond the testing scenario: had the shared cache caused any previous issues in production, or problems during any other testing activities?

Step 3: Explain how context shaped the activity
Team readiness and skills. The team had limited experience with mocking and integration test design, so the QE paired with a developer to teach the pattern and help the team learn and own the change.

Technical constraints. The system used a shared cache and unstable test data, which made end-to-end runs brittle, so the QE isolated the component in integration tests where state could be controlled.

Organisational expectations. The organisation expected fast feedback on pull requests, so the QE kept the PR pipeline quick by moving slow, flaky checks out and adding a fast health check.

Systems understanding. Part of the system's architecture, the shared cache, contributed to the testability issues and may have had other impacts. By sharing the understanding of the system gained while investigating the flaky tests, the QE started a meaningful discussion that supports the team in making future decisions.
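
For concreteness, the fast health check mentioned under organisational expectations could be as small as the pytest sketch below. The APP_BASE_URL variable and the /health endpoint are assumptions for illustration; the idea is a check that runs in seconds on every pull request while the slower suites run on a separate schedule.

```python
import os

import requests

# The base URL is assumed to be supplied by the CI environment;
# the /health endpoint is a hypothetical example.
BASE_URL = os.environ.get("APP_BASE_URL", "https://staging.example.com")


def test_service_is_healthy():
    # A fast smoke check for the PR pipeline: confirms the deployed service
    # is up and responding, without running the slow end-to-end scenarios.
    response = requests.get(f"{BASE_URL}/health", timeout=5)
    assert response.status_code == 200
```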

@fullsnacktester,

One of my recent QE activities was tracking down a set of UI tests that were failing at random. Because the failures were not reproducible on demand, instead of going straight to repairing the tests I spent some time replicating the problem, reviewing the logs, and comparing the successful runs with the failed ones.

Eventually I found that the failures only occurred when several tests were exercising the same API at the same time. This made me suspect a data collision rather than a wrong assertion. A developer and I worked together to confirm that it was indeed a problem with the shared test environment: the API was not clearing the data between runs properly.

Two important factors influenced the way I handled the situation:

  1. Team skills & readiness
    The team had a couple of testers who were just beginning to learn how to debug automation. I took my time with the investigation and brought them along through the whole process instead of just fixing the tests. This helped build a common understanding of how to identify flaky tests.

  2. Technical constraints
    We do not have separate test environments for every run, so I had to find a solution that did not rely on acquiring new infrastructure. The end result was adding cleanup steps and running certain tests one after the other to avoid collisions (a rough sketch of the cleanup idea follows this list).
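
As a rough sketch of the cleanup idea, assuming pytest and a hypothetical api_client helper: each test registers the records it creates, and the fixture deletes them afterwards, so consecutive runs no longer collide on leftover data.

```python
import pytest

# Hypothetical helper module wrapping the API under test.
from tests.helpers import api_client


@pytest.fixture
def created_records():
    # Tests append the IDs of any records they create; after the test,
    # the fixture deletes them so the shared environment is left clean
    # for the next run, without needing new infrastructure.
    record_ids = []
    yield record_ids
    for record_id in record_ids:
        api_client.delete_record(record_id)


def test_create_order(created_records):
    order = api_client.create_order(customer="test-customer")
    created_records.append(order["id"])

    assert order["status"] == "created"
```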

The issue with the unreliable tests was resolved, but what was even more significant was that the team now understood the reasons behind the flakiness and how the dynamics of our situation (shared environments and skill levels) influenced the solution.

Thanks,

Ramanan