Recently, I started helping to implement the test automation suite for our nightly and on-pull-request builds.
We follow an agile Scrum (two-week sprint) process that includes two review stages (we work with an off-shore team): one for testing and one for client code review. During our code review stage, we evaluate each test case against 7 steps:
Matches business requirements
Follows our set coding standards (checked using tools and documentation)
Runs in headless mode
Runs in UI mode
Fails if the expected condition doesn’t match (using the debugger, we change the actual element just before it’s validated to make sure there are no false positives; see the sketch after this list)
Report looks good
Doesn’t fail in Jenkins (run as a single test)
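As an illustration of step 7, here is a minimal sketch, assuming a JUnit 5 + Selenium stack; the URL, element id, and expected text are hypothetical placeholders:

```java
// Minimal sketch of the step-7 false-positive check (JUnit 5 + Selenium).
// The URL, element id, and expected text are hypothetical placeholders.
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import static org.junit.jupiter.api.Assertions.assertEquals;

class OrderConfirmationTest {

    private final WebDriver driver = new ChromeDriver();

    @Test
    void showsConfirmationMessageAfterCheckout() {
        driver.get("https://example.test/checkout/complete");
        String actual = driver.findElement(By.id("confirmation-banner")).getText();

        // During review we temporarily change the expected value (or, in the debugger,
        // the actual one) and re-run: the test must go red. If it still passes, the
        // assertion isn't really validating anything.
        assertEquals("Thank you for your order", actual);
    }

    @AfterEach
    void tearDown() {
        driver.quit();
    }
}
```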
If we’re happy, the ticket is merged into the Develop branch. Then, if needed, the whole regression suite is run via Jenkins to ensure there are no related issues.
My Question
How do you review test cases to ensure they’re acceptable for the test suite?
I like the steps you have listed. Here are some things I’d like to add:
Check that it does not create test pollution and that the test is self-sufficient (the suite should be able to run in any order and still pass); see the sketch after this list.
The test takes as little runtime as possible. (Sure, computer time is cheap, but I don’t want to wait a day to find out whether my pull request can be merged.)
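A minimal sketch of what I mean by self-sufficient, assuming JUnit 5; Customer and CustomerRepository below are made-up, in-memory stand-ins for whatever the real suite talks to:

```java
// Sketch of a self-sufficient test: it owns its data and tolerates any run order.
// Customer and CustomerRepository are made-up, in-memory stand-ins.
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.MethodOrderer;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestMethodOrder;

import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

import static org.junit.jupiter.api.Assertions.assertTrue;

record Customer(String id, String name) {}

class CustomerRepository {
    private final Map<String, Customer> store = new ConcurrentHashMap<>();
    void save(Customer c) { store.put(c.id(), c); }
    void delete(String id) { store.remove(id); }
    List<Customer> findByName(String name) {
        return store.values().stream().filter(c -> c.name().equals(name)).toList();
    }
}

@TestMethodOrder(MethodOrderer.Random.class) // shuffle the order so hidden dependencies fail fast
class CustomerSearchTest {

    private final CustomerRepository repository = new CustomerRepository();
    private String customerId;

    @BeforeEach
    void createOwnTestData() {
        // Unique data per test: nothing is shared with, or left behind for, other tests.
        customerId = "cust-" + UUID.randomUUID();
        repository.save(new Customer(customerId, "Ada Lovelace"));
    }

    @Test
    void findsCustomerByName() {
        assertTrue(repository.findByName("Ada Lovelace").stream()
                .anyMatch(c -> c.id().equals(customerId)));
    }

    @AfterEach
    void removeOwnTestData() {
        repository.delete(customerId); // leave the environment exactly as we found it
    }
}
```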
For product coverage, I’ve used a ceremony during sprint planning where the team thinks deeply about testing for each story, producing a test plan (bullet points associated with the story). Then, when somebody opens a PR for a given set of these tests, one can try to map each bullet point to the test code.
Additionally, you can introduce mutants to validate that the tests catch (at least some) failures when they are introduced.
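A tiny hand-rolled illustration of the mutant idea (the DiscountCalculator class and the 10% rule are made up): flip one operator in the business logic, rerun the tests, and expect at least one of them to go red.

```java
// Hand-rolled "mutant" check: flip one operator in the business logic and the
// suite should fail. DiscountCalculator and the 10% rule are made-up examples.
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class DiscountCalculator {
    // Business rule (for the sake of the example): orders of 100.00 or more get 10% off.
    double discountedTotal(double total) {
        return total >= 100.0 ? total * 0.9 : total;
        // Mutant to try during review: change ">=" to ">" (or "* 0.9" to "* 0.8").
        // If every test still passes, the boundary is not really covered.
    }
}

class DiscountCalculatorTest {
    private final DiscountCalculator calculator = new DiscountCalculator();

    @Test
    void appliesDiscountExactlyAtTheBoundary() {
        assertEquals(90.0, calculator.discountedTotal(100.0), 0.001);
    }

    @Test
    void noDiscountBelowTheBoundary() {
        assertEquals(99.99, calculator.discountedTotal(99.99), 0.001);
    }
}
```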
For the code itself, we treat it just like any other piece of code that goes into source control (we have a code review per story, defect, or other work item, and tests written as part of a work item go through code review right alongside the production code).
And that does include most of your steps (we don’t do headless mode).
These are all good criteria for reviewing tests, and I’d also echo the notes from Gene and Charles. The only other thing I would add is to review the tests for maintainability.
The number one reason test automation suites fail is maintainability. If the automation is difficult to maintain, then when pressed for time many will fall back on testing things manually or only have time to maintain the most important tests. I’ve seen people disable tests with a note that they’ll re-enable them when they have time… they never have time.
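One common way to review for maintainability in UI suites (not the only one) is to check that locators and page interactions live in a single place, e.g. a page object, so a UI change means one fix rather than dozens. A rough sketch, with made-up URL and element ids:

```java
// Rough page-object sketch (Selenium): locators live in one class, so a changed
// id breaks one line here instead of every test that touches the login screen.
// The URL and element ids are made-up placeholders.
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

class LoginPage {
    private static final By USERNAME = By.id("username");
    private static final By PASSWORD = By.id("password");
    private static final By SUBMIT   = By.id("login-submit");

    private final WebDriver driver;

    LoginPage(WebDriver driver) {
        this.driver = driver;
    }

    LoginPage open() {
        driver.get("https://example.test/login");
        return this;
    }

    void loginAs(String user, String password) {
        driver.findElement(USERNAME).sendKeys(user);
        driver.findElement(PASSWORD).sendKeys(password);
        driver.findElement(SUBMIT).click();
    }
}

// Tests then read as intent rather than as a pile of selectors:
//   new LoginPage(driver).open().loginAs("reviewer", "secret");
```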
I would recommend a few additional things to note:
Test naming convention - it is important that automation tests follow a set naming guideline.
We use: [System name] - [####] - [Type] - [Description] - [JIRA issue number]
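For example (every value below is hypothetical), that convention could translate into test names like these, shown here with JUnit’s @DisplayName:

```java
// Hypothetical names following the convention
// [System name] - [####] - [Type] - [Description] - [JIRA issue number]
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

class OrdersSmokeTests {

    @Test
    @DisplayName("ORDERS - 0001 - Smoke - Customer can submit an order - ORD-1234")
    void orders_0001_smoke_customerCanSubmitAnOrder_ORD1234() {
        // ... test body ...
    }

    @Test
    @DisplayName("ORDERS - 0002 - Regression - Order total includes shipping - ORD-1307")
    void orders_0002_regression_orderTotalIncludesShipping_ORD1307() {
        // ... test body ...
    }
}
```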
Depending on the scope of your systems, I would also suggest considering two sets of automation tests: system regression and integration E2E regression.
Our standard system regression tests always follow these rules (one way to keep them isolated is sketched after the list):
They are small/run quickly.
They test basic functionality/security/accessibility/load or performance of that specific system.
They never require any interaction with other systems or components.
They can be run in any order.
In addition, they rarely need fixing.
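A sketch of one way to keep a system regression test isolated (CheckoutService, PaymentGateway, and the amounts are made up for illustration): the test talks to a fake instead of a real downstream system, so it stays small, fast, and order-independent.

```java
// Sketch: a system regression test that never touches another real system.
// PaymentGateway, CheckoutService, and the amounts are made-up examples; in a
// real suite the fake might be an in-process stub or a tool such as WireMock.
import org.junit.jupiter.api.Test;

import java.util.ArrayList;
import java.util.List;

import static org.junit.jupiter.api.Assertions.assertEquals;

interface PaymentGateway {
    String charge(double amount);
}

class CheckoutService {
    private final PaymentGateway gateway;

    CheckoutService(PaymentGateway gateway) {
        this.gateway = gateway;
    }

    String checkout(double amount) {
        return gateway.charge(amount);
    }
}

class CheckoutServiceRegressionTest {

    @Test
    void checkoutChargesTheFullAmountOnce() {
        List<Double> charged = new ArrayList<>();
        // Fake gateway: no network, no other system, no shared state.
        PaymentGateway fakeGateway = amount -> {
            charged.add(amount);
            return "receipt-1";
        };

        String receipt = new CheckoutService(fakeGateway).checkout(42.50);

        assertEquals("receipt-1", receipt);
        assertEquals(List.of(42.50), charged);
    }
}
```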
Our integration tests have to run in sequence and confirm that data flows between multiple systems, e.g. the ORDERS E2E regression below (one way to enforce the sequence is sketched after the list):
Test 1 - Ordering website - Customer data input.
Test 2 - Staff system - Verification of order received and content.
Test 3 - Staff system - Staff can update the order and submit for shipping.
Test 4 - Shipping system - Verification that the order is received from the staff system.
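Assuming a JUnit 5 suite, one way (not the only one) to express that fixed sequence is explicit ordering; the class and step names below mirror the list above and are illustrative only:

```java
// Illustrative only: enforcing the ORDERS E2E sequence with explicit ordering
// (JUnit 5). Each step's body would drive the corresponding real system.
import org.junit.jupiter.api.MethodOrderer;
import org.junit.jupiter.api.Order;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestInstance;
import org.junit.jupiter.api.TestMethodOrder;

@TestMethodOrder(MethodOrderer.OrderAnnotation.class)
@TestInstance(TestInstance.Lifecycle.PER_CLASS) // share state (e.g. the order id) across steps
class OrdersEndToEndRegression {

    private String orderId;

    @Test @Order(1)
    void orderingWebsite_customerDataInput() {
        // drive the ordering website and capture the created order id
        orderId = "ord-0001"; // placeholder
    }

    @Test @Order(2)
    void staffSystem_orderReceivedWithExpectedContent() {
        // verify the order identified by orderId arrived with the right content
    }

    @Test @Order(3)
    void staffSystem_staffUpdatesOrderAndSubmitsForShipping() {
        // update the order and submit it for shipping
    }

    @Test @Order(4)
    void shippingSystem_receivesOrderFromStaffSystem() {
        // verify the shipping system received the order
    }
}
```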
System regression suites can be run quickly for pull requests, while the integration E2E suite can run overnight/weekly/etc. This is easy to configure in Jenkins, with the system regression set linked to the system deploy and the E2E set as a standalone job.
In my experience, one risk down the line is that you end up with a bloated test suite that does not really cover much. So I would suggest adding to your review checklist:
Is the test needed/useful? (As in: if I break the business logic of the code, will my test fail?)
Is there an existing test that this one should replace instead?
What can be improved in the test data I use?
We had a tool that measured code coverage… i.e. at least one test passed through every line of code. So I created 100% code coverage for some classes. Then we changed some of the business logic, and my tests were still passing. In other words, 100% code coverage != 100% of the business logic covered.
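A sketch of how that can happen (the VAT example is made up): the test below executes every line of the method, so line coverage reports 100%, yet its assertion is so weak that changing the business rule never fails it.

```java
// Made-up example of 100% line coverage that covers 0% of the business rule.
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertTrue;

class VatCalculator {
    // Business rule (for the sake of the example): 20% VAT on the net price.
    double grossPrice(double netPrice) {
        return netPrice * 1.20;
    }
}

class VatCalculatorTest {

    @Test
    void grossPriceIsPositive() {
        double gross = new VatCalculator().grossPrice(100.0);

        // Every line of grossPrice() is executed, so coverage reports 100%.
        // But change the rate to 1.05, or even 1.00, and this still passes:
        // the business logic is not actually covered.
        assertTrue(gross > 0);
    }
}
```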
To add another tool that can help you review and evaluate the quality of your testing, I suggest having a look at mutation testing.
There are some really good comments here about coverage, which is itself a whole topic of discussion about whether code coverage is an achievable or valuable metric. However, leaving that loaded statement to one side, I think another question worth asking is: what value am I getting from my checks?
When I ask myself this question I am thinking about things like:
Are my checks of value? If I deliberately make the application fail and then run my checks, what happens?
What feedback am I getting from my checks (or tests), and is it actionable? Do I have to spend hours debugging an issue to work out whether the failure is in the check or in the product?
If a check is flaky, is it worth my time fixing it, and what is it telling me when it fails? Is it focused on a risky area, or is it something that a quick glance can cover?
It’s also worth asking yourself “why does this check exist?”, because if you can’t answer that, it’s of no use to you and your team and you should bin it and move on. We all have enough to do.