Gherkin - when is a scenario too big?

I am rather new to Cucumber/Gherkin and came across a situation with Selenium eCommerce tests where I could combine multiple tests into one scenario and hit them all at once, instead of having X separate tests. That led me to wonder: “When is a scenario too big?”

I’ve gone back and forth on it. A large scenario can cover a lot of ground, but it can also be more error prone, harder to read, and more time consuming per run (especially if you only need to test one thing). But on the other hand, separating a test into multiple tests which only have one or two minor differences can feel like a waste of time and energy, lead to extra maintenance, and add up to a lot of tests throughout a feature.

Is there a general rule of thumb for how many steps a test should have?

In cases where several scenarios are very similar to each other, you can consider using a scenario outline instead of the scenario. That way you can have a table with variables where each row can be considered a separate scenario.

Take a look at this simple calculator example: instead of having a separate scenario for each calculation, we simply pass the values in as variables.

Feature: Calculator
  As a user
  I want to use a calculator to add numbers
  So that I don't need to add myself

  Scenario Outline: Add two numbers <num1> & <num2>
    Given I have a calculator
    When I add <num1> and <num2>
    Then the result should be <total>

    Examples:
      | num1 | num2 | total |
      | -2   | 3    | 1     |
      | 10   | 15   | 25    |
      | 99   | -99  | 0     |
      | -1   | -10  | -11   |

Check out the Cucumber docs for a bit more detail:
https://cucumber.io/docs/gherkin/reference/#scenario-outline
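
To make concrete how each Examples row behaves like its own scenario, here is a minimal Python sketch that expands the calculator outline by hand. It is an illustration only, not how Cucumber implements outlines; `expand_outline` is a hypothetical helper, not part of any library.

```python
import re

# Steps of the calculator outline, with <placeholders> left in.
OUTLINE_STEPS = [
    "Given I have a calculator",
    "When I add <num1> and <num2>",
    "Then the result should be <total>",
]

# The Examples table as a header plus rows.
HEADER = ["num1", "num2", "total"]
ROWS = [
    [-2, 3, 1],
    [10, 15, 25],
    [99, -99, 0],
    [-1, -10, -11],
]


def expand_outline(steps, header, rows):
    """Return one concrete list of steps per Examples row (hypothetical helper)."""
    scenarios = []
    for row in rows:
        values = dict(zip(header, row))
        concrete = [
            # Replace each <name> with the value from the current row.
            re.sub(r"<(\w+)>", lambda m: str(values[m.group(1)]), step)
            for step in steps
        ]
        scenarios.append(concrete)
    return scenarios


scenarios = expand_outline(OUTLINE_STEPS, HEADER, ROWS)
for scenario in scenarios:
    print(" / ".join(scenario))
```

Four rows produce four independent scenarios, so one failing row does not hide the results of the others.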

Thanks for the reply Mirza! In my situation it would be more along the lines of: 1) add two numbers, then 2) square that number, then 3) multiply or divide, and so on. That is, one test that exercises multiple options on a calculator, as opposed to three different tests.

But on a more complex e-commerce scenario

I think what @jnunn is asking is: How many “Then” statements might we want in a scenario? Or how many “And” statements can we tolerate, nested under a “Then” statement?

(imagine a longer user journey)
Given I have a calculator
When I add <num1> and <num2>
Then the result should be <total>
When I square that result
Then the result should be…
When I multiply that new result
Then…
And…
And…

This scenario has five assertions. Should it be five distinct scenarios or one?

If you were writing a BDD specification, this issue would not arise unless you combined several features in one feature file, which would be confusing.

So you are probably writing automated checks, which is not a good idea in Gherkin. Using Gherkin makes no sense without BDD; it only adds extra work. That is what you will find experts like Seb Rose stating. For all automation, though, the general advice is not to string checks together, because it has several drawbacks. And if you really need to speed things up, which is usually the reason for wanting this, there are other approaches, including avoiding the UI.

From another angle: the execution time of a check has little to do with the length of a scenario. I can write out a short check in many tiny steps and make it long but still fast. And I can write a very compact check that takes forever to execute because of all the details hidden in the steps.

The simplest and most useful rule of thumb is: write WHAT, not HOW. If you are not stringing checks together but find yourself wondering whether a check should be shorter, ask yourself WHY a certain action is there. If the answer is ‘because I need to do X’ and X is a higher-level action, then using X instead will make the check shorter and more understandable.

Trivial example:

  • Enter user name A
  • Enter password B
  • Click login button

Why is each of these steps here? Because I am logging in. So I can replace these three steps with:

  • Log in as A with password B

or:

  • Log in as A

This applies no matter what format an automated check has, including Gherkin. It is the concept of abstraction, which is fundamental in all engineering.
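
The login example can be sketched in code as well. The following is a minimal illustration in Python, assuming an in-memory stand-in for the login page; `FakeLoginPage` and its method names are hypothetical, and a real suite would drive a browser instead.

```python
class FakeLoginPage:
    """In-memory stand-in for a login page (hypothetical, for illustration)."""

    def __init__(self, accounts):
        self._accounts = accounts  # maps user name -> password
        self._username = ""
        self._password = ""
        self.logged_in_as = None
        self.actions = []  # low-level actions performed, recorded for illustration

    # Low-level steps: the HOW.
    def enter_user_name(self, name):
        self.actions.append("enter user name")
        self._username = name

    def enter_password(self, password):
        self.actions.append("enter password")
        self._password = password

    def click_login_button(self):
        self.actions.append("click login button")
        if self._accounts.get(self._username) == self._password:
            self.logged_in_as = self._username

    # Higher-level step: the WHAT. It hides the three actions above.
    def log_in_as(self, name, password):
        self.enter_user_name(name)
        self.enter_password(password)
        self.click_login_button()


page = FakeLoginPage(accounts={"A": "B"})
page.log_in_as("A", "B")
print(page.logged_in_as, page.actions)
```

The check itself only mentions `log_in_as`; the three low-level actions still happen, but they live in one place and can change without touching the check.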

But on a more complex e-commerce scenario

Good software engineering is about solving complex problems by combining simple computation units. Complexity is then distributed - the risk goes into each unit and its edges with other units.

Focus on evaluating your units and how they interact with each other, not how they act as one whole.

@martingijsen , I think you answered the core question @jnunn and I were asking.

It seems you are suggesting a heuristic: write scenarios so that they have only one WHEN step (e.g., if there are several actions, abstract them). I think the execution time of a scenario (i.e., an automated check) is mostly spent in our GIVEN and WHEN steps. Therefore, we can be free and loose with the number of THEN steps.

For example: If the check navigates to a page via GIVEN and WHEN, we can have one assertion on one field or ten assertions on ten fields, without significantly degrading the check.

Further Example: Let’s assume we wanted an automated check to verify a shipping address. Which approach is better?

  1. One automated check that navigates to the shipping address page and asserts something on the street address field, asserts something on the city field, and asserts something on the postal code field.

or

  2. Three distinct automated checks. The first navigates to the shipping address page and asserts something on the street address. The second navigates to the shipping address page and asserts something on the city field. The third navigates to the shipping address page and asserts something on the postal code field.

#1, right? Or am I missing the point?
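
Option #1 can be sketched as follows. This is a minimal Python illustration, assuming a fake application object; `FakeShop`, `check_shipping_address`, and all address values are hypothetical. The check navigates once and reports every mismatching field, rather than stopping at the first one.

```python
class FakeShop:
    """Stand-in for the application; counts how often the page is opened."""

    def __init__(self):
        self.navigations = 0

    def open_shipping_address_page(self):
        self.navigations += 1  # in a UI test, navigation is the slow part
        return {
            "street": "1 Example Street",
            "city": "Exampleville",
            "postal_code": "12345",
        }


def check_shipping_address(shop, expected):
    """Navigate once, compare every expected field, return all mismatches."""
    page = shop.open_shipping_address_page()
    return [
        f"{field}: expected {want!r}, got {page.get(field)!r}"
        for field, want in expected.items()
        if page.get(field) != want
    ]


shop = FakeShop()
failures = check_shipping_address(
    shop,
    {"street": "1 Example Street", "city": "Exampleville", "postal_code": "12345"},
)
assert not failures, failures
print(shop.navigations)
```

Collecting the mismatches in a list is a design choice: a single run then shows every wrong field, which keeps much of the diagnostic value of three separate checks while paying for the navigation only once.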

@ejacobson I would indeed expect one WHEN clause, preferably consisting of a single step. But abstractions are just as good for GIVEN and THEN. Abstraction is an engineer’s solution to just about anything. :smile: And apply it to your data as well as to your logic, by the way: why spell out a whole address in a scenario if you can give that address a clear name and use that name to refer to all the details?
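
Naming test data could look something like the sketch below. It is a minimal Python illustration; the `Address` type, the names in `ADDRESSES`, and every value are hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    postal_code: str


# Hypothetical named addresses; a scenario would mention only the name.
ADDRESSES = {
    "the warehouse": Address("1 Example Street", "Exampleville", "12345"),
    "a customer's home": Address("2 Sample Road", "Sampleton", "67890"),
}


def address_named(name):
    """Look up the full details behind a name used in a step."""
    try:
        return ADDRESSES[name]
    except KeyError:
        raise KeyError(f"No test address named {name!r}") from None


# A step such as 'Given the order ships to the warehouse' would call:
shipping_address = address_named("the warehouse")
print(shipping_address.city)
```

The scenario stays short and readable, and the details behind each name are defined once, where both the automation and a curious reader can find them.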

As for the verification of the shipping address, I would think in terms of the requirements. Only if there are separate requirements for the various parts of the address would I even consider making multiple scenarios for them. But I would expect it always to be part of the same feature. Even if it is extended later (say the country is added when orders from abroad become supported), I would expect that country field to be included in the existing shipping feature, even if the new user story mentions only adding the country. This is mostly because it is the only thing that makes much sense, but it also serves execution speed, of course.

Which is a lengthy way of saying: Yes, #1. :smile:

But on the other hand, separating a test into multiple tests which only have one or two minor differences can feel like a waste of time and energy, lead to extra maintenance, and add up to a lot of tests throughout a feature.

This alludes to a major problem I have with Gherkin. You can’t take existing stories (e.g. log in as tom) and use them as the basis for a new story (e.g. add golf club to shopping cart) and then use those as the basis of another story (e.g. “purchase golf club” or “empty shopping cart”).

The net result is that you’re kind of forced to choose between very repetitive stories and step abstractions which get way too vague. E.g. “when a user logs in” ← which user? how did they log in? what were their preferences? These things are potentially very relevant to stakeholders, but they get buried in the step definitions.

For web screen testing with Selenium, you definitely need to try and restrict the Gherkin feature files to one or more CRUD scenarios, on a single screen, per file. Try to keep test data load separate from the testing.

Speaking of test data load, there are a number of approaches you can take. Bulk loading is better, especially if you are guaranteed to start with a fresh system and do not need to worry about ID field management. It is also absolutely fine to use Selenium/WebDriver and even Gherkin for this purpose. Sometimes, acting on a screen can have far-reaching consequences under the hood, making screen automation the better approach to test data load.

To flip the question on its head, I would ask:

Why are the scenarios becoming too big?

Normally, if scenarios are exploding in size, it’s because they have become too imperative, i.e. they are describing atomic steps through an application.

Gherkin works best when it’s declarative. Meaning that it should be describing high level examples of how we expect a user and a system to behave. The goal being to capture what the business wants from our software, and not to capture every specific testing scenario.

We have to be careful when we use Gherkin in testing, and especially with automation. It can bias our approaches and create real headaches for us. I’ve sadly experienced those headaches in the past, which is why I wrote this blog series:

I consider not being able to reuse scenarios a good thing rather than an issue.

You can reuse steps. And the better you use step definitions only as a glue layer (= only connecting the step to code elsewhere) rather than the implementation of the step itself, the easier it will be to reuse that underlying logic. But scenarios are not suitable for reuse regardless of what form they have. They are meant as acceptance criteria - examples that explain requirements. Just like programs are not meant to be reused to construct larger programs. Their components, yes, but not the programs.
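
To illustrate step definitions used purely as a glue layer, here is a minimal, self-contained Python sketch. The `step` decorator and `run` function are a toy runner written for this example, not the API of Cucumber, behave, or any other tool; `Calculator` stands in for the underlying logic.

```python
import re


class Calculator:
    """Underlying logic: plain code, reusable without any Gherkin."""

    def __init__(self):
        self.result = 0

    def add(self, a, b):
        self.result = a + b


# Glue layer: each step definition only forwards to the code above.
STEPS = []  # (compiled pattern, function) pairs


def step(pattern):
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"I have a calculator")
def have_calculator(ctx):
    ctx["calc"] = Calculator()


@step(r"I add (-?\d+) and (-?\d+)")
def add_numbers(ctx, a, b):
    ctx["calc"].add(int(a), int(b))


@step(r"the result should be (-?\d+)")
def result_should_be(ctx, expected):
    assert ctx["calc"].result == int(expected)


def run(lines):
    """Match each line against the registered steps and call the first hit."""
    ctx = {}
    for line in lines:
        text = line.split(" ", 1)[1]  # drop the Given/When/Then/And keyword
        for pattern, func in STEPS:
            match = pattern.fullmatch(text)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"No step matches: {line}")
    return ctx


ctx = run([
    "Given I have a calculator",
    "When I add 10 and 15",
    "Then the result should be 25",
])
```

Because each step definition is a one-liner, the `Calculator` logic can be reused from unit tests or other tooling without involving the scenarios at all.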

Anyone writing automated checks in Gherkin rather than requirements, should just stop doing that. It is pointless. Either practice BDD so that the Gherkin format is useful or it makes no sense to use Gherkin. It seems most of the issues with Gherkin are the result of using it for something that it is not meant for, much like using a hammer to drive in a screw. Little wonder it does not work so well then, and it is not the hammer’s fault.

Perhaps the trouble with automating in general, including that with Gherkin, relates to automation being in between testing and development in terms of skills. Someone who writes feature files and also automates them needs to have big chunks of both skillsets. And few people do, in part since IT is growing so fast but also because automation is not understood too well. People who enter automation as testers often lack the engineering skills. People who have a programming background often lack the testing insight.

So I say the issue is not with Gherkin. Gherkin is just a tool, meant for a specific purpose. Let’s not blame the hammer for being unsuitable for screws, but use the right tool for the right purpose. This starts with understanding what the various tools are for.

The above is me trying to be clear, not meaning to offend. If it should offend anyone, I am sorry for not choosing the right wording.

But scenarios are not suitable for reuse regardless of what form they have. They are meant as acceptance criteria - examples that explain requirements. Just like programs are not meant to be reused to construct larger programs.

I don’t believe any part of this is true.

I’d say in fact, that more than half of requirements I get every day will be stories that naturally fork from existing scenarios. When I buy golf clubs, that forks off a scenario where I’ve added them to the shopping cart. The conversations I have with stakeholders also center around this.

Building executable specifications in a language that doesn’t allow this is, in my opinion, a massive impediment to doing BDD. A library of well tested, precisely defined scenarios that are easily referred to serves as a great jumping off point for conversations about new features and new scenarios. Why not use those scenarios as a basis for new scenarios?

Anyone writing automated checks in Gherkin rather than requirements, should just stop doing that. It is pointless.

I agree, but this isn’t about that.

Either practice BDD so that the Gherkin format is useful or it makes no sense to use Gherkin. It seems most of the issues with Gherkin are the result of using it for something that it is not meant for, much like using a hammer to drive in a screw.

It is the hammer’s fault though. The lack of this feature doesn’t just make test writing harder, it makes BDD harder.

Without inheritance in Gherkin, you are pushed into burying what turn out to be potentially important details of the specification in the imperative step code. When you “log in as user”, you simply can’t squeeze every potentially relevant detail into that step without repeating yourself a lot.

This actually causes a lot of stakeholders to lose interest in even looking at these stories. They end up being too vague to be useful to them and you lose the ability to have conversations around behavior with them. This happens a lot.

The alternative, where you have a “click” step and an “enter text in textbox” step, is that you end up with a lot of very repetitive scenarios that are extremely annoying to maintain and potentially far too detailed for stakeholders to be interested in.

If you do have inheritance, I find this approach works fantastically well, though. The conversation centers around the forked scenario and the parent scenario is abstracted away, yet the stakeholder can still refer to it if necessary: the best of all BDD worlds.

The Gherkin alternative I built, for those who are interested (it’s based upon strongly typed YAML, stories are supposed to inherit, you’re encouraged to have reusable steps like “click” and “enter text” and you can generate documentation from it which you can use to trick grumpy stakeholders into doing BDD with):

Log in as James:
  given:
    browser: firefox  # test preconditions
  steps:
  - Enter text:
      username: james
      password: password
  - Click: log in
  - Should appear: dashboard

See James analytics:
  based on: log in as james  # test inheritance
  following steps:
  - Click: analytics
  - Should appear: analytics dashboard
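
How a "based on" story might be flattened into a full list of steps can be sketched in a few lines of Python. This is an illustration only and not the actual implementation of the tool mentioned above; `resolve` is a hypothetical function, and the two stories are copied into plain dicts with lower-cased names.

```python
# The two stories above as plain dicts (story names lower-cased).
STORIES = {
    "log in as james": {
        "given": {"browser": "firefox"},
        "steps": [
            {"Enter text": {"username": "james", "password": "password"}},
            {"Click": "log in"},
            {"Should appear": "dashboard"},
        ],
    },
    "see james analytics": {
        "based on": "log in as james",
        "following steps": [
            {"Click": "analytics"},
            {"Should appear": "analytics dashboard"},
        ],
    },
}


def resolve(name, stories):
    """Flatten a story: parent preconditions and steps first, then its own."""
    story = stories[name.lower()]
    given, steps = {}, []
    parent = story.get("based on")
    if parent is not None:
        given, steps = resolve(parent, stories)
    # Child preconditions override the parent's; child steps are appended.
    given = {**given, **story.get("given", {})}
    steps = steps + story.get("steps", []) + story.get("following steps", [])
    return given, steps


given, steps = resolve("See James analytics", STORIES)
print(given)
print(len(steps))
```

The child story states only what is new, while the resolved form still contains every inherited precondition and step. This sketch does not guard against stories that inherit from each other in a cycle.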

People who have a programming background often lack the testing insight.

What specific insight do you think I’m missing here?

I’m not sure I see the distinction here. If the framework that sits behind the glue code supports Gherkin and/or other approaches (e.g. Junit classes or similar), then the alternative approaches on offer are merely different means to the same end.

@hitchdev Thanks for your elaborate response. You have clearly given this considerable thought and found it important enough to invest time in writing that response. I appreciate that. Your message made me check once more where my beliefs come from. Perhaps we can pinpoint where our insights or experiences differ by going over it in some more detail.

My (considerable) experience in automation in general and working with BDD and Cucumber in particular has not, as I recall, led to the issues you are referring to. And the sample code you provide with ‘Log in as James’ and ‘See James analytics’ concerns steps, not scenarios, so it does not clarify your point for me. So I am unsure how to respond to your statement that scenarios should be reusable / extensible / inheritable. Steps, yes, but not scenarios. Could you give us concrete scenarios to talk about?

@darth_piriteze Not sure I get your point. Technically, it makes little or no difference whether you run automated checks using junit, Cucumber, FitNesse or Robot Framework. The computer does not care one bit.

But that, of course, is only the execution part of the executable specification. The most valuable part of BDD by far is the collaboration on the specification part, including examples as acceptance criteria. The automation is also nice, but you can do that just fine without anything relating to BDD, including Gherkin. (And you can practice BDD just fine without any automation.)

Many (probably most) of the feature files that I have seen that were written by testers or automators for the purpose of automated regression testing were bad even just as automated checks. And they had nothing to do with BDD except Gherkin (which is not even part of BDD!). But the issue was never Gherkin or Cucumber. It was poor craftsmanship. Fortunately, that can be fixed!

In an ecommerce context, I had stories like “user purchases golf clubs”. In order to complete that purchase, they first had to log in, put the golf clubs in their shopping cart and check them out.

So, I had 4 stories, each one of which would follow on from another. E.g.

Log in → Put golf clubs in shopping cart → Check out golf clubs → Purchase golf clubs

Each of these was a self-contained scenario, a self-contained story and test. They contained between 2 and 10 steps each.

It simply didn’t make any sense not to piggyback one story off the other. It made sense from the perspective of writing tests (DRY) and it made sense from the perspective of BDD (“given we’ve already checked out the golf clubs under a scenario we are all familiar with → conversations about what happens next”).

Then we would get a story like “when a loyalty card member buys golf clubs, see discount” and it would be easy to just re-use the check-out process.

In a financial services context, I was maintaining a financial model for the construction of houses which was a REST API. We had a simple base case scenario which had 10 houses and 10 renters, the details of which the developers and stakeholders were familiar with.

On top of that base case we would create new scenarios which took this base case and tweaked relevant bits and pieces to specify new features. Perhaps we wanted those 10 houses to be bought instead of rented. Perhaps we wanted to create a scenario to see what happens when inflation is high.
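
The "base case plus tweaks" idea can be sketched with immutable data and overrides. This is a minimal Python illustration; `ModelInputs`, the `tenure` and `inflation_rate` fields, and the rate values are assumptions made for the example. Only the 10 houses and 10 renters come from the description above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelInputs:
    houses: int
    renters: int
    tenure: str            # "rented" or "bought" (hypothetical field)
    inflation_rate: float  # hypothetical field and values


# The shared base case that developers and stakeholders are familiar with.
BASE_CASE = ModelInputs(houses=10, renters=10, tenure="rented", inflation_rate=0.02)

# New scenarios state only what differs from their parent.
HOUSES_BOUGHT = replace(BASE_CASE, tenure="bought")
HIGH_INFLATION = replace(BASE_CASE, inflation_rate=0.10)

# A scenario built on a scenario that is itself built on the base case.
BOUGHT_WITH_HIGH_INFLATION = replace(HOUSES_BOUGHT, inflation_rate=0.10)

print(BOUGHT_WITH_HIGH_INFLATION)
```

Because the dataclass is frozen, a derived scenario can never silently change the base case that the others depend on.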

Sometimes in that case we would also build new scenarios on top of scenarios that were themselves built upon other scenarios.

Many (probably most) of the feature files that I have seen that were written by testers or automators for the purpose of automated regression testing were bad even just as automated checks. And they had nothing to do with BDD except Gherkin (which is not even part of BDD!). But the issue was never Gherkin or Cucumber. It was poor craftsmanship. Fortunately, that can be fixed!

Scenarios like this one?

Background: A logged in user
  Given a user "Aslak" with password "xyz"
  And I am on the login page
  And I fill in "User name" with "Aslak"
  And I fill in "Password" with "xyz"
  When I press "Log in"
  Then I should see "Welcome, Aslak"

(From Aslak, the creator of Cucumber. He used it as a canonical example of a bad Cucumber story.) His proposed alternative is:

Scenario: User is greeted upon login
  Given the user "Aslak" has an account
  When he logs in
  Then he should see "Welcome, Aslak"

I’ve seen a lot of scenarios like this and the creator of Cucumber right there argues that they’re bad, but I don’t agree. I think his suggested alternative is actually a worse story.

@hitchdev

I’ve seen a lot of scenarios like this and the creator of Cucumber right there argues that they’re bad, but I don’t agree. I think his suggested alternative is actually a worse story.

Well, this practice was bad long before BDD was invented, because abstraction is fundamental to all engineering. That applies to BDD as it does to soooo many other things, especially in IT. Aslak Hellesoy, Seb Rose, Dan North, Liz Keogh, Matt Wynne, and many other experts, many with several decades of experience, will all say the same thing here:

  1. These are not good requirements (for several reasons, explained in many books, blogs, etc.)
  2. They also increase the maintenance effort of the automation

I also know this to be true from 25 years of experience. So if you have questions about how to approach certain scenarios differently to make them work better, then by all means ask. Others and I will be glad to support you. But these experts have been very successful with what they have been saying and teaching for decades and they have no interest in misleading people. So please take them seriously. You will be glad you did.

Well, this practice was bad long before BDD was invented

Abstraction has been a good practice since long before BDD was invented. Abstraction is a very tricky thing to get right though - it’s very easy to abstract the wrong thing in the wrong place at the wrong time. That’s the crux of what this is about.

These “step by step” scenarios do increase the cost of test automation maintenance in Gherkin, it is true. They do this solely because Gherkin lacks the ability to abstract, however. Were it capable of inheritance, the cost of maintenance would decrease and readability would increase. That’s the power of a good abstraction. That’s why we create them.

That increase in test automation maintenance costs is, I believe, 80% of the reason why it is considered “bad practice”. The other 20% is because it is assumed that the stakeholders are never interested in this level of detail. I find that this is a mistake, though. In fact I’d even argue that this is one of the reasons why it’s so common for stakeholders to be enthusiastic about the idea of BDD and quickly lose interest in even reading Gherkin stories in practice after seeing a few automated scenarios. It conceals as much detail as it reveals. It puts the onus on the writer to decide what to conceal/reveal and puts implicit pressure (via its syntax) on the user to not reveal too much - lest the stories get unwieldy and repetitive.

I tend to find that the experts you’ve cited have, to varying degrees, written little to justify Gherkin’s language design, although some have written on the topic of Why You Are Using It Wrong. Aslak is the only one I’ve read who has a strong opinion on this specific topic. He wrote this article (I think) in response to a litany of people slating Gherkin on Hacker News. These included experienced and seasoned developers who had tried it and thrown it away in disgust.

Liz Keogh is the one I’m most familiar with, and I have talked briefly with her about this. I love her, and she has a wide range of experience with BDD, from which I learned a bunch (especially Cynefin). Nerding out over the finer details of Gherkin’s language design and syntax, with one small but interesting and useful exception, didn’t seem to interest her much though. I suspect that’s the case for most of the other experts you’ve cited.

These are not good requirements (for several reasons, explained in many books, blogs, etc.). I also know this to be true from 25 years of experience. So please take them seriously. You will be glad you did.

Ok. Soooo… several other reasons like what?