How do you test and plan for edge cases?

I came across this article about designing for edge cases, which has obviously made me wonder how testers these days plan for edge case testing and how they work effectively with other team members.

https://uxdesign.cc/livin-on-the-edge-how-to-design-for-edge-cases-early-in-the-process-3fa3725d8a55

After reading this article, I conclude that his definition of an edge case is pretty different from mine. More specifically, he seems to treat all error handling as edge cases, whereas I would consider most errors simply errors; edge cases are either a sub-set of errors or a sub-set of user-flows, depending on the desired result of the edge case.

My definition of an edge case is “the area where an action (or parameter) transitions from correct to not correct.”

An example: I am currently specifying a product where the output is an analog signal between 4 mA and 20 mA, based on a digital input signal which is an integer between 800 and 4000.

Testing the input at 799 and 800, and at 4000 and 4001, covers the edge cases. The edge here is clear. This is easy to test.
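As a minimal sketch (assuming a hypothetical convert() function for the 800–4000 to 4–20 mA mapping, since the real interface isn’t shown here), those input-edge checks might look like this:

```python
# Hypothetical converter: maps a digital input (800..4000) to an analog
# output (4.000..20.000 mA), rejecting anything outside the input range.
def convert(raw: int) -> float:
    if not 800 <= raw <= 4000:
        raise ValueError(f"input {raw} out of range")
    # Linear interpolation across the full span.
    return 4.0 + (raw - 800) * (20.0 - 4.0) / (4000 - 800)

# Edge-of-range checks: the last good value and the first bad one on each side.
assert convert(800) == 4.0
assert convert(4000) == 20.0
for bad in (799, 4001):
    try:
        convert(bad)
        raise AssertionError(f"{bad} should have been rejected")
    except ValueError:
        pass  # expected: out-of-range input rejected
```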

But there are other edges which are not clear, and a lot of my test work is based on this. In the same system, the output should be between 4.000 mA and 20.000 mA. According to the product specification, the values 3.999 mA and 20.001 mA should not be allowed, so any test where those numbers are possible as output fits my description of an edge case. The trick is in finding the values where that is possible. If I can make the output fall below the minimum or rise above the maximum, the next question DOES correspond with one of his “quadrants” (I have a problem with quadrants too, but if the model helps him, then I won’t complain): when the output is slightly wrong, what does the system do?
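One blunt but effective way to hunt for those unclear output edges is to sweep the whole input space and flag anything that escapes the specified band. A sketch, reusing the hypothetical convert() from above:

```python
LO, HI = 4.000, 20.000  # specified output band in mA

violations = []
for raw in range(800, 4001):      # every legal input value
    out = convert(raw)
    if not LO <= out <= HI:       # a 3.999 or 20.001 output would land here
        violations.append((raw, out))

print(f"{len(violations)} out-of-band outputs found")
```

In a real rig the sweep would drive the hardware rather than a pure function, but the shape of the test is the same.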

It is worth noting at this point that in an analog system, such as the one I am describing, the answer is usually different from that in a digital system, such as a web site or an app. Edges are usually better defined in digital systems. But one significant exception is timing. The edge case for “how fast” is a very grey area. If the specification states, “This system should handle 100 messages per second”, then sending 100, or even 101, messages in a second is not the edge case. The edge may be 200 messages per second or even 2000 messages per second. The fun here is in finding just where the edge is. And in a lot of cases, it is VERY important to know both where the edge is and what happens when that line is crossed. Take my 200-messages-per-second system: the client had 100 devices on a network, so we specified that the system could handle up to 200 devices to give the client room to grow. The client really liked our system, so they ordered 500 devices! That is now up to 600 messages per second.
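A sketch of how one might probe for that unknown timing edge: ramp the rate until the system folds, then bisect. Here send_burst() is a stand-in for whatever actually drives the system under test, and the numbers are invented:

```python
def send_burst(rate: int) -> bool:
    """Stand-in for the real driver: send `rate` messages in one second
    and report whether the system kept up. Replace with real I/O."""
    return rate <= 1500  # pretend the system folds somewhere above 1500/s

# Double the rate until it fails, then bisect to locate the edge.
rate = 100
while send_burst(rate):
    rate *= 2
lo, hi = rate // 2, rate          # the edge lies somewhere in (lo, hi]
while hi - lo > 1:
    mid = (lo + hi) // 2
    if send_burst(mid):
        lo = mid
    else:
        hi = mid
print(f"edge found: handles {lo}/s, fails at {hi}/s")
```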

In the timing case, we were lucky enough to have a very good IoT network protocol person who created a system so good that 600 devices wouldn’t even scratch the surface of its capabilities. But what if it hadn’t been that good? What if we had sent out the extra 500 devices and they had crashed our customer’s network? (The answer, by the way, would have been newsworthy.)

So to answer a wonder with some questions:
How do we design systems or tests for known edges?
How do we design systems or tests for unknown edges?
How do we design systems or tests for combinations of edges (aka corner cases)?
How do we design systems or tests for edges which change depending on the circumstances?

2 Likes

First, I enjoyed the article. It’s a useful way to organize design work. But I agree with Brian in thinking of edge cases differently.
When you have an ordered parameter, it can have boundaries between classes of equivalent inputs (or outputs). The equivalence classes define areas of similar behavior. The extreme values, beyond which you cannot input (or output), are edges.
In Brian’s example, the 4.000 mA output should be an edge value, but if a smaller value is possible, 4.000 mA is a boundary. The area below is a different, possibly invalid, class with a lower edge of its own.
If two parameters intersect, corners appear at the intersections: one corner where two edges meet, two corners where an edge meets a boundary, and four corners where two boundaries cross.
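To make the corner arithmetic concrete, here is a small sketch that enumerates the corner points of two ordered parameters (the ranges are made up):

```python
from itertools import product

# Two made-up ordered parameters, each reduced to its extreme (edge) values.
temperature_edges = (-40, 85)   # e.g. operating temperature in degrees C
device_edges = (0, 200)         # e.g. devices on the network

# Every pairing of extremes is a corner: 2 x 2 = 4 corners here.
corners = list(product(temperature_edges, device_edges))
print(corners)  # [(-40, 0), (-40, 200), (85, 0), (85, 200)]
```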

I believe what you’re referring to as edge cases are commonly known as boundaries.

This is a fantastic article about Boundary Value Analysis, and it has an accompanying YouTube video as well.

My understanding of edge cases doesn’t really line up with the OP’s quoted article either… I thought they were cases where the application state is such that we don’t easily see how it even got there… a user must have done something quite odd, outside normal designed workflows, to have gotten there.

3 Likes

I’m going to come back to your questions in a moment Brian, but first:
For me, this article seems to be about testing user journeys, which is fine as a way to talk about edge cases, but it can become very subjective and less quantitative very quickly. For example, how would you specify any regression test under that model? I am a numbers person, even though math is not my forte, and I do agree with the idea of happy-path testing being very separate from edge testing.

Happy-path testing is the way you get a beta product into the hands of customers, and also the way you find out if the developers (read “whole team”) really do need to cancel their weekend plans entirely. But a fully green page of happy path tests is sometimes the only way to stay sane in a continuous testing environment.

Brian, lovely example.
4 mA versus 3.999 mA. This is kind of digital while still being analog, and I think that anyone who does not know what a mA is still gets the idea correctly: it’s a FAIL if it’s not 4.000. Why? Well, it’s the test-jig drift problem, and it’s an engineering tolerance problem. A small error margin is allowed, and that margin will normally be specified in the small print as something like 1%. Taken over the total range, that gives us 16 mA × 1% = 0.16 mA, or ±0.08 mA either side, so I think we come out at 4.08 mA. Which is a huge tolerance, but if that’s the agreed margin for error, I test for that. Which means I either use a calibrated source of truth that sets that limit at 3.92 mA, or understand the impact of setting the single source of truth to reject anything less than 4.0, but that would not be useful if I was importing parts from China. (No aspersion on Chinese manufacture will be entertained.)
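My working as a sketch, and hedged, since I may well have the margin wrong: this assumes the 1% is taken over the full 16 mA span and split evenly either side of the nominal point:

```python
# Assumed reading of the spec: 1% of the full 16 mA span, split evenly
# either side of the nominal value, i.e. +/- 0.08 mA.
SPAN = 20.0 - 4.0            # full-scale range in mA
MARGIN = 0.01 * SPAN / 2     # 0.08 mA each way

def within_tolerance(measured: float, nominal: float) -> bool:
    return nominal - MARGIN <= measured <= nominal + MARGIN

assert within_tolerance(3.92, 4.0)       # just inside the band
assert not within_tolerance(3.91, 4.0)   # just outside it
```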

Far too often, an edge case requires domain knowledge, so I guess that’s what I’m coming down to.

  1. Known edges = look at related spec documents for the domain area and derive as many of your edge values as possible, then decide which ones you can and cannot measure early on. Note down any you don’t feel you want to measure or deem irrelevant (see the sketch after this list).
  2. Unknown edges = decide whether you have high confidence that the known edges cover the entire domain space. I call this the “Johari Window”; just draw one and it might help you formulate good questions about the unknowns. Remember that the unknowns are really unknowable; writing the questions down will not answer them, which is where I diverge a little in my agreement with the original article.
  3. Design for corner cases = I tend to try and create camps for them, and herd them about a lot. I want to reduce the number of validation rules or “camps” as far as possible. Some are errors, and some are low-value features that are out of scope. There are probably some good storming techniques to help surface good names for these. Some might even fall into the temporal future.
  4. Design for edges which change depending on the circumstances = mind blown. Are we saying: what if the customer is in the Sahara desert, or at the South Pole? This is where I also get unhappy with using 4 quadrants. I have used 3 quadrants to divide a problem before, but if someone came to me with a 5-’agon model and could explain it simply, it could drive the analysis.
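For point 1, a sketch of what “derive your edge values from the spec” can look like, with the spec parameters made up from the numbers in this thread:

```python
# Made-up spec parameters from this thread: (name, minimum, maximum).
SPEC = [
    ("digital_input", 800, 4000),
    ("output_mA", 4.000, 20.000),
    ("messages_per_second", 1, 200),
]

# For each parameter, the classic candidates: just below, at, and just
# above each end of the range.
def edge_candidates(lo, hi, step):
    return [lo - step, lo, lo + step, hi - step, hi, hi + step]

for name, lo, hi in SPEC:
    step = 0.001 if isinstance(lo, float) else 1
    print(name, edge_candidates(lo, hi, step))
```

The output for output_mA includes exactly the 3.999 and 20.001 values Brian called out; the point is to generate the candidate list mechanically and then decide which candidates you can actually measure.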

I think it’s also important to remember that all of this should be “throw away”; we are working agile, and so we don’t create static documents, except when they directly assist our build process as a team. (I’m also happy to be wrong on the 3.999 mA thing.)

1 Like

We also seem to work to a different definition of edge case. An edge case, for us, usually means something unlikely to happen in the field. I report plenty of defects in this category. I “plan” for them to be dismissed as “oh, that’s just an edge case” and may or may not argue the case for fixing sooner (often using knowledge of the clients, their usage and their priorities to decide how best to proceed).

3 Likes

Love eviltester. His insights are always insightful :wink: (Anyone who isn’t following him on his blogs and twitter is missing out)

Though I would like to point out a single line in the blog, which I’m sad to admit I hadn’t read before now.

> Boundary Value Analysis is technique driven from the Heuristics: errors happen at the edges

In my words, boundary value analysis is a technique we can use to analyze (some) edge cases. In the way I use it, the edge and boundary are similar enough to be almost indistinguishable. The edge is where things might go wonky, and the boundary is a limited area around the edge.

1 Like

You aren’t wrong about the 3.999 mA thing. It’s a bit more complex than that.
You are wrong about us being Agile, if “we” means Conrad and Brian. I am (currently) very much not agile (at the same time, our process is more Agile than in some previous “Agile” organizations, but that’s very much a discussion for another thread).

This is almost exactly the situation I’m dealing with now. Our software is attached to a piece of hardware which is not always in a climate-controlled area. This means that the software has to work in these different environments. So how do we test the software for different ranges? How do we confirm the limits of the software/hardware combination?

But there’s another context which may be more interesting to more people, and that is traffic.

The number of actions a system (a server, for example) can perform when dealing with 500 users is vastly different from the number of actions that same system can perform when dealing with 1,000,000 users. So when dealing with performance, we need to take into account that varying boundary (corner case?) of number of users vs. number of actions performed.
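A sketch of how one might chart that moving boundary, with measure_throughput() standing in for a real load-test run (the capacity figure is invented):

```python
def measure_throughput(users: int) -> float:
    """Stand-in for a real load test: total actions/second the server
    sustains at a given concurrency. Replace with real measurements."""
    return min(users * 50, 40_000)  # pretend capacity tops out at 40k/s

# Sample across orders of magnitude; the interesting boundary is where
# throughput stops scaling with the number of users.
for users in (500, 5_000, 50_000, 1_000_000):
    per_user = measure_throughput(users) / users
    print(f"{users:>9,} users -> {per_user:.2f} actions/s each")
```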

1 Like

I have never really gotten the boundary value testing problem clear in my mind. It has always been about finding points on the curve where a behavior changes, and then testing for “off-by-one” errors around them. I always also test for off-by-two errors, probably because of my experiences on problems similar to the 4 mA to 20 mA one, which are relevant to non-hardware readers because they are an analog of 3rd-party libraries. When you cross an interface, like from analog to digital, it’s just like using a 3rd-party library: assume nothing. In fact, too often, “empty” or “none” is a rich area for these kinds of pickings, when you have multiple test points/events that are so conveniently grouped around zero. Boundaries where the system behavior appears to change when 1,000,000 people log in are also interesting, because I have often found bugs in guard code the devs write, for example in a queue that handles requests. These fall into best-practice patterns, all of which are well described in the literature.
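The off-by-one and off-by-two habit, as a throwaway sketch:

```python
# Generate test values around a point where behavior changes: off-by-two,
# off-by-one, the point itself, and one and two steps beyond it.
def around(point: int, reach: int = 2):
    return [point + d for d in range(-reach, reach + 1)]

print(around(800))  # [798, 799, 800, 801, 802]
print(around(0))    # zero: where the "empty"/"none" cases cluster
```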

Edge cases are the painful ones in my ontology. An actual bug I want to raise today: a dialog has 2 input fields; Field#1 allows entry of an IP address or hostname, Field#2 allows entry of a description. My test spec says, “a user should be able to voice dictate this dialog on their mobile”. But the IP/hostname field does not let you, and for me that is a boundary case, while validating an IP address during voice dictation becomes an edge case, because it is near impossible to implement that use case well. All because I don’t know the original context, which someone must have been planning for, because this test case was written years before I joined.

Yes Brian, when you work in a hardware company it is hard to feel “agile”, but you often have regulation that forces you to do things the agile gospel preaches against. For me, agile means attaching value to people over process; then evaluating and changing what you do often, in order to eliminate waste.

1 Like

I agree. I’ve always seen edge cases as scenarios that are highly unlikely to happen (HTTP error 418, for example, or the chance of a user trying to install against a version of SQL Server that the installer, and documentation, specifically indicate is not supported). Once identified, the next step is risk assessment: do I really need to worry about 418, will it ever happen? If the risk is sufficient, then it’s a case of assessing the impact if a user hits the problem: will it be a quick support call (“yes, ignore it” or “reboot and it will go away”) or will it require hours or days of investigation, data repair, etc.? All of this then helps decide whether we should code with this possibility in mind or face the music if the one-in-a-million scenario happens out in the real world.

2 Likes

I am with Christina on this.

If you don’t have a quick diagnostic written down for the customer who has the HTTP 418 type error, with steps they should follow to stand a 50% chance of restoring their business to operational, they will choose a competitor product if it takes them longer than a day or so to recover. That’s a risk, an outside one, but a tiny bit of documentation like “Upgrade to version X, then do this transform, then upgrade to version Y” makes it go away.