Finding a healthy mix of investigative and scripted tests

Hi all,

A few days ago, I posted a thread on Twitter about getting the right balance between Investigative Testing and Scripted Checking.

I’d love to know what you think, if you have any questions or if you have any experience to contribute.

The Thread:

Exploratory Testing helps us discover unknowns, test new things for the first time, and test existing things in new ways.

Scripts help check, detecting changes in known behaviours that have been previously observed.

Tools and automation support exploration and scripted execution.

For investigative testing, such as Exploratory, we understand what is good enough and what is a problem by referring to oracles.

We collect and build the oracles as we go, using conversations, documents, standards, models and heuristics. For example, chat with a product owner, or use Create, Read, Update, Delete (CRUD).

For scripted testing, we must come up with fixed oracles we can codify. For example: user1 with password123 logs in successfully and is then shown the members page.
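
Here’s a rough sketch of how that fixed oracle might end up codified as an automated check. The URL, form field names and /members route below are made up for illustration, not from a real app:

```python
# Hedged sketch: codifying the fixed oracle "user1/password123 logs in and
# lands on the members page" as a Selenium + pytest check.
# The URL, form field names and /members route are hypothetical.
import pytest
from selenium import webdriver
from selenium.webdriver.common.by import By


@pytest.fixture
def browser():
    driver = webdriver.Chrome()
    yield driver
    driver.quit()


def test_user1_lands_on_members_page(browser):
    browser.get("http://localhost:8000/login")  # hypothetical app under test
    browser.find_element(By.NAME, "username").send_keys("user1")
    browser.find_element(By.NAME, "password").send_keys("password123")
    browser.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
    # Fixed oracle: a successful login shows the members page.
    assert "/members" in browser.current_url
```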

When an oracle changes or we learn of new oracles, we can use these right away for investigative testing. Exploratory Testing is great for chaining tests, building on improved oracles as we learn.

Scripted tests can become out of date, invalid or be ineffective as our product is updated or our understanding grows.

Scripts need maintenance to take into account new or updated oracles. Sometimes scripts simply need removing, as they become invalid.

I recommend a healthy mix of investigative and scripted testing.

Too many scripts, and we will spend huge amounts of effort maintaining them. Without scripts at all, we risk repetitive investigations that cover the same ground without finding new information.

4 Likes

I agree with all of this in general, and I specifically like how you pin down a big problem with scripts/automation here.
The reasons why we do (or do not) consider something to be a bug change at a rate comparable to the time it takes to adapt a typical UI/API test automation suite.

I often handle that by going from full automation to semi-automation: using any automation/program/code to speed up my testing, and leaving the demanding-to-automate tasks to be executed by me, because I can execute them easily. :slight_smile:
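
To show what I mean by semi-automation, here’s a rough sketch; the API endpoint and payload are just assumptions for illustration. The code does the repetitive setup quickly, and the judgement-heavy exploring stays with me:

```python
# Hedged sketch of semi-automation: code handles the tedious setup,
# the human does the exploring. The endpoint and payload are hypothetical.
import requests

BASE_URL = "http://localhost:8000/api"  # hypothetical app under test


def create_accounts_for_exploration(count: int = 20) -> list[str]:
    """Create throwaway accounts so a human can explore the admin screens."""
    usernames = []
    for i in range(count):
        username = f"explore_user_{i:02d}"
        requests.post(
            f"{BASE_URL}/accounts",
            json={"username": username, "password": "password123"},
            timeout=10,
        )
        usernames.append(username)
    return usernames


if __name__ == "__main__":
    users = create_accounts_for_exploration()
    print("Accounts ready:", ", ".join(users))
    print("Now explore the admin views by hand; that part stays manual.")
```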

4 Likes

And here I was thinking I had coined the term “semiautomated testing” :grin:.

4 Likes

:martial_arts_uniform: :boxing_glove: :boxing_glove: :martial_arts_uniform:
There can be only one! Let’s dig into our posts everywhere to find out who was first! :face_with_monocle:

I’m happy to see more people using this phrase and cracking this monolithic approach to development in testing (another phrase I try to share). :star_struck:

3 Likes

If all goes well I have a TestBash Spring talk submission where I plan to mention the idea. I promise not to take all the credit for the term if it gets accepted.

3 Likes

Here’s my take: Explicit scripts should be used, as much as necessary for the context, with an understanding of the structure and limitations they impose on us and our testing.

Now an incredibly long explanation. It might go a little hard, but I respect your writing so I’m making some assumptions. Here’s a history of why my take makes sense to me.

So in the RST space, Exploratory Testing and scripting have gone through a lot of investigation, discussion and exploration. They wound up unpacking Testing, documenting all the parts and putting them back together again, and now we have Testing again.

At first, testing was intrinsically exploratory. It began with no particular differentiation between exploration and scripting. Actually, if you go back all the way to the 60s, Jerry Weinberg was already writing on the dangers of formalisation and the difficulty of encoding intent. We moved through formalisation, factory testing, human scripting and Excel pass/fail sheets, until there was nothing left but scripting and confusion. Anything that wasn’t scripted was “ad hoc”.

Cem Kaner came up with the term Exploratory Testing somewhere in the late 80s I think, and as a result of some hard work and some impassioned speaking by people like James Bach we end up, around the 90s, with the beginnings of original ET - “non-scripted” testing. This idea survived probably because it found more problems and avoided the railroading scripting gives us. ET was stuff that scripting wasn’t.

“Context-driven” came next, which used philosophical tools and human-centred concepts to raise ET to new ideals, where the value of your testing was defined by the context in which you performed it. You can still see the Kantian and Popperian ideas thick in modern ideas of exploratory testing, and you can see its reflection in writing like Weinberg’s General Systems Thinking. Tools, terms and models came out of this part very fast indeed, like heuristics, oracles, sessions and other things most testers with a passing understanding of ET know very well. The aim was to re-inject humanity and adaptability back into testing and get us out of the scripting mire.

Then ET moved from ET vs scripts to a spectrum from exploratory to scripted, with ad-hoc ideas on the left side and Excel pass/fail sheets towards the right side. About here is where the idea of parallel learning, design and execution comes in, where the actions a tester performs are influenced by the results of the actions they have already performed. The humanism of ET became louder, putting the focus on the tester and their responsibility to make decisions about testing unfettered by scripts. Around here is where I took notice, and I defined a script in 2004 as “A set of instructions made explicit by one agent such that another (or the same) agent can later interpret them in order to replicate a process to achieve a desired outcome or behaviour.”

Then came along testing vs checking, where testing cannot be automated but checking can be. The semantic differentiation helped to clarify the human element in both, and the limitations of checking-only testing. This was a huge favour to testers, as well as a step forward in understanding. I believe this is where we moved to understand that testing cannot be automated, only tool-assisted, again putting testers in the driving seat.

Then Collins’ book Tacit and Explicit Knowledge, and writing on encoding, communication, interpretation, intent and other elements of epistemology and information theory, inspired the investigation into scripting. It was like a Galilean telescope, rendering existing ideas in crisper detail. It reflected existing thinking in testing around scripting like repeatability, check vs test, and so on.

Finally, combining all the above and more, the RST namespace retired Exploratory Testing entirely, as testing was intrinsically exploratory, and so we end at the beginning - wiser for the journey, despite it being a large circle.

So, with this thinking in mind, there is no exploratory testing, just testing which must contain exploration. Scripting is any system that controls your testing that you yourself cannot control. Scripting can include:

  • Following someone else’s instructions
  • Following your own instructions without revising or revisiting them
  • Lack of knowledge
  • Biases
  • Subconscious mental heuristics
  • Management
  • Tools
  • Opportunity cost when choosing test techniques

All insofar as they shape how we work. They power our thinking, change the actions we’re free to take, limit or expand or control our observations.

So we exclude what we cannot change, although we understand it as much as is reasonable. Ideas like inattentional blindness, cognitive biases and the like fall into this category, as well as many contextual ideas. Now we have to make reasonable decisions about what we can change - the explicit scripts we can opt into with an understanding of both the benefit and risk in doing so. Healthy scripting relies on responsible use of agency after excluding what we cannot choose because of scripting.

Explicit scripts should be used, as much as necessary for the context, with an understanding of the structure and limitations they impose on us and our testing.

5 Likes

#Mindblown. But “health” is always going to be a pejorative term. So I’m only going to ask one irrelevant question here: what is an oracle? Because this oracle thing is impacting my personal health a lot, and I’m still confused as to what a “script” is anyway. Mainly because I first heard the term oracle about 7 years ago. (And it’s just one passing fad I lifted from your history lesson there @kinofrost, sorry, but there is a lot of really good stuff in your lesson.) Is an oracle a thing you can see, and most importantly, where is it when you leave the company or team?

2 Likes

Hi @conrad.connected,

I like @therockertester’s definition in this article we collaborated on.

An oracle is a way to recognize what might be a problem

The article goes on to share handy examples, and keeps “heuristics” in mind too.

3 Likes

An oracle is a heuristic principle or mechanism by which someone recognises a problem.

For the audience I’ll start with heuristic, for reasons that will become clear.

A heuristic is a rule of thumb. A fallible method for solving a problem or making a decision. It’ll usually work, but there are occasions on which it will let you down. So “when driving, give way to the right” is a heuristic that has usefulness, but can fail if, say, you change the country in which you are driving.

An oracle is a specific kind of heuristic that we use to recognise a problem. “A difference between things as perceived and things as desired” (Gause, Weinberg). An oracle could be “1 + 1 = 2”, a heuristic which becomes an oracle when testing a calculator. I press “1 + 1 =” and it says “10”. Oh no, a problem! Wait, it’s in binary mode.
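
As a toy illustration of that fallibility (the Calculator class below is invented purely for the example):

```python
# Toy illustration of a fallible oracle. The Calculator class is hypothetical.
class Calculator:
    def __init__(self, binary_mode: bool = False):
        self.binary_mode = binary_mode

    def add(self, a: int, b: int) -> str:
        result = a + b
        return format(result, "b") if self.binary_mode else str(result)


# "1 + 1 = 2" works as an oracle in decimal mode...
assert Calculator().add(1, 1) == "2"
# ...but would mislead us in binary mode, where "10" is not a problem at all.
assert Calculator(binary_mode=True).add(1, 1) == "10"
```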

Is it a thing you can see?

That’s an interesting epistemological question. I’m going to say no. I’m going to say oracles occur in the mind. The person I learned from most about oracles is Michael Bolton, and I’ve sent him a message and I’ll let you know what he says.

A spec is something we use to help identify problems with software, and it’s necessarily vague and incomplete, sometimes wrong. But I think the recognition of the problem relies on our interpretation of the meaning of the spec, and its comparison with our understanding of the product, within the context in which we are testing, not the visible object itself. The ink on the page or light from the monitor is not doing the heavy lifting.

Where is it when you leave the company or team?

In your mind, so it moves with you. Collecting them is important, too. As oracles are fallible, the mechanism by which we understand all problems is fallible, and thus we need to use a whole bunch of different ones.

HTH!

3 Likes

A codified oracle is one you have distilled into a document, or better still, code. Doing a good job of this is how we get good assertions as part of automated checks. These stay as close to the code as possible, and don’t get lost when you leave.
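
For example, something along these lines; the calculate_vat function and the 20% rate are made-up stand-ins, not a real product rule:

```python
# Hedged sketch of an oracle codified as assertions in an automated check.
# calculate_vat stands in for real product code; the 20% rate is assumed.
import pytest


def calculate_vat(net_amount: float, rate: float = 0.20) -> float:
    """Stand-in for the product function the check would really import."""
    return round(net_amount * rate, 2)


@pytest.mark.parametrize("net, expected_vat", [
    (100.00, 20.00),
    (19.99, 4.00),
    (0.00, 0.00),
])
def test_vat_is_twenty_percent_of_net(net, expected_vat):
    # The codified oracle lives next to the code, so it doesn't leave
    # the building when the person who knew it does.
    assert calculate_vat(net) == expected_vat
```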

1 Like

While there are many more definitions, for the purpose of my original train of thought, I simply meant a fallible source of truth.

That is, a reference to something tangible like a document, or some information captured from a conversation with someone who mattered. Maybe a model created by the team (State transition, behaviour model, UML, etc).

Why fallible? Because we use a source of truth based on what we understand today, and what we understand tomorrow may be different.

2 Likes

Makes you wonder about some other meanings of the word “oracle”, such as the thing that knows everything: the all-seeing/all-knowing oracle. Perhaps we can find an alternative.

I’m reluctant to suggest another acronym, yet what about an FSOT? :sweat_smile: It’s so hard to say and it might pique the interest of people when mentioned.

“What FSOTs do you have?”
“What do you mean?”
“What fallible sources of truth do we have right now? How do they inform your testing efforts and help you identify if there’s a problem? How do you FSOT your investigative and scripted tests?”
“Hmm. :thinking:”

2 Likes

I intended to refer to a broad range of testing activities that are not focused on investigation or exploration, and are designed to uncover deviation from already known and expected behaviour. The obvious implementation of a scripted test is a test case, either one intended to be run by a human or one written in code to be machine-executed.

1 Like

That is to say, I think all sources of truth are fallible; some are more open about their potential for imperfection than others.

1 Like

See: Half-life of knowledge

The half-life of knowledge or half-life of facts is the amount of time that has to elapse before half of the knowledge or facts in a particular area is superseded or shown to be untrue. These coined terms belong to the field of quantitative analysis of science known as scientometrics.

Yeah, I think @kinofrost has kind of cleared up for me the big doubt in my mind about oracles being a new idea, and I was about to launch into a thought about how industries have jargon. I think that the more we share, and get good answers like you have both given when someone asks to clarify the jargon, the more we all grow. For me the biggest oracles I have are the product documentation (users expect that to be true), any Excel sheets with test cases described (Jira is too heavyweight), and finally my coded-up CI/CD tests. Requirements docs are also oracles, but my experience is that the world moves so fast that those were only oracles on the day they got written, and quickly become history.

I thus don’t feel so guilty about hijacking the topic about balancing how much scripted and investigative testing you do, because these are good things to unpack: words, and what we really mean, which James Bach does quite often. My own feeling is that I don’t create as many exploratory scripted tools as my team would like. We only have two useful ones, for different domain problems. One handles some GUI debugging problems, and the other handles complex account configurations. It was not my idea to use an autotest to pre-build complicated configurations for developers to play around in; a few places I’ve worked at do it.

But I like how I now have a name for it: I call it “assisted testing”. It’s fairly limited but it’s very often used. It’s a “skipped” test that lives in the test suite repo. The devs can also use it as a starting point to learn how to write system tests. I thus hope that my “assisted tests” term helps someone else talk about investigative scripting or tooling.
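
In case it helps, here’s a rough sketch of what such an assisted test might look like; the helper and the account shape are hypothetical, the point is the skipped-by-default pattern:

```python
# Hedged sketch of an "assisted test": skipped by default so it never runs as
# a check, but kept in the suite so anyone can run it deliberately to
# pre-build a complex configuration to explore. The helper is hypothetical.
import pytest


def build_account_with_child_accounts(children: int) -> str:
    """Stand-in for the real setup helper living in the test suite repo."""
    return f"parent-account-with-{children}-children"


@pytest.mark.skip(reason="Assisted-testing tool, not a check; run it by hand")
def test_build_complex_account_configuration():
    account_id = build_account_with_child_accounts(children=12)
    # No assertion on purpose: the output is a playground, not a verdict.
    print(f"Configuration ready for exploration: {account_id}")
```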

2 Likes

Thank you for your comprehensive history of ET and thoughts on scripts. Some of this I was familiar with, and some is new to me.

I haven’t done RST, and I’m not sure I’m ready to accept, in the wider context beyond those who have done it, that ET is ready to be retired as a namespace.

I am willing to accept that language evolves, and that it does so for different groups at different rates. And sometimes it also forks, where multiple valid namespaces continue.

2 Likes

Update: Michael was kind enough to get back to me, and the way he expresses it now is “An oracle is a means by which we recognise a problem”, which encompasses both artefacts like a specification and our models of consistency about it, among other things.

I’m pretty positive about that because it encompasses the pragmatism of treating those kinds of artefacts within the thinking of oracles, as @fullsnacktester has already written about here, and that just means more and better tools to help our thinking and miss fewer problems for people who matter, without arguing about whether Kant was right or not. Which is also important, but I don’t know how my microwave works and I still eat hot soup.

Yes, you can see some oracles, and they can exist wherever they might be stored.

2 Likes

I essentially agree. Honestly, the terminology is just a representation of understanding, anyway, and I think that the term is far less important than the mechanisms underneath it. The idea that testing can be automated like a robot building a car is fundamentally wrong, but when people say “automation” and I go “oh, like coded checks in a tool built to aid and assist the test effort”, or, more usually, “oh, like Selenium and all that”, the communication gets done well enough and we all get on with our day.

ET won’t die as a term, I don’t think. My guess, given your eloquence about testing ideas, is that you already think of testing as exploratory and Exploratory Testing as a pragmatic term that’s useful to you. As far as I know the RST and Context-Driven people invented and promoted the idea of exploratory testing in the first place to get us beyond that manual check suite hell we were in and warn us against the idea of carrying the same thinking into automation. I think it’s useful to have a way to hang onto the necessity of exploration, and it’s useful to differentiate between the power and constraints of coded check suites and the power and constraint of hot human-on-interface action. While RST retired it within their namespace, it doesn’t claim any authority beyond that. RST has a principle that nobody owns language or terminology, and that it’s the thinking that’s important. In fact it’s suggested you use your own terms so that you remember the ideas better. After all, if dictionaries were an authority on the use of language they wouldn’t have to keep printing new ones.

I think the drive behind retiring ET in RST is simply because if the T is always E then ET is a tautology. So if ET isn’t just at one end of an exploratory-scripted continuum you need to explain what a script is, and so they did. Explicit scripting becomes an outsider to testing - a perhaps-useful extra factor that imposes structure.

The idea that testing is intrinsically exploratory and scripts don’t bypass or replace that is really important, because it keeps the power and responsibility with the human. If I hear “Exploratory Testing” I now read that as testing that doesn’t heavily rely on large check suites. I think about smaller tools, greater freedom, more agility, tighter feedback loops and so on. Knowing that automation is manual, exploratory testing is powerful because it protects us from complacency; from pushing responsibility between the gaping chasms in abstractions in BDD descriptions, thinking of automators as programmers instead of testers as if testing skill is no longer needed, putting faith in script suite reports and red/green text, not understanding the limitations automation puts on us, failing to interact with the project context, etc. In this we can examine what automation is giving us and what it’s costing us without hoodwinking ourselves, or others, more than necessary. It also means we can push both responsibility and respect onto people who write a lot of automation code, and know that they put a huge amount of work into knowing when and when not to use automation, and what might remain, or require different oracles that are incompatible with automation, in order to find important problems.

So when we hear “Exploratory Testing”, we can think “oh, like sessions and all that”, safe in the knowledge that we won’t then lie to ourselves about what that means, and without having to burn useful language.

2 Likes

I wonder, if in all this, script and exploration are the wrong key words for the underlying concept.

And instead, do we mean investigative and confirmatory?

I don’t believe it’s right to say all testing is exploratory. I think exploration implies a sense of intent, which we may get from charters or scenarios, and a focus on uncovering unknowns.

I think we can carry out investigative testing, where we don’t explore, and confirmatory testing that doesn’t use scripts.

I think I’ll need to think on this a bit longer, to come up with some good examples and challenge my assumptions.

1 Like