đŸ€– Day 5: Identify a case study on AI in testing and share your findings

We’re now on Day 5 of our 30 Days of AI in Testing challenge! Over the past few days, we’ve built foundational knowledge about AI in testing. Today, we’ll take a look at how our discoveries play out in real-world settings by exploring case studies or sharing personal experiences.

Task Steps

Option 1: Case Study Analysis

  1. Search for a real-world example of where AI has been used to tackle testing challenges. This could be a published case study or an example shared in an article or blog post.
  2. Select and analyse a case study that seems relevant or interesting to you. Make a note of the company and context, how AI was applied in their testing process, the specific AI tools or techniques used and the impact on testing outcomes/efficiency.

Option 2: Personal Experience Sharing

  1. If you have personal experience with using AI tools or techniques in your testing activities, you can share your own journey and learnings.
  2. Describe the context, the AI tools or techniques you used, how you applied them, and the outcomes or challenges you faced.

Share your Discoveries!

  1. Whether you choose Option 1 or Option 2, share your discoveries by replying to this post. Here are some prompts to guide your post:
  • Brief background on the case study or personal experience
  • How was AI used in their/your testing?
  • What tool(s) or techniques did they/you leverage?
  • What results did they/you achieve?
  • What stood out or surprised you about this example?
  • How does it relate to your own context or AI aspirations?

Why Take Part

  • See AI in Testing in Action: By exploring real-world examples, we gain insights into what’s possible and begin envisioning how AI could transform our own testing.
  • Deepen Your Understanding: By exploring a case study or personal experiences, you’ll gain a deeper appreciation for the complexity and nuance of integrating AI into testing workflows.
  • Share the Knowledge: Sharing your case study findings or personal experiences and discussing them with others offers a chance to learn from each other’s research, expanding our collective knowledge and perspectives on AI’s role in testing.

:diving_mask: Dive deeper into these topics and more - Go Pro!

10 Likes

Hello there :raised_hands:
Today’s task is a little bit challenging: there is a lot of marketing about which paid tools are available to us, but concrete examples of their usage are harder to find.
I gave up trying to find examples in blogs and jumped directly to videos.

These two videos show examples of the Parasoft tool using AI for writing, running, self-healing, and analysing tests. I didn’t like the tool’s look and feel, but the usage is pretty impressive: in the first video (around minute 20) it shows the tool creating penetration tests based only on a recording of the login action, and at the end it shows a report of all the tests that were created and ran successfully.
The other video is more complete; it shows how AI-based self-healing and reporting work in both UI and API testing. I liked this one because the presenter used an Eclipse plugin that integrates Selenium and AI, which I believe is a Parasoft proprietary plugin.

Now for the really impressive application, which blew my mind :exploding_head: . It reminded me of all the time I spent creating tests in Postman by hand; now PostBot can create deep validations of the response object for you with just a click. It seems to use ChatGPT to generate the tests but, as we discussed yesterday, with the context of Postman’s test scripting, so it is a bot specialised in writing tests for APIs.
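
To make that concrete, here is a rough sketch of the kind of script PostBot drops into a request’s Tests tab. The request, field names, and checks are made up for illustration; Postman runs these scripts as JavaScript in its sandbox, where the `pm` object is provided.

```ts
// 'pm' is provided by Postman's scripting sandbox; declared here only so the snippet stands alone.
declare const pm: any;

pm.test("Status code is 200", () => {
  pm.response.to.have.status(200);
});

pm.test("Response body has the expected shape", () => {
  const body = pm.response.json();
  pm.expect(body).to.have.property("id");
  pm.expect(body.items).to.be.an("array").that.is.not.empty;
  pm.expect(body.createdAt).to.match(/^\d{4}-\d{2}-\d{2}T/); // ISO-8601 timestamp
});
```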

Great innovations,
For sure, I’ll try PostBot asap :grimacing:

19 Likes

Hi my fellow testers. For today’s challenge I thought that, rather than search for a case study, I would share my own experiences so far with trying to use AI tools in my workplace. I hope that’s OK.

Brief background on the case study

Back in October last year, my workplace devised a learning week for us all, where the challenge was to spend a week learning something that would help us develop individually and could also indirectly benefit the software we develop in future. I chose to research and try out a couple of AI testing tools to see if they could help me with the challenges I face in test automation.

How was AI used in their testing?

I chose to focus on a couple of tools that advertised self-healing functionality, as automated test suite maintenance is where I spend a lot of time updating or fixing tests that have broken due to UI or API changes.

What tool(s) or techniques did they leverage?

I initially looked into Katalon Studio, but it turned out that its self-healing feature is only available for web applications, and it is the tests for desktop applications that I spend the most time maintaining. I then looked into Ranorex, as this supposedly had a self-healing feature that worked on desktop software, and I also tried out Applitools Eyes and its feature for using AI to visually test websites.

What results did they achieve?

Ranorex - I created a simple test in an older version of our software and then tried to run it against the latest version, where some UI elements had changed. It immediately failed to click on a changed button. I tried adjusting every setting I could find related to the self-healing feature and re-running the test, but nothing worked.

Katalon Studio - I created a test against the latest version of our website and then ran it against an old version. It auto-generated two self-healing suggestions, and the preview image it generated for the healed control looked like it was identifying the correct control. After approving the changes and re-running the test, it got past the altered locators without issue. I found it wasn’t able to fix every UI change, so I had to do some of that manually, although I could see it attempting to find different locators. However, I also wanted to check whether the self-healed locators were good ones that wouldn’t pass when they shouldn’t, so I pointed the test at a version of the website where the new locators should fail. Unfortunately they still passed, which suggests the new locators aren’t good ones and may always pass.

Applitools Eyes - I had the most success here. I created some baseline images and then ran the tests over a few days. They correctly compared the screenshots, failing when they saw different UI elements (when targeting a different website version) and passing when the elements were identical to the original baseline.
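
For anyone curious what that looks like in a test script, here is a minimal sketch of an Applitools Eyes check wrapped around a Selenium session, assuming the `@applitools/eyes-selenium` SDK; the app name, test name, and URL are placeholders, not from the report above.

```ts
import { Builder } from "selenium-webdriver";
import { Eyes, Target } from "@applitools/eyes-selenium";

async function checkLoginPage(): Promise<void> {
  const driver = await new Builder().forBrowser("chrome").build();
  const eyes = new Eyes();
  eyes.setApiKey(process.env.APPLITOOLS_API_KEY ?? ""); // key comes from your Applitools account

  try {
    // The first run records the baseline; later runs are compared against it.
    await eyes.open(driver, "My App", "Login page", { width: 1024, height: 768 });
    await driver.get("https://example.com/login"); // placeholder URL
    await eyes.check("Login page", Target.window().fully());
    await eyes.close(); // fails the check if visual differences were found
  } finally {
    await eyes.abort(); // no-op if close() already succeeded
    await driver.quit();
  }
}
```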

What stood out or surprised you about this example?

Generally I was surprised that, apart from Applitools, the other tools took a heck of a lot of effort to work out how to use their supposed self-healing features, and even more effort to try to determine why they weren’t working. I was disappointed that I couldn’t get Ranorex to work on our desktop software, as that is where I have to invest the most manual effort. I was impressed, however, with Applitools and the ease with which I could get some tests running and comparing screenshots.

How does it relate to your own context or AI aspirations?

My aspiration for the week was to find some very cool self-healing AI tools that would ease the burden of maintaining my automated test suites. Applitools, I think, was a big success; the others not so much, at least back in October. I continue to keep an eye on their development in the hope that they will eventually work for me as advertised.

25 Likes

The struggle to find examples for this task also relates to AI - to ‘the dark forest theory of the web’ (Yancy Strickler). The clear web is flooded with bots, advertisers, clickbait, generic and generated junk - largely due to LLM content generation. Maggie Appleton wrote an essay and gave a talk on this subject - highly recommend. :v:


Anyway, although it’s a bit meta, I found a paper - The Integration of Machine Learning into Automated Test Generation: A Systematic Mapping Study (2023) - that might serve me as a good entry-point and/or roadmap for learning about ML as part of test generation (since this is an area of application I am specifically interested in).

One of the challenges the paper highlighted relates to training data, and specifically those cases where human involvement is required:


models that make predictions based on failures—for example, test verdict oracles or models that produce input predicted to trigger a failure [39] or performance issue [88]—require training data that contains a large number of failing test cases. This implies that faults have already been discovered and, presumably, fixed before the model is trained. This introduces a paradox.

& this reminded me of reportportal, an AI-powered tool that Carlos mentioned in the AMA (from yesterday’s challenge). :thinking: It sounds like this is a challenge that they had to overcome. Is their model learning from production data?

I found a video, but due to time boxing reasons, I’ll have to find out how they approached this at a later date. :upside_down_face:

15 Likes

This is really a valuable experience report with lots of useful insights and the sort of information we’d hoped to uncover with this task. I’ve amended the task to include sharing personal stories for the folks who have that experience.

Thanks for sharing! :robot::tada:

7 Likes

Happy birthday @sarah1 :birthday:

1 Like

I started familiarizing myself with automation testing by taking Test Automation University’s courses created by Applitools. Through those courses, I was able to explore how Applitools Eyes can be used for visual testing by integrating the tool into test scripts. By reviewing the test results, configuring the tolerance for changes in the UI, and manually approving or rejecting outputs, you help the AI learn how to better mark the visual differences between pages.

4 Likes

hello hello @sarahk

In my previous company, we faced the challenge of ensuring the quality of a complex web app amidst the pressure for faster delivery cycles and heightened quality standards. To address this, we decided to integrate AI-driven testing techniques into our testing process.

We embraced tools like ReTest and Test.AI for new test cases, and Applitools for visual testing.
Additionally, we employed AI-based defect prediction models to prioritize testing efforts.

Adopting these tools remarkably streamlined our regression process and allowed us to allocate more focus to exploratory testing and edge case scenarios.

Integrating AI into our testing workflow came with its share of surprises and challenges, particularly during the initial phase of understanding and configuring the AI-powered tools.

11 Likes

I need to timebox my participation during working days, so I’m going to share an article that I already shared on one of the previous days:

Meta’s new LLM-based test generator is a sneak peek to the future of development (link to the paper)

Meta created a pipeline in which an AI tries to create a new unit test (or several), which is then built and run a few times (to ensure stability), and the change in coverage is assessed. If the new unit test is stable and increases code coverage, it is submitted for inclusion in the code base, at which point a human decides whether or not to merge it.
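
In rough pseudocode, that filter chain looks something like the sketch below. All names and the exact number of stability runs are my own assumptions for illustration; the actual implementation is internal to Meta.

```ts
// A sketch of the candidate-test filter chain: generate, check stability, check coverage, then hand to a human.
type Candidate = { testSource: string };

declare function generateCandidateTest(testClass: string): Promise<Candidate>; // the LLM call
declare function buildsAndPasses(candidate: Candidate): Promise<boolean>;
declare function coverageDelta(testClass: string, candidate: Candidate): Promise<number>;
declare function openReviewRequest(testClass: string, candidate: Candidate, delta: number): Promise<void>;

async function tryToImprove(testClass: string): Promise<void> {
  const candidate = await generateCandidateTest(testClass);

  // Build and run the candidate several times; discard anything that fails or is flaky.
  for (let run = 0; run < 5; run++) {
    if (!(await buildsAndPasses(candidate))) return;
  }

  // Keep only tests that add coverage over the existing suite.
  const delta = await coverageDelta(testClass, candidate);
  if (delta <= 0) return;

  // Survivors become a normal code review; a human decides whether to merge.
  await openReviewRequest(testClass, candidate, delta);
}
```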

There are some attention-grabbing details: 75% of the generated changes built, 57% passed, and 25% were included. In the larger code base, it improved 10% of test classes, and the majority of these improvements were accepted. In one case, a test generated by the AI covered over 1,000 lines of code.

On the other hand, this is an improvement to only 10% of all test classes, and over a third of the code generated by the AI does not even work reliably. As AI is costly to train and run, it raises the question of whether this is an economically worthwhile investment of resources. That point seems to be completely absent from the article.

Also, an AI-generated test covering 1,000 lines of code sounds impressive, but you really have to ask what kind of engineering practices led to that situation in the first place.

The paper does not discuss the initial situation: how large is the code base, and what is the existing level of code coverage? In the related studies section, they note that applying a similar technique to a code base without any tests gives mixed outcomes: one study was able to get to 80% coverage, while another got to only 2%. So it seems that an LLM test generator might be useful in a fairly narrow space, where you already have some tests but there is still room for improvement. At the same time, this probably describes many software teams.

8 Likes

Is ReTest still working?

This is a case study from quite a few years ago but it stood out for me as a really interesting application of AI in Testing.

What attracted me to this and prompted me to share it with you all today is that the approach solves a real testing problem: “How do we test hundreds of levels of a game for each release, where each level may have multiple solutions?” The presentation is by a Testing Lead at King, which produces games such as Candy Crush, which has thousands of levels.

The other interesting aspect of this talk is that the type of AI techniques being used is far removed from the current hype cycle of using Generative AI to solve all our testing problems.

They started using search trees (which many people don’t even realise are considered part of AI) and, while successful, found that the time taken to play levels was considerable. They evolved this approach into bots trained using a combination of reinforcement learning and genetic algorithms, creating more intelligent bots that “play” the levels.

A challenge with these types of approaches is that we can rarely rely upon explicit expected or anticipated outcomes to determine whether a test has “passed”. So, as testers, we need to consider how we express our expectations of the outcomes to determine whether there is a problem worth exploring. One of the use cases within King seems to be training different types of bots (e.g. a bot that finishes levels quickly or maximises score) to explore how the application (or a level) behaves under different types of play and whether it crashes. A crash is a fairly clear signal that something unexpected has happened, but how would you tell whether a level is solvable and its difficulty appropriate? Deciding on criteria for the bots for such test ideas is more complex and nuanced, but an interesting challenge for testers!
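
To illustrate the oracle problem, a minimal sketch of a level-playing bot loop might look like the following. All types and method names are hypothetical (not King’s actual framework): the only hard oracle here is a crash, while “solvable” and “appropriately difficult” need softer, statistical checks across many playthroughs and bot types.

```ts
// Hypothetical game API for illustration only.
type Move = { x: number; y: number };
interface GameLevel {
  legalMoves(): Move[];
  apply(move: Move): void;
  isFinished(): boolean;
  hasCrashed(): boolean;
}

type Verdict = "CRASH" | "STUCK" | "COMPLETED" | "TIMEOUT";

function playLevel(
  level: GameLevel,
  pickMove: (moves: Move[]) => Move, // random, search-based, or a trained policy
  maxMoves = 500
): Verdict {
  for (let i = 0; i < maxMoves; i++) {
    if (level.hasCrashed()) return "CRASH";   // the one unambiguous failure signal
    if (level.isFinished()) return "COMPLETED";
    const moves = level.legalMoves();
    if (moves.length === 0) return "STUCK";   // possibly an unsolvable level: worth a human look
    level.apply(pickMove(moves));
  }
  return "TIMEOUT"; // "too hard" vs "bot too weak" needs statistics over many runs
}
```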

Another reason I chose this is that in my career I’ve faced many similarly difficult testing challenges (although not in the games industry) and opted for approaches like this to solve the problem at scale. Often I start by looking at how we can use search algorithms to explore some problem space efficiently, but shifting towards integrating learning bots into the process is interesting in my opinion, and opens up some interesting ways to explore complex applications.

I guess a challenge for test teams is that building these types of bots is not trivial and requires a fair amount of specialist knowledge about AI/ML. At King, they had a team of data scientists working on the problem.

As an aside, last year while working with a client we discussed the idea of a Data Scientist in Test role: bringing data scientists into the testing teams to support the development of bots and synthetic data generation, and to inject autonomous decision-making into testing processes (such as automatic triaging of defects or prioritisation of tests) in order to reduce the level of toil for testers (toil being fairly routine work that detracts from more value-adding work such as exploratory testing). I, for one, would find such work interesting :blush:

22 Likes

Hello @sarahk and fellow participants,

I have just finished the Day 5 task of studying a case study on AI in testing. Here are my findings & analysis:

The case study that I referred to was Appendix: ChatGPT Sucks at Being a Testing Expert - Satisfice, Inc.

Background Story / Pre-Context of this Case Study:

It all started with a simple LinkedIn Poll on Boundary value Testing:

(screenshot of the LinkedIn poll)

Michael Bolton (RST Instructor) posted a detailed analysis of this poll in his blog post, where he explains how an expert human tester would approach such problems: Boundaries Unbounded – DevelopSense

Jason Arbon (CEO, Checkie.ai) posted another detailed analysis of this poll, with the help of AI (ChatGPT), on LinkedIn. He wanted to show how AI can also be used as a testing expert through appropriate prompting techniques. That 9,000+ word article has since been removed from LinkedIn.

Finally, James Bach & Michael Bolton studied Jason’s analysis and added their commentary on it.

The findings were very interesting, though, as they surfaced many LLM syndromes, the actual quality of the LLM’s answers, and the areas where the LLM did a good job or showed promising results.

I have made a mindmap summarizing this here:

Here is a detailed video of me explaining today’s task:

Case Study Analysis - ChatGPT Sucks at Testing | Day 5 of 30 Days of AI in Testing Challenge - YouTube

Do share your thoughts and feedback on this analysis. Thanks!

Rahul Parwal

8 Likes

Hello Everyone

I have gone through the following article, which includes an example case study covering how AI is helping an organisation with testing.

Link: AI and Machine Learning Mobile Testing Tools

Summary: Discover how AI and machine learning are revolutionising mobile testing tools, improving efficiency and effectiveness in testing processes for mobile development teams.

The article discusses the evolution of mobile testing tools with the integration of AI and machine learning (ML), highlighting their potential to improve efficiency in various testing phases. It begins by emphasizing the benefits of test automation, citing statistics that show significant improvements in testing cycle speed, coverage, and bug detection when automation is implemented. Despite these gains, the article argues that there is room for further enhancement through AI and ML. :iphone::robot:

The article then delves into the present state of AI in mobile testing tools, emphasizing features such as improved element location, self-healing tests, visual validation, and scriptless or codeless test automation. These features aim to address common challenges faced by traditional testing tools, such as fragile selectors, false positives, and the complexity of visual testing. :hammer_and_wrench::mag:
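
As a toy illustration of the “self-healing” idea the article describes, a tool can record several alternative locators for an element and fall back through them when the primary one breaks; real products rank candidates with ML-derived similarity scores rather than a fixed list. A rough sketch with selenium-webdriver (the locators below are placeholders):

```ts
import { By, WebDriver, WebElement } from "selenium-webdriver";

// Try each recorded locator in order of preference and return the first unique match.
async function findWithHealing(driver: WebDriver, locators: By[]): Promise<WebElement> {
  for (const locator of locators) {
    const matches = await driver.findElements(locator);
    if (matches.length === 1) {
      return matches[0]; // a real tool would also log the "healed" locator for review
    }
  }
  throw new Error("No locator matched uniquely; the test needs manual maintenance");
}

// Usage: the primary id first, then progressively looser fallbacks captured at recording time.
// await findWithHealing(driver, [
//   By.id("submit-btn"),
//   By.css("button[type='submit']"),
//   By.xpath("//button[text()='Submit']"),
// ]);
```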

A case study featuring November Five illustrates the impact of adopting modern mobile testing tools, specifically Bitrise. The company transitioned from a locally-hosted Jenkins server for continuous integration (CI) to Bitrise, resulting in improved productivity, quality, and security. The move to Bitrise facilitated easier maintenance and saved significant time previously spent on manual upkeep. :building_construction::stopwatch:

Looking towards the future, the article explores potential advancements in AI and ML for mobile testing tools, including automated and intelligent gap analysis and automated test generation. These advancements aim to further optimize test coverage and streamline the testing process, ultimately enhancing the efficiency and effectiveness of mobile development teams. :rocket::crystal_ball:

Overall, the article underscores the importance of AI and ML in advancing mobile testing tools and encourages companies to embrace these technologies to drive innovation and productivity in their development processes. :star2::woman_technologist:

Thank you

4 Likes

I don’t have any experience with AI in testing, so I was looking around on the web to find some case studies. I have not found one (yet), but I stumbled upon a blog post from Tamas User of Functionize, AI Software Testing: Unveiling the Future of Software QA (URL: AI Software Testing: The Ultimate Guide | Functionize).

In this blog he elaborates on what AI testing is and mentions some benefits, such as:

  • Enhanced Accuracy
  • Expanded Test Coverage
  • Efficient Test Creation
  • Streamlined Test Maintenance

“Overall effectiveness in software testing receives a significant boost through the incorporation of AI. The intelligent algorithms can adapt to evolving testing requirements, learning from previous test results and continuously improving testing strategies.”

The blog post also mentions some companies, alongside Functionize, that “are at the forefront of the AI testing trend, integrating advanced AI technologies into their systems to elevate the practice of software testing.” These companies are Katalon, Applitools, and Testim.

To implement AI-based software testing and make it successful, make sure there is a clear roadmap and clear objectives for the implementation. Identify any gaps in AI knowledge within your team. Select the set of test cases where AI can offer the most significant improvements, such as complex data analysis, pattern recognition, or repetitive tasks.

I watched the demo video of their product (URL: Intelligent Testing Demo). It looks nice, but it always assumes that the test environment is working. I wonder how the tool behaves when some interfaces are not working. It would be nice if the AI could adapt to that situation and mock data instead of using real data. For me, that would be a nice addition to the AI model.

7 Likes

As I am currently looking into LLM-based test case and test data generation, and Claude 3 just dropped today, I went looking for some examples of using Claude.ai for test data creation.

Here is one example:

In the article, the writer uses Claude to generate DB test data from one given CSV file to another, and more, for the purpose of database performance testing.

To further enhance the experience, he does not stop at finding the successful prompt, but also uses Claude to write a script to help him quickly repeat the process.
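
For anyone wanting to try something similar, a minimal sketch of that kind of call using Anthropic’s SDK might look like this; the model name, file name, and prompt are my own assumptions, not taken from the article.

```ts
import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "node:fs";

async function generateTestRows(): Promise<void> {
  const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
  const sample = readFileSync("customers.csv", "utf8").split("\n").slice(0, 20).join("\n");

  const message = await client.messages.create({
    model: "claude-3-opus-20240229", // pick whichever Claude model is current
    max_tokens: 2000,
    messages: [
      {
        role: "user",
        content:
          `Here are the first rows of a CSV file:\n${sample}\n\n` +
          `Generate 50 more rows of realistic but fictional data with the same columns, ` +
          `including edge cases (empty optional fields, very long strings, boundary dates). ` +
          `Return only CSV, no commentary.`,
      },
    ],
  });

  const block = message.content[0];
  if (block.type === "text") {
    console.log(block.text); // pipe into the target table for the performance test
  }
}
```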

It is a good reminder that testing is not a single task but a process involving different kinds of work, i.e. different possibilities for AI tools.

Besides, if we really want to use prompt engineering daily for test case generation, I guess it is important to learn the techniques properly and get equipped with them. Otherwise it will be very time-consuming to test out prompts blindly.

Anthropic also has good documentation for this, which could serve as a general guide for everyone: Prompt engineering

9 Likes

I can concur with a number of others in this group that “real-life” use cases are few and far between. For my part, I looked at PostBot and Postman. Since PostBot has only gone live in the last few months, I’m not surprised that there are few “white papers” on it. However, the promise is really exciting. Having automated API calls, and particularly flows, in Postman in the past, I know that if the tool can actually do what it claims, it could have saved me a considerable amount of effort.

4 Likes

Last summer, I participated in James Lyndsay’s excellent (and free, he is the nicest person) workshop series to learn about AI/LLMs. In one session, we tried out Cody, a coding AI assistant that you can use as a plugin with VSCode. It was pretty good at explaining code, though depending on the prompts we gave, it would sometimes just start making things up - referring to files that didn’t even exist in the code base.

The developer I work with who builds and maintains our Agile Testing Fellowship website uses Tabnine, a similar AI assistant for IDEs. He finds it helpful for speeding up his coding and testing. I tried out Tabnine and Cody side by side to explain code in our code base, and they came up with pretty much the same results. As a tester, I want to understand the code. I’m in a Code Reading Club, learning how to read code. Being able to use AI assistants and ChatGPT to explain code is also a big help.

13 Likes

I am more involved in mobile app testing, so I set out to identify case studies on AI in app testing. Here is my finding: Testing AI Applications: Best Practices and Case Study

Case Study: Interactive App with a 3D CG Character. It is a well-written case study and worth reading.

4 Likes

Oh, I just heard about Claude today, this looks interesting!

4 Likes

Hi @sarahk & folks :wave:

Definitely, today’s task is challenging, but opportunity exists in challenges.
I was able to find a very interesting session held by LambdaTest.

Please go and check out → AI Empowered Software Testing

AI testing can be incorporated throughout the various testing stages, alongside test management tools for test case management. AI can enhance software testing processes and make them more efficient. However, the challenges for AI include a lack of trained people, security, and many more.
There is also the question of how AI will integrate with ML-based apps/systems :thinking: .

2 Likes