🤖 Day 14: Generate AI test code and share your experience

Nearly at the halfway mark! For Day 14, we want to focus on how AI is being used to build automation. Recently, there has been growth in automation tools that use AI to simplify the creation or improvement of test code, or to (nearly) eliminate the need for coding knowledge altogether (so-called Low-Code or No-Code tools). They represent a potentially different way of building automation that could be faster and more robust.

For today’s task, let’s focus on building test code for functional testing… we have other challenges coming up that focus on AI’s impact on other types of testing and topics such as self-healing tests.

Today’s Task

  • Select a tool: Early in the challenge, we created lists of tools and their features, so review those posts and find a tool that interests you. Here are some tips:
    • If you are not comfortable with building automation, pick a No-Code or Low-Code tool and try creating automation with it. Some examples might be:
    • If you are experienced with building automation, why not try using a code assistant such as Copilot or Cody AI to assist you in writing some automation.
    • If you have already evaluated a functional automation tool earlier in the challenge, why not pick a different tool and compare the two?
  • Create some test code: Set a timebox (such as 20-30 mins) and try to build a small example of automation using your tool of choice.
  • Share your thoughts: Reply to this post and share your findings and insights such as:
    • What level of experience you have with functional automation.
    • Which tool you used and the automation you were trying to create.
    • How you found working with the tool to build and update your automation.
    • Did the code work the first time, or did you need further refinement?
    • Did you find any limitations or frustrations with the tool?

Why Take Part

  • Better understand the direction of AI for automation: The use of AI in functional automation is expanding, and taking part in this task allows you to gain exposure to these new ways of building automation and their limitations. Sharing your experiences with the community makes us all smarter.


5 Likes

Hey there :open_hands:

I tested PostBot in Postman, using one example endpoint to see what tests it could generate for me.

  • First, the GET https://type.fit/api/quotes with the following response:
[
  {
    "text": "Genius is one percent inspiration and ninety-nine percent perspiration.",
    "author": "Thomas Edison, type.fit"
  },
  {
    "text": "You can observe a lot just by watching.",
    "author": "Yogi Berra, type.fit"
  },
  {
    "text": "A house divided against itself cannot stand.",
    "author": "Abraham Lincoln, type.fit"
  },
  {
    "text": "Difficulties increase the nearer we get to the goal.",
    "author": "Johann Wolfgang von Goethe, type.fit"
  },
  {
    "text": "Fate is in your hands and no one elses",
    "author": "Byron Pulsifer, type.fit"
  },
  {
    "text": "Be the chief but never the lord.",
    "author": "Lao Tzu, type.fit"
  },
  {
    "text": "Nothing happens unless first we dream.",
    "author": "Carl Sandburg, type.fit"
  },
  {
    "text": "Well begun is half done.",
    "author": "Aristotle, type.fit"
  },
  {
    "text": "Life is a learning experience, only if you learn.",
    "author": "Yogi Berra"
  },
  {
    "text": "Self-complacency is fatal to progress.",
    "author": "Margaret Sangster, type.fit"
  },
  {
    "text": "Peace comes from within. Do not seek it without.",
    "author": "Buddha, type.fit"
  },
  {
    "text": "What you give is what you get.",
    "author": "Byron Pulsifer, type.fit"
  },
  {
    "text": "We can only learn to love by loving.",
    "author": "Iris Murdoch, type.fit"
  },
  {
    "text": "Life is change. Growth is optional. Choose wisely.",
    "author": "Karen Clark, type.fit"
  },
  {
    "text": "You'll see it when you believe it.",
    "author": "Wayne Dyer, type.fit"
  },
  {
    "text": "Today is the tomorrow we worried about yesterday.",
    "author":  "type.fit"
  }
]
  • I first asked it to generate the suggested tests, and got these:
pm.test("Schema is valid", function () {
  var schema = {
    type: "array",
    items: {
      type: "object",
      properties: {
        text: { type: "string" },
        author: { type: "string" }
      },
      required: ["text", "author"]
    }
  };

  var response = pm.response.json();
  
  pm.expect(tv4.validate(response, schema)).to.be.true;
});

// Test for response time within acceptable range
pm.test("Response time is less than 500ms", function () {
    pm.expect(pm.response.responseTime).to.be.below(500);
});

// Verify the length of the response data
pm.test("The response data length must be greater than zero", function () {
    const responseData = pm.response.json();
    pm.expect(responseData.length).to.be.greaterThan(0, "Response data should not be empty");

    responseData.forEach((quote) => {
        pm.expect(quote.text).to.be.a('string');
        pm.expect(quote.author).to.be.a('string');
    });
});

I had to fix some of the tests because the validations didn’t work for me, but the changes were very basic.
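As an aside, if the tv4 assertion is among the ones that misbehave, the Postman sandbox also bundles the Ajv validator. An equivalent check would look something like this (a sketch, assuming your Postman version exposes Ajv via require):

const Ajv = require('ajv');
const ajv = new Ajv();

// Same schema as the tv4 test above
const schema = {
  type: "array",
  items: {
    type: "object",
    properties: {
      text: { type: "string" },
      author: { type: "string" }
    },
    required: ["text", "author"]
  }
};

pm.test("Schema is valid (Ajv)", function () {
  const valid = ajv.validate(schema, pm.response.json());
  // ajv.errors holds the details when validation fails
  pm.expect(valid, JSON.stringify(ajv.errors)).to.be.true;
});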

Then I wanted to verify that at least one item of the result array had an ‘author’ value containing the ‘type.fit’ string, but I got an error message. I tried multiple ways of asking for the same test and kept getting errors, so instead I asked for:

test if all author values contains type.fit string

Then I got:

// Test to check if all author values contain "type.fit" string
pm.test("All author values contain 'type.fit' string", function () {
    const responseData = pm.response.json();
    responseData.forEach((quote) => {
        pm.expect(quote.author).to.include('type.fit');
    });
});
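For comparison, the “at least one author contains type.fit” test that PostBot kept erroring on is straightforward to write by hand (a minimal sketch):

// Hand-written version of the test PostBot couldn't generate:
// pass if at least one author value contains "type.fit"
pm.test("At least one author value contains 'type.fit'", function () {
    const responseData = pm.response.json();
    const hasMatch = responseData.some((quote) => quote.author && quote.author.includes('type.fit'));
    pm.expect(hasMatch, "expected at least one author to contain 'type.fit'").to.be.true;
});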

I liked the tool very much; it saves me time, especially for the basic tests.

8 Likes

Hello guys,

I used GitHub Copilot for this challenge. It definitely deserves some praise. I tried to automate things like login, logging some network response headers, drag-and-drop tests… via Cypress.

My experience was pretty good. It generated effective Cypress code along with comments. The best part is that it explains back what it generated, line by line. If the instructions are clear, the output is good.

Some things worked the first time, but for more intricate actions like drag and drop, I had to refine my input multiple times; on the fourth or fifth attempt it gave me workable code.

It struggled a bit with some parts, such as handling the case where a particular UI element is not present (a hand-written sketch of that pattern is below). Also, there is no ‘Edit’ button to refine your input; you have to continue on the next line. Apart from these, for people with basic/intermediate knowledge of Cypress, this would serve as a helpful assistant rather than a replacement. Copilot truly shines :star2: when combined with human expertise.
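For reference, the pattern Copilot struggled with (acting only when an element is present) usually ends up looking something like this in Cypress (a sketch; the selector names are hypothetical):

// Check the DOM first, then act only if the element exists
cy.get('body').then(($body) => {
  if ($body.find('[data-cy="cookie-banner"]').length > 0) {
    // Element is present, so dismiss it
    cy.get('[data-cy="cookie-banner"]').find('button').first().click();
  } else {
    cy.log('Cookie banner not present, continuing');
  }
});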

9 Likes

Hi my fellow testers, for today’s challenge I decided to use Postman’s AI tool, as I have access to it and have used it previously.

Create some test code

These are the tests I had written manually against one of our endpoints (note: I have renamed some values to protect confidentiality):

pm.test("Status test", function () {    pm.response.to.have.status(200);});pm.test("response must be valid and have a body", function () {     pm.response.to.be.ok;     pm.response.to.be.withBody;     pm.response.to.be.json;});pm.test("Response value matches expected values", function () {    pm.expect(pm.response.text()).to.include("Test Name.Test");    pm.expect(pm.response.text()).to.include("Loc char.Test");    pm.expect(pm.response.text()).to.include("UWI.Test");});

I then asked the AI to write tests against the same endpoint with no further instructions and it came up with:

pm.test("Response status code is 200", function () {    pm.expect(pm.response.code).to.equal(200);});pm.test("Content-Type is application/json", function () {    pm.expect(pm.response.headers.get("Content-Type")).to.include("application/json");});pm.test("Available test IDs array is present and contains at least one element in each property object", function () {    const responseData = pm.response.json();        pm.expect(responseData).to.be.an('array');    pm.expect(responseData).to.satisfy((data) => {        let isValid = true;        data.forEach((property) => {            if (!property.availabletestds || property.availabletestIds.length < 1) {                isValid = false;            }        });pm.test("Id field is a non-empty string", function () {  const responseData = pm.response.json();    pm.expect(responseData).to.be.an('array');  responseData.forEach(function(property) {    pm.expect(property.id).to.be.a('string').and.to.have.lengthOf.at.least(1, "Id should be a non-empty string");  });});

Out of the box these tests resulted in a syntax error, so not a great start. I told it to fix the error, which it did, but one of its test assertions then kept failing:

pm.test("Available test IDs array is present and contains at least one element in each property object", function () {
const responseData = pm.response.json();
pm.expect(responseData).to.be.an('array');
pm.expect(responseData).to.satisfy((data) => {
let isValid = true;
data.forEach((property) => {
if (!property.availabletestIds || property.availabletestIds.length < 1) {
isValid = false;
}
});
});
});
});

So I asked it to fix it again; this time it managed to, ending up with a test that passes:

pm.test("Available test IDs array is present and contains at least one element in each property object", function () {
const responseData = pm.response.json();
pm.expect(responseData).to.be.an('array');
responseData.forEach((property) => {
pm.expect(property.availabletestIds).to.be.an('array').and.to.have.lengthOf.at.least(1, "Available test IDs should contain at least one element");
});

I then asked it to add a test asserting that the following values are returned: Test Name.Test, Loc char.Test & UWI.Test, and it returned a test that wouldn’t pass:

pm.test("Properties Test Name.Test, Loc char.Test, and UWI.Test are returned", function () {    const responseData = pm.response.json();    pm.expect(responseData).to.be.an('array');    responseData.forEach((property) => {        pm.expect(property.extendedName).to.include("Test Name.Test");        pm.expect(property.displayName).to.include("Loc char.Test");        pm.expect(property.defaultTestId).to.include("UWI.Test");    });

So I ended up having to reuse some of the test assertions I had created manually.
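A version that matches the original intent (checking that the three values appear somewhere in the response, rather than on every item) stays close to the manual tests above; a minimal sketch:

pm.test("Expected values appear in the response", function () {
    const body = pm.response.text();
    // Assert each expected value appears somewhere in the raw body,
    // mirroring the manual pm.response.text() assertions above
    ["Test Name.Test", "Loc char.Test", "UWI.Test"].forEach((expected) => {
        pm.expect(body, 'response should contain "' + expected + '"').to.include(expected);
    });
});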

To conclude, I feel there is some benefit that the AI tool can bring in creating the basic test assertions for you but once you need anything a bit more complex it will either require you to manually edit its code or just write something yourself.

6 Likes

Hi,
I'm not a programming lover at all (troubleshooting, requirements, customer interactions, and getting stuff working are what rock my boat), yet I was still able to automate one of our simpler web applications with Cypress totally from scratch, just using ChatGPT. That was pretty impressive.
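For a flavour of what that looks like, ChatGPT's output for a basic login flow tends to be something like the following (a hypothetical sketch; the URL, selectors, and credentials are illustrative, not from the application in question):

describe('Login', () => {
  it('logs in with valid credentials', () => {
    cy.visit('https://example.com/login'); // illustrative URL
    cy.get('[data-cy="username"]').type('testuser');
    cy.get('[data-cy="password"]').type('s3cret', { log: false }); // keep the password out of the command log
    cy.get('[data-cy="submit"]').click();
    cy.url().should('include', '/dashboard'); // assumed post-login route
  });
});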

3 Likes

Excellent - I’ve not tried PostBot yet but it’s on my to-do list

3 Likes

That’s interesting - did GPT explain the code it wrote? If so, was it something you understood well enough to have confidence in the automation?

I’m sure @daniel.dainton would be interested in your feedback on the Postman AI

1 Like

Really like your last sentiment - I personally think AI will enhance testers, not replace them. It may remove the need for testers to perform some of the more routine testing activities but will hopefully amplify testers on the more challenging tasks.

I’m also quite excited by testing-specific AI tools - ChatGPT does a reasonable job, but I think tooling trained (or fine-tuned) specifically on testing domains will be better

3 Likes

Day 14

PostBot

I just used a simple get-address endpoint I had been working with, which already had a couple of tests: response code and an array of results (roughly sketched below).
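Those pre-existing tests would have been along these lines (a rough sketch, not the exact code):

// Sketch of the two pre-existing tests: status code and results array
pm.test("Status code is 200", function () {
    pm.response.to.have.status(200);
});

pm.test("Response contains an array of results", function () {
    pm.expect(pm.response.json().results).to.be.an('array');
});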

Add more tests

pm.test("Response Content-Type is application/json", function () {
    pm.expect(pm.response.headers.get("Content-Type")).to.include("application/json");
});


pm.test("Id in the results is a non-empty string", function () {
  const responseData = pm.response.json();
  pm.expect(responseData.results).to.be.an('array');
  
  responseData.results.forEach(function(result) {
    pm.expect(result.id).to.be.a('string').and.to.have.lengthOf.at.least(1, "Id should not be empty");
  });
});


pm.test("Point object contains valid latitude and longitude values", function () {
  const responseData = pm.response.json();
  
  pm.expect(responseData.results[0].point).to.be.an('object');
  pm.expect(responseData.results[0].point.latitude).to.be.a('number');
  pm.expect(responseData.results[0].point.longitude).to.be.a('number');
});


pm.test("LocationMeta object contains expected fields", function () {
    const responseData = pm.response.json();
    
    pm.expect(responseData.results[0].locationMeta).to.be.an('object');
    pm.expect(responseData.results[0].locationMeta).to.include.all.keys('w3w', 'address', 'street_formatted_address');
});

I like that PostBot built on the existing tests, although it repeated checking if an array of results was returned. When I ran ‘Fix tests’ it removed this repetition.

It generated contract-style tests rather than value checks, which are more repeatable but perhaps less specific, depending on what I was testing for. I could have been more specific in the prompt, but I wanted to see what PostBot returned.

Save a field from response

var placeName = pm.response.json().results[0].placeName;
pm.globals.set("placeName", placeName);

I’m guessing it picked the first field after the id, so it disregarded the id itself. I guess this is reasonable, but it wasn’t transparent that this is what had occurred.
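For context, a global saved this way is typically consumed by a later request, either as {{placeName}} in the URL or in a script (a minimal sketch of the script form):

// In a later request's script, read the value back:
const placeName = pm.globals.get("placeName");

pm.test("placeName was carried over from the previous request", function () {
    pm.expect(placeName).to.be.a('string').and.to.have.lengthOf.at.least(1);
});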

Fix tests

The first time I ran ‘Fix tests’, it tidied up the formatting and removed repetition. It left me wondering about the prompts behind ‘Add more tests’ and ‘Fix tests’. It also added more keys to the LocationMeta test; I’m not sure why it didn’t add them in the first place.

I then changed this test in a couple of ways:

pm.test("Response Content-Type is application/json", function () {
    pm.expect(pm.response.headers.get("Content-Type")).to.include("application/octet-stream");
});

pm.test("Response Content-Type is application/octet-stream", function () {
    pm.expect(pm.response.headers.get("Content-Type")).to.include("application/json");
});

How did it know which was right? Perhaps the response was wrong, or the test name, or the assert? It corrected each instance of the test to use application/json. Again, a lack of transparency as to what oracle was used as the source of truth. I suspect the response.

Generate Documentation

This endpoint makes an HTTP GET request to retrieve location information based on the provided query parameters. The response will contain details about the location, including its ID, name, formatted address, latitude, longitude, and additional location metadata.
The response will have a status code of 200, and the content type will be in JSON format.
The response body will include an array of "results", where each result object will contain the location's ID, place name, formatted address, latitude, longitude, and location metadata, including what3words address and structured address details.

A reasonable description; I can’t see anything too obvious missed.

PostBot didn’t show what it was doing or why, so you could end up with a bunch of tests that you don’t really understand. However, with better prompts, rather than just using the buttons, I think it could provide better outcomes. I would need more time to investigate.

5 Likes

Nice write-up Ash - seeing a lot of interest in PostBot, so I will definitely have to take a look at it

1 Like

I don’t have the energy for this one; I’m too out of practice at writing automation code. I haven’t done that without a pair or ensemble in too many years.

I did try out generating tests against our ATF code with both the Tabnine and Cody extensions in VS Code. I quickly ran out of Cody queries (I have the free version), but it seems to do a decent job of writing basic RSpec tests. Tabnine is less imaginative in the test cases it generates, but if I prompt it to write additional tests, such as “include format exploits as inputs”, it is able to do that.

I just found out the developer I work with has switched to the Github Copilot extension, but there isn’t a free version and I can’t justify the cost. It’s interesting to play with these things, but generally, I’m quite poor at working on my own. Will read through others’ reports here.

3 Likes

Hey Lisa, there is a one-time 30-day trial for Copilot. I used it today for this challenge.

Hello @billmatthews and fellow participants,

For today’s task, I collaborated with @joyz and we connected for ~2 hours to do this exercise.

We wanted to make something meaningful out of our learnings from this course and thus we decided to build a Prompt Engineering Checklist.

Here is a braindump of our ideas on some checkpoints for writing good prompts:

We used Bing Copilot to generate our test code. The overall process went smoothly and we were able to code our checklist completely using Copilot.

It’s LIVE and available here: Prompting Checklist for Testers - Rahul’s Testing Titbits

Do share your feedback!

I also did a video demonstrating how we did this checklist development from start to end.

Here is the full video: Generating Code with AI | Build Your First Project using AI - Prompt Engineering Checklist | Day 14 - YouTube

Looking forward to your thoughts and feedback on this one.

Thanks,
Rahul

6 Likes

Kudos for collaborating on the task, @parwalrahul and @joyz, and for the awesome Prompting Checklist for Testers!

2 Likes

I asked ChatGPT:

Using Robot Framework, write test automation code:

  • connect to jumpbox IP 10.4.24.24 login john,
  • ssh to Linux server switchqa user qa
  • go to directory /home/qa/injector
  • copy file a.msg to a1.msg
  • send message a1.msg using command: inject 10 1 a1
  • assert that the output contains the string “Best regards”

It returned:

*** Settings ***
Library    SSHLibrary

*** Variables ***
${JUMPBOX_HOST}             10.4.24.24
${JUMPBOX_USER}             john
${JUMPBOX_PASSWORD}         your_jumpbox_password
${LINUX_SERVER_HOST}        ip_of_linux_server
${LINUX_SERVER_USER}        switchqa
${LINUX_SERVER_PASSWORD}    your_linux_server_password
${INJECTOR_DIRECTORY}       /home/qa/injector
${ORIGINAL_FILE}            a.msg
${COPIED_FILE}              a1.msg
${COMMAND}                  inject 10 1 ${COPIED_FILE}
${EXPECTED_OUTPUT}          Best regards

*** Test Cases ***
Send Message Test
    Connect to Jumpbox
    SSH to Linux Server
    Go to Injector Directory
    Copy File
    Send Message
    Assert Output Contains String

*** Keywords ***
Connect to Jumpbox
    Open Connection    ${JUMPBOX_HOST}
    Login    ${JUMPBOX_USER}    ${JUMPBOX_PASSWORD}

SSH to Linux Server
    Login    ${LINUX_SERVER_USER}    ${LINUX_SERVER_PASSWORD}

Go to Injector Directory
    Execute Command    cd ${INJECTOR_DIRECTORY}

Copy File
    Execute Command    cp ${ORIGINAL_FILE} ${COPIED_FILE}

Send Message
    Execute Command    ${COMMAND}

Assert Output Contains String
    ${output}=    Execute Command    cat ${COPIED_FILE}
    Should Contain    ${output}    ${EXPECTED_OUTPUT}

*** Test Teardown ***

That is pretty close to what I would have written in RIDE (I may have used fewer variables).

2 Likes

Day 14 Task Summary

The task for Day 14 centered on utilizing AI to build automation, particularly in creating or enhancing test code. Here’s a breakdown of the steps taken:

  1. Select a Tool: I opted to use Microsoft Copilot.
  2. Create Test Code: The aim was to construct a To-Do list iOS app from scratch and write UI tests for it. Copilot generated the following SwiftUI code for the app:


// SwiftUI code generated by Copilot
import SwiftUI

struct ContentView: View {
    @State private var newTodo: String = ""
    @State private var allTodos: [String] = UserDefaults.standard.stringArray(forKey: "todos") ?? []

    var body: some View {
        NavigationView {
            VStack {
                HStack {
                    TextField("New task", text: $newTodo)
                        .textFieldStyle(RoundedBorderTextFieldStyle())

                    Button(action: addNewTodo, label: {
                        Text("Add")
                    })
                }
                .padding()

                List {
                    ForEach(allTodos, id: \.self) { todo in
                        Text(todo)
                    }
                    .onDelete(perform: deleteTodo)
                }
            }
            .navigationTitle("ToDo List")
            .navigationBarItems(trailing: EditButton())
        }
    }

    private func addNewTodo() {
        guard !newTodo.isEmpty else { return }
        allTodos.append(newTodo)
        newTodo = ""
        UserDefaults.standard.set(allTodos, forKey: "todos")
    }

    private func deleteTodo(at offsets: IndexSet) {
        allTodos.remove(atOffsets: offsets)
        UserDefaults.standard.set(allTodos, forKey: "todos")
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}

Upon using this code to build the app, I managed to create a single-page app as shown in the image below.

  3. Write UI Tests: Copilot also generated UI test code, but upon implementing it, the deletion test failed. To make it work, I manually intervened and added the necessary code.


// Generated UI Test Code
import XCTest

class ToDoListUITests: XCTestCase {

    var app: XCUIApplication!

    override func setUp() {
        super.setUp()
        continueAfterFailure = false
        app = XCUIApplication()
        app.launch()
    }

    func testAddingTask() {
        let newTaskField = app.textFields["New task"]
        newTaskField.tap()
        newTaskField.typeText("Test Task")

        let addButton = app.buttons["Add"]
        addButton.tap()

        let firstTask = app.staticTexts["Test Task"]
        XCTAssertTrue(firstTask.exists)
    }

    func testDeletingTask() {
        let newTaskField = app.textFields["New task"]
        newTaskField.tap()
        newTaskField.typeText("Test Task")

        let addButton = app.buttons["Add"]
        addButton.tap()

        let firstTask = app.staticTexts["Test Task"]
        XCTAssertTrue(firstTask.exists)

        // Manual Intervention to Make Deletion Test Work
        let app = XCUIApplication()
        let sidebarCollectionView = app.collectionViews["Sidebar"]
        let testStaticText = sidebarCollectionView.staticTexts["Test Task"]
        
        let todoListNavigationBar = app.navigationBars["ToDo List"]
        todoListNavigationBar.buttons["Edit"].tap()
        sidebarCollectionView.cells.otherElements.containing(.image, identifier:"minus.circle.fill").firstMatch.tap()
        sidebarCollectionView.buttons["Delete"].tap()
        todoListNavigationBar.buttons["Done"].tap()
    }
}

Snapshot:

Conclusion: While the generated tests worked to some extent, the second test only became functional with human intervention. This highlights the current limitations of AI in fully automating test code generation and the importance of human oversight in refining and improving automated processes. :rocket::robot:

To be honest, it took me about an hour to do everything from scratch. I write complex tests involving live mocks in XCUITest on a day-to-day basis, so it was really intriguing to see how AI can catch up to human intelligence with the right training.

Cheers All :beers:

4 Likes

I write my automated tests in PyCharm, and a colleague recently introduced me to an extension called Codium. It’s really impressive how intuitive it is - it auto-suggests code as I write (it’s often spot on) and can also refactor and write comments. If you’re using PyCharm, give it a go!

6 Likes

Thank you Rahul for the invitation 🙌 I am happy to have had the chance to collaborate with testers outside my circle, and even from a different region; it was a very nice experience sharing ideas overseas.

During our trial, we also ran into a word limit when generating the WordPress code for the checklist: the output always stopped partway through a script and would not generate the remaining content, as the full script was too long.

After a few attempts, we broke the code generation down by checklist section and finished it in a few prompts instead of one.

This may also be a common problem when generating test cases, but when generating code the “Please continue” magic doesn’t work as seamlessly, as there are more dependencies between the scripts.


Update (15 Mar):

I had another trial with Copilot today. After stating the requirements, I used two prompts to make it print multiple scripts in separate messages and avoid the word limit cropping the output:

  1. Please show me the project structure

  2. Please print all the scripts in this project one by one; if you cannot print them all in one message, ask me to type “continue the script” in the next message, and continue printing the remaining scripts until all are printed.

It worked, repeatedly asking me to say “continue the script” until the whole project was finished.

3 Likes

Happy Friday all, a day behind on this one.
So a small cheat here. As I am moving code to .NET 8 / C# 12 shortly, I bought a very good book, “Refactoring with C#”, in which the author Matt Eland has some good descriptions:

“There is no intelligent understanding built into an LLM. These models do not think or have thoughts of their own, but rather use mathematics to identify similarities between the text they receive and the large volumes of text the model was trained on.

While LLM systems may seem eerily intelligent at times, this is because they are emulating the intelligence of the authors of the various books, blog posts, tweets, and other materials they’ve been trained on.

GitHub Copilot uses an LLM called Codex. The Codex model is produced by OpenAI and was trained not on blog posts or tweets but on open-source software repositories.

This means that when you type something into your editor, the text you type can be used as a prompt to predict the next line of code you might type. This is very similar to how Google search predicts the next few words in a search term or how ChatGPT generates textual replies”

I already use Copilot but had not used Copilot Chat.
Following what the author went through, I found some inconsistencies with Copilot Chat.

When asking exactly the same question, “Generate a list of 10 random numbers”,
I got a slightly different outcome each time. Slight, but enough to call it inconsistent.

I adjusted the code to what is detailed in the book so I could then ask Chat the next question: “How would you improve this code?”
The author had purposely made the code buggy (instantiating Random from within a loop), given it poor readability through its variable names, and stipulated that it should produce 10 random numbers.
While the bug was found, with advice to move Random outside the loop, there was no attempt to make the variable names meaningful or to set the list size to 10.
Depending on your viewpoint, the code also used a for loop, and I would have expected a suggestion to make this a foreach over a collection.

As I say, I use Copilot and find it very useful. It is a bit flaky at times but overall is impressive for simple tasks.
Copilot Chat, not so impressive. It did suggest moving Random outside the loop as a fix (it would have been good for this to be highlighted as a potential bug too) but seemed to ignore Clean Code guidelines.

Still rather impressed with Copilot, and I am sure they will iron out any shortfalls over time.

2 Likes