đŸ€– Day 11: Generate test data using AI and evaluate its efficacy

Day 11 already! Today, we will learn about Test Data Selection and Generation using AI. Data is at the heart of many applications these days, and many tests require us to select or create data that explores the applications’ behaviours. At one end of the scale, this might be a small set of inputs designed to trigger some anticipated system behaviour, and at the other end of the scale, it might require thousands or millions of realistic data points to test the system’s performance or to evaluate an AI model.

Creating realistic data for tests can be a tedious and problematic task, so a key question is whether we can use AI to supercharge our Test Data Generation efforts.

Task Steps

Today’s task is to pick a tool that generates test data and try it out on a test data generation problem in your context. It could be selecting data to test a behaviour or generating many data points to populate a database.

  1. Select your tool of choice: Review the tool lists compiled in earlier days and find one you want to try that generates test data. Or you could try generating data using a Large Language Model such as ChatGPT or Copilot.

  2. Find a Data Problem to solve: Select a Test Data Generation problem or challenge. If you don’t have one (lucky you!), make one or ask the community for examples of their data challenges.

  3. Experiment with the tool: Learn how the tool generates data and try to generate test data for your chosen scenario.

  4. Evaluate the generated data: Review the quality and completeness of the data generated. Some perspectives you might want to consider are:
    a. How easy was it to generate the data?
    b. How flexible is the data generation?
    c. Did the generated data meet your needs? Was it realistic?

  5. Share your findings: As always, share your findings with the community so they can benefit from your insights. Consider sharing:
    a. The data problem you were trying to solve and how well you think the tool performed.
    b. Your perceptions about what was positive and negative about the tool and the data generated.
    c. How the generated data might be improved.
    d. How might the tool help with your day-to-day testing activities?

Why Take Part

  • Learn about new ways to generate test data: By contributing to the task, you’ll learn about new ways to solve an old problem.


5 Likes

I asked ChatGPT to generate an XML file that I can use for testing with the prompt:

Generate an example of an XML file for a mock RSS feed of a blog about cats. The XML header should have a schema tag and an encoding tag

It was able to generate the file that I was asking for. However, when I asked it to re-generate the code in such a way that the schema URL of the XML header is malformed, it generated exactly the same data as the previous output, even while claiming:

In this modified version, I intentionally didn’t include the schema tag inside the XML header, as it would be invalid. However, the rest of the XML structure remains the same as in the previous example.

Outputs are still generally not 100% accurate, and in this case, it is left up to us as the prompters to verify whether or not the data is accurate.
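
For cases like this, a few lines of code can check the property you asked for instead of trusting the model's claim. Here is a minimal sketch in Python that checks whether a generated feed is well-formed XML at all; the file name is an assumption for illustration:

import xml.etree.ElementTree as ET

# Load the LLM-generated feed (file name is an assumption)
with open("mock_rss_feed.xml", encoding="utf-8") as f:
    xml_text = f.read()

try:
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    print(f"Well-formed XML, root element: <{root.tag}>")
except ET.ParseError as err:
    print(f"Malformed XML, as requested: {err}")

A check like this makes it obvious whether the "malformed" version the model claims to have produced actually differs from the valid one.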

7 Likes

Hello Everyone

Introduction to the Challenge: Today’s challenge revolves around testing a transport application’s user interface (UI) using realistic data. :red_car: The objective is to generate test data that accurately reflects various user interactions within the application, covering scenarios such as user login, trip search and purchase, feedback submission, and notification preferences. :iphone:

Prompt to Generate Transport UI-Based Data and Answer: To tackle this challenge, I utilised a combination of tools and methodologies to generate realistic test data for the transport application’s UI. Initially, I created multiple sets of test data encompassing user profiles, trip details, payment information, preferences, and feedback. :hammer_and_wrench: This data was formatted into a CSV file to facilitate easy integration into testing environments. :bar_chart:

The following is a single set of data generated from the prompt:

1. User Profile Data:
  * Username: testuser1
  * Password: test123
  * Email: testuser1@example.com
  * Profile Picture: [link to a generic profile picture]
2. Trip Data:
  * Trip ID: 123456
  * Start Location: Central Station
  * End Location: Downtown Avenue
  * Departure Time: 8:00 AM
  * Arrival Time: 8:30 AM
  * Mode of Transport: Bus
  * Price: $2.50
3. Payment Information:
  * Card Number: 1234 5678 9012 3456
  * Expiry Date: 12/25
  * CVV: 123
  * Billing Address: 123 Main Street, Cityville, State, Zip
4. Favorite Routes:
  * Route 1: Home to Work
    * Start Location: Home
    * End Location: Office
    * Mode of Transport: Train
  * Route 2: School to Library
    * Start Location: School
    * End Location: Local Library
    * Mode of Transport: Walking
5. Recent Searches:
  * Search 1: Restaurants near Central Station
  * Search 2: Cafes near Downtown Avenue
  * Search 3: Hotels near Airport
6. Notification Preferences:
  * Push Notifications: Enabled
  * Email Notifications: Disabled
7. Feedback Data:
  * Feedback ID: 789012
  * Rating: 4 stars
  * Comment: "Great app! The UI is intuitive and easy to use. However, I encountered a bug when trying to purchase a ticket."
8. Settings:
  * Language: English
  * Theme: Light
  * Location Services: Enabled

CSV with Multiple Data: Below is the CSV file containing multiple sets of test data generated for the transport application’s UI testing:

Username,Password,Email,ProfilePicture,TripID,StartLocation,EndLocation,DepartureTime,ArrivalTime,ModeOfTransport,Price,CardNumber,ExpiryDate,CVV,BillingAddress,Route1Start,Route1End,Route1Transport,Route2Start,Route2End,Route2Transport,Search1,Search2,Search3,PushNotifications,EmailNotifications,FeedbackID,Rating,Comment,Language,Theme,LocationServices
testuser1,test123,testuser1@example.com,"[link]",123456,Central Station,Downtown Avenue,8:00 AM,8:30 AM,Bus,$2.50,1234 5678 9012 3456,12/25,123,"123 Main Street, Cityville, State, Zip",Home,Work,Train,School,Local Library,Walking,Restaurants near Central Station,Cafes near Downtown Avenue,Hotels near Airport,Enabled,Disabled,789012,4,"Great app! The UI is intuitive and easy to use. However, I encountered a bug when trying to purchase a ticket.",English,Light,Enabled
user2,password123,user2@example.com,"[link]",987654,Main Street,Airport,9:00 AM,9:30 AM,Train,$3.00,9876 5432 1098 7654,11/24,456,"456 Elm Street, Townsville, State, Zip",Work,Home,Bus,Gym,Park,Walking,Museums near Main Street,Convenience stores near Airport,Parks near Main Street,Enabled,Enabled,567890,5,"Love the app! It's very convenient for my daily commute.",English,Dark,Enabled
test_user3,pass456,testuser3@example.com,"[link]",456789,Airport,Shopping Mall,10:00 AM,10:30 AM,Car,$5.00,5432 1098 7654 3210,10/23,789,"789 Oak Avenue, Villagetown, State, Zip",School,Friend's House,Walking,Grocery Store,Home,Car,Retailers near Airport,Bookstores near Shopping Mall,Coffee shops near Airport,Enabled,Enabled,123456,3,"The app is okay, but it could use some improvements in the UI design.",English,Light,Disabled
john_doe,password1,johndoe@example.com,"[link]",654321,Downtown,Downtown Park,11:30 AM,12:00 PM,Walk,$0.00,6543 2109 8765 4321,09/22,234,"234 Maple Avenue, Suburbia, State, Zip",Home,Office,Car,Grocery Store,Gym,Walking,Cafes near Downtown,Restaurants near Downtown Park,Hotels near Downtown,Enabled,Disabled,345678,4,"Enjoying the app so far! Would love to see more features added in future updates.",English,Dark,Enabled
jane_smith,passpass,janesmith@example.com,"[link]",987123,Train Station,Airport,7:45 AM,8:15 AM,Train,$2.75,3210 9876 5432 1098,08/23,345,"345 Elm Street, Suburbia, State, Zip",Home,Office,Car,Library,Home,Walking,Restaurants near Train Station,Cafes near Airport,Hotels near Train Station,Enabled,Enabled,456789,5,"Fantastic app! It's become an essential part of my daily routine.",English,Light,Enabled
test_user5,secret123,testuser5@example.com,"[link]",555555,Airport,Beach,1:00 PM,1:30 PM,Bus,$3.25,1098 7654 3210 9876,07/24,567,"567 Cedar Avenue, Townsville, State, Zip",School,Home,Car,Park,Beach,Walking,Restaurants near Airport,Convenience stores near Beach,Hotels near Airport,Enabled,Disabled,987654,3,"The app works fine, but sometimes the search feature doesn't return accurate results.",English,Dark,Disabled
alex_johnson,securepass,alexjohnson@example.com,"[link]",444444,Metro Station,Shopping Mall,12:15 PM,12:45 PM,Subway,$2.00,8765 4321 0987 6543,06/25,678,"678 Oak Avenue, Villagetown, State, Zip",Work,Home,Car,Park,Gym,Walking,Restaurants near Metro Station,Cafes near Shopping Mall,Hotels near Metro Station,Enabled,Enabled,876543,4,"I'm impressed with the app's performance. It's very user-friendly.",English,Light,Enabled
emma_brown,pass1234,emmabrown@example.com,"[link]",333333,Office,Gym,5:30 PM,6:00 PM,Walk,$0.00,7654 3210 9876 5432,05/26,789,"789 Pine Avenue, Cityville, State, Zip",Home,School,Bus,Grocery Store,Library,Walking,Cafes near Office,Restaurants near Gym,Hotels near Office,Enabled,Enabled,765432,5,"This app has made my commute much easier. Highly recommend it!",English,Dark,Enabled
test_user8,testpass8,testuser8@example.com,"[link]",222222,Downtown,Park,3:00 PM,3:30 PM,Bike,$1.50,6543 2109 8765 4321,04/27,890,"890 Pine Avenue, Suburbia, State, Zip",Home,Work,Train,Library,Park,Walking,Cafes near Downtown,Restaurants near Park,Hotels near Downtown,Enabled,Disabled,654321,3,"The app needs improvement in terms of speed. Sometimes it lags.",English,Light,Disabled
sara_miller,millerpass,saramiller@example.com,"[link]",111111,Park,Library,2:15 PM,2:45 PM,Walk,$0.00,5432 1098 7654 3210,03/28,901,"901 Cedar Avenue, Townsville, State, Zip",Home,Gym,Car,Park,Library,Walking,Restaurants near Park,Cafes near Library,Hotels near Park,Enabled,Enabled,543210,4,"Overall, a great app! However, occasional crashes need to be addressed.",English,Dark,Enabled
test_user10,securepass10,testuser10@example.com,"[link]",777777,Library,Café,4:45 PM,5:15 PM,Bus,$2.25,4321 0987 6543 2109,02/29,012,"012 Pine Avenue, Cityville, State, Zip",Home,School,Train,Office,Café,Walking,Restaurants near Library,Cafes near Café,Hotels near Library,Enabled,Enabled,432109,5,"No complaints so far! The app does everything I need it to.",English,Light,Enabled
adam_wilson,passadam,adamwilson@example.com,"[link]",888888,Beach,Park,11:00 AM,11:30 AM,Walk,$0.00,3210 9876 5432 1098,01/30,123,"123 Elm Avenue, Villagetown, State, Zip",Home,Work,Car,Library,Park,Walking,Restaurants near Beach,Cafes near Park,Hotels near Beach,Enabled,Disabled,321098,3,"Decent app, but it could use some UI improvements.",English,Dark,Disabled
test_user12,mytestpass,testuser12@example.com,"[link]",666666,Gym,Office,8:30 AM,9:00 AM,Car,$4.00,2109 8765 4321 0987,12/31,234,"234 Oak Avenue, Suburbia, State, Zip",School,Home,Train,Park,Grocery Store,Walking,Cafes near Gym,Restaurants near Office,Hotels near Gym,Enabled,Disabled,210987,4,"The app is good, but occasional bugs need to be fixed.",English,Light,Enabled
laura_adams,pass12345,lauraadams@example.com,"[link]",999999,Café,Park,6:30 PM,7:00 PM,Walk,$0.00,1098 7654 3210 9876,11/32,345,"345 Pine Avenue, Townsville, State, Zip",Home,Library,Car,Office,Park,Walking,Restaurants near Café,Cafes near Park,Hotels near Café,Enabled,Enabled,109876,5,"The app is excellent! I have nothing but positive experiences with it.",English,Dark,Enabled
test_user14,testpass14,testuser14@example.com,"[link]",101010,Shopping Mall,Café,2:00 PM,2:30 PM,Bus,$2.50,8765 4321 0987 6543,10/29,456,"456 Oak Avenue, Cityville, State, Zip",Home,School,Train,Park,Grocery Store,Walking,Restaurants near Shopping Mall,Cafes near Café,Hotels near Shopping Mall,Enabled,Disabled,876543,3,"The app is alright, but it could use more features.",English,Light,Disabled
james_jackson,passjames,jamesjackson@example.com,"[link]",121212,Airport,Downtown,4:30 PM,5:00 PM,Car,$5.50,7654 3210 9876 5432,09/21,567,"567 Elm Avenue, Suburbia, State, Zip",Work,Home,Bus,Library,Café,Walking,Restaurants near Airport,Cafes near Downtown,Hotels near Airport,Enabled,Enabled,765432,4,"Good app overall, but there are occasional glitches that need fixing.",English,Dark,Enabled
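
To show how a CSV like this could plug into automated checks, here is a small pytest sketch that parameterises a test from the rows. The file name and the search_trips function are assumptions standing in for the real application hooks:

import csv
import pytest

def load_rows(path="transport_test_data.csv"):  # hypothetical file name
    # Read the generated test data into a list of dicts, one per CSV row
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# One test case per generated data row
@pytest.mark.parametrize("row", load_rows())
def test_trip_search(row):
    # search_trips is a placeholder for the application under test
    results = search_trips(row["StartLocation"], row["EndLocation"])
    assert results, f"no trips from {row['StartLocation']} to {row['EndLocation']}"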

BDD Feature File Using the Data: Using the generated test data, I structured a Behavior Driven Development (BDD) feature file outlining various user scenarios. :memo: The BDD scenarios incorporated Examples to demonstrate the application’s behavior with different sets of data. Here’s a snippet of the BDD feature file:

Feature: Transport Application Testing

  Scenario Outline: User logs in and searches for transportation options
    Given I am a registered user with username "<Username>" and password "<Password>"
    When I log in with my credentials
    Then I should be successfully logged in
    When I search for transportation options from "<StartLocation>" to "<EndLocation>"
    Then I should see available trips displayed on the screen

    Examples:
      | Username    | Password | StartLocation | EndLocation    |
      | testuser1   | test123  | Central Station | Downtown Avenue |
      | user2       | password123 | Main Street    | Airport        |
      | test_user3  | pass456  | Airport       | Shopping Mall |
      | john_doe    | password1 | Downtown      | Downtown Park |
      | jane_smith  | passpass | Train Station | Airport       |

  Scenario Outline: User purchases a ticket for a trip
    Given I am logged in as "<Username>" with password "<Password>"
    When I select a trip from "<StartLocation>" to "<EndLocation>"
    And I proceed to purchase a ticket
    Then I should successfully complete the transaction
    And the ticket should be displayed on my screen

    Examples:
      | Username   | Password   | StartLocation | EndLocation  |
      | testuser1  | test123    | Central Station | Downtown Avenue |
      | user2      | password123 | Main Street  | Airport      |
      | test_user3 | pass456    | Airport      | Shopping Mall |
      | john_doe   | password1  | Downtown     | Downtown Park |
      | jane_smith | passpass   | Train Station | Airport      |

  Scenario Outline: User provides feedback on the application
    Given I am logged in as "<Username>" with password "<Password>"
    When I navigate to the feedback section
    And I provide a rating of <Rating> stars along with a comment
    Then my feedback should be submitted successfully

    Examples:
      | Username   | Password | Rating | 
      | testuser1  | test123  | 4      |
      | user2      | password123 | 5   |
      | test_user3 | pass456  | 3      |
      | john_doe   | password1 | 4      |
      | jane_smith | passpass | 5      |

  Scenario Outline: User sets notification preferences
    Given I am logged in as "<Username>" with password "<Password>"
    When I navigate to the settings
    And I toggle push notifications to be <PushNotifications>
    Then push notifications should be <PushNotifications> for my account
    And email notifications should remain <EmailNotifications>

    Examples:
      | Username   | Password | PushNotifications | EmailNotifications |
      | testuser1  | test123  | Enabled           | Disabled           |
      | user2      | password123 | Enabled         | Enabled            |
      | test_user3 | pass456  | Enabled           | Enabled            |
      | john_doe   | password1 | Enabled           | Disabled           |
      | jane_smith | passpass | Enabled           | Enabled            |

Comparison Table: Following the generation and utilisation of test data, I conducted a comprehensive evaluation using a comparison table. :bar_chart: This table highlighted the positive aspects, such as efficient data generation and diverse coverage, as well as areas for improvement, including the need for dynamic data elements and challenges in simulating complex scenarios. Here’s a summary of the comparison table:

| Criteria | Positive Aspects | Negative Aspects |
|---|---|---|
| Data Problem & Tool Performance | Efficiently generated diverse, realistic data | Limited ability to generate complex or edge-case scenarios |
| Perceptions about the Tool | Streamlined data generation process | Potential limitations in simulating dynamic data |
| Positive Aspects | Diverse and realistic data coverage | Limited ability to simulate real-time changes |
| Negative Aspects | Flexible customisation of data parameters | Challenges in generating highly specific domain scenarios |
| How Generated Data Might be Improved | Include more variations in user preferences and behaviours | Introduce more dynamic elements for real-time simulations |
| Tool's Impact on Day-to-Day Testing Activities | Saves time and effort in data creation | Requires additional manual effort for edge-case simulations |
| Overall Assessment | Enhances testing efficiency and effectiveness | Room for improvement in simulating complex and dynamic scenarios |

Conclusion: In conclusion, the challenge of testing a transport application’s UI with realistic data was effectively addressed through the generation of diverse and comprehensive test data. Utilising a combination of CSV files for data storage and BDD feature files for scenario structuring facilitated efficient testing processes. :test_tube: While the generated test data showcased strengths in realism and coverage, there are opportunities for improvement in simulating dynamic scenarios and handling edge cases. Overall, this exercise emphasized the importance of realistic test data in ensuring the reliability and effectiveness of software applications. :star2:

Thank you

7 Likes

Hi my fellow testers, for day 11’s challenge I continued to use ChatGPT in its role as an expert software tester with domain geochemical knowledge. My prompt was:

I need you to export a geochemical dataset in both .csv and .xlsx formats. I need 500 rows in each dataset. The headers need to be well name, well lat, well long, sample name, base md, top md, sample type

It then outputted the below, which I feel is a bit of a cheat:

| well name | well lat | well long | sample name | base md | top md | sample type |
| Well 1 | 38.89511 | -77.03637 | Sample 1 | 1000 | 1050 | Soil |
| Well 2 | 34.052235 | -118.2437 | Sample 2 | 1200 | 1250 | Water |
| Well 3 | 40.712776 | -74.00597 | Sample 3 | 800 | 850 | Rock |
| ... | ... | ... | ... | ... | ... | ... |
| Well 500 | 51.5074 | -0.1278 | Sample 500 | 950 | 1000 | Sediment |

I then re-prompted it with: The above dataset does not contain 500 rows, this time produce the same table but with 500 actual rows of data, do not skip any rows

but it did the same again, so I asked it to explain its output, and only then did it apologise for creating a truncated representation for brevity; it then outputted a dataset of just 20 rows. I asked it to analyse its output for truncation, it apologised yet again, and this time it looked like it was genuinely trying to output 500 rows. It kept pausing randomly, and a button would appear for me to click to tell it to continue generating, and then it would resume; eventually it just quit the session altogether and left me at the default 'how can I help you today' screen.

I assume I asked too much of ChatGPT 3.5; maybe version 4 would have handled it better, or another tool entirely would be better suited to this task. Does anyone have any suggestions of alternative free tools that would potentially do a better job?

7 Likes

Hey Manoj, it's great to see the diverse set of test data. Can you please specify which tools you used to generate it?

5 Likes

Hi Poojitha, I have used ChatGPT and Gemini.

Thank you
Manoj

3 Likes

I asked Bing Copilot to generate pytest code for checking code that extracts the encoding from the Content-Type header. This is a relatively simple thing to do, but there are various normalizations and corner cases that might trip you up. I paste my prompt and response below.

First I asked for tests to extract the encoding from the header. Then I also asked for tests to extract the encoding from a "header" that is put inside an HTML meta tag. Initially I was afraid Copilot might not understand the second request, but then I recalled one of the earlier tasks and decided to give it an example, in the hope it would make things clearer.

For comparison, here’s my own code for the second task.

The response for the straightforward task is pretty good. I was curious whether it would use pytest parameters, and it did. It surprised me with a test for a quoted value - I did not think about this. I would appreciate some more tests around case-sensitivity (of both name and value in the parameter). It did not consider that there might be multiple parameters, and that their order does not matter. What's worse, the edge case with "extra" might fool you into thinking that charset is going to be the only parameter.

The response for the second task is less satisfactory. It still ignored additional parameters. It did not consider a meta tag without content, or a different order of meta tag attributes. There's even less testing of case sensitivity. On the other hand, it gave some examples of incorrectly placed meta tags and of charset included in irrelevant meta tags, which is nice. It ignored my reference to pytest, giving me just raw test data. Finally, there was a reference to GitHub at the bottom of the answer, leading me to believe that it just copied code from someone else. If you didn't notice the GitHub link and did not check the license, there could be legal problems from that.
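
To make those gaps concrete, here is a sketch of the kinds of cases I would add, written against the same hypothetical parse_content_type function that the generated tests assume:

import pytest

# Cases the generated suite missed (sketch; parse_content_type is the
# hypothetical function under test from the generated code below)
@pytest.mark.parametrize("content_type, expected", [
    ("text/html; boundary=x; charset=UTF-8", "UTF-8"),  # charset is not the only parameter
    ("text/html; charset=UTF-8; boundary=x", "UTF-8"),  # parameter order should not matter
    ("TEXT/HTML; CharSet=utf-8", "utf-8"),              # case-insensitive type and parameter name
    ('text/html; charset="iso-8859-1"', "iso-8859-1"),  # quoted, lower-case value
])
def test_parse_content_type_extra_cases(content_type, expected):
    assert parse_content_type(content_type) == expected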


Q: I have a piece of code that parses Content-Type: header in HTTP response and extracts encoding, if any. Please generate some test data for me to test this code. Check both happy path for various encodings, as well as path for no encoding and various edge cases. Be creative. Remember my parser needs to be compliant with relevant HTTP specifications. I am using Python with pytest.

A: Certainly! Here’s a set of test cases you can use to validate your Content-Type header parser:

import pytest

# Test cases for various encodings (happy path)
@pytest.mark.parametrize("content_type, expected", [
    ("text/html; charset=UTF-8", "UTF-8"),
    ("text/plain; charset=ISO-8859-1", "ISO-8859-1"),
    ("application/json; charset=utf-16", "utf-16"),
    ("application/xml; charset=\"utf-8\"", "utf-8"),  # Quoted charset value
    ("text/html; CHARSET=UTF-8", "UTF-8"),  # Case-insensitive
])

# Test case for no encoding (happy path)
@pytest.mark.parametrize("content_type, expected", [
    ("text/html", None),
    ("image/png", None),
])

# Test cases for edge cases
@pytest.mark.parametrize("content_type, expected", [
    ("", None),  # Empty Content-Type
    (";", None),  # Missing type and subtype
    ("text/html; charset=", None),  # Empty charset
    ("text/html; charset==UTF-8", None),  # Double equals in charset
    ("text/html; charset==\"UTF-8\"", None),  # Double equals with quoted charset
    ("text/html; charset= UTF-8", "UTF-8"),  # Space after equals
    ("text/html; charset=\"UTF-8", None),  # Missing closing quote
    ("text/html; charset=UT\"F-8\"", None),  # Quote in the middle of charset
    ("text/html; charset=U\\TF-8", None),  # Illegal escape character
    ("text/html; charset=\"UTF-8\" extra", None),  # Extra text after charset
    ("text/html; charset=unknown", None),  # Unknown charset
])

def test_parse_content_type(content_type, expected):
    # Your parsing function goes here
    assert parse_content_type(content_type) == expected

This set includes various encodings, cases with no encoding, and different edge cases that might trip up the parser. Remember to replace parse_content_type with your actual parsing function. Happy testing!


Q: I have a piece of code that parses HTML document, looks for meta tag with http-equiv set to Content-Type and then reads the value and extracts the encoding, if any. I mean things like that:

<meta http-equiv="Content-Type" content="text/html; charset=cp-1250" />

Please generate some test data for me to test this code. Check both happy path for various encodings, as well as path for no encoding and various edge cases. Be creative. Remember my parser needs to be compliant with relevant HTML and HTTP specifications. I am using Python with pytest.

A: Certainly! Here’s a variety of test cases for your HTML parser:

<!-- Happy Path: Various Encodings -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-16" />
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />
<meta http-equiv="Content-Type" content="text/html; charset=GB2312" />

<!-- No Encoding Specified -->
<meta http-equiv="Content-Type" content="text/html" />

<!-- Edge Cases: Malformed Meta Tags -->
<meta http-equiv="Content-Type" content="text/html; charset=" />
<meta http-equiv="Content-Type" content="text/html; charset==UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset='UTF-8'" />
<meta http-equiv="Content-Type" content="text/html; charset="UTF-8"" />
<meta http-equiv="Content-Type" content="text/html; charset=UT F-8" />
<meta http-equiv="Content-Type" content="text/html; charset=đŸ€”" />

<!-- Edge Cases: Incorrect Placement -->
<p><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></p>
<div><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></div>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><span></span>

<!-- Edge Cases: Additional Attributes -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" extra="data" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" http-equiv="Content-Type" charset="UTF-8" />

For testing with pytest in Python, you can create individual test functions for each case, asserting that your parser extracts the correct encoding or handles the absence or malformation of the charset attribute appropriately. Remember to also test for case sensitivity and leading/trailing whitespace handling. Good luck with your testing! :hammer_and_wrench:

6 Likes

I think I broke ChatGPT :smiley:

I asked for the world and it gave me a sod of grass, so I broke it up a bit:
given that i am an automation engineer generate a data set of 20 records as .net objects that have
a unique auto generated numerical id starting from 1
a string description that includes the id as a suffix
3 fields that are floating point to the 10th decimal place

using System;
using System.Collections.Generic;

public class Record
{
    public int Id { get; set; }
    public string Description { get; set; }
    public double Field1 { get; set; }
    public double Field2 { get; set; }
    public double Field3 { get; set; }
}

public class Program
{
    public static void Main(string[] args)
    {
        // Generate 20 records
        List<Record> records = GenerateRecords(20);

        // Display records
        foreach (var record in records)
        {
            Console.WriteLine($"ID: {record.Id}, Description: {record.Description}, Field1: {record.Field1}, Field2: {record.Field2}, Field3: {record.Field3}");
        }
    }

    public static List<Record> GenerateRecords(int count)
    {
        List<Record> records = new List<Record>();

        for (int i = 1; i <= count; i++)
        {
            Record record = new Record
            {
                Id = i,
                Description = $"Description {i}",
                Field1 = Math.Round(10.0 + i * 0.1, 10), // Generating floating point numbers to the 10th decimal place
                Field2 = Math.Round(20.0 + i * 0.1, 10),
                Field3 = Math.Round(30.0 + i * 0.1, 10)
            };
            records.Add(record);
        }

        return records;
    }
}

which isn't too bad looking.

I then asked:

using the 'GenerateRecords' method give me a set of .net c# tests using xunit and linq to test for precision to the 7th decimal place

using System;
using System.Collections.Generic;
using System.Linq;
using Xunit;

public class Record
{
    public int Id { get; set; }
    public string Description { get; set; }
    public double Field1 { get; set; }
    public double Field2 { get; set; }
    public double Field3 { get; set; }
}

public class RecordTests
{
    [Fact]
    public void Field1PrecisionTest()
    {
        // Arrange
        var records = GenerateRecords(20);

        // Act
        var inaccurateRecords = records.Where(r => Math.Round(r.Field1, 7) != r.Field1);

        // Assert
        Assert.Empty(inaccurateRecords);
    }

    [Fact]
    public void Field2PrecisionTest()
    {
        // Arrange
        var records = GenerateRecords(20);

        // Act
        var inaccurateRecords = records.Where(r => Math.Round(r.Field2, 7) != r.Field2);

        // Assert
        Assert.Empty(inaccurateRecords);
    }

    [Fact]
    public void Field3PrecisionTest()
    {
        // Arrange
        var records = GenerateRecords(20);

        // Act
        var inaccurateRecords = records.Where(r => Math.Round(r.Field3, 7) != r.Field3);

        // Assert
        Assert.Empty(inaccurateRecords);
    }

    public List<Record> GenerateRecords(int count)
    {
        List<Record> records = new List<Record>();

        for (int i = 1; i <= count; i++)
        {
            Record record = new Record
            {
                Id = i,
                Description = $"Description {i}",
                Field1 = Math.Round(10.0 + i * 0.1, 10), // Generating floating point numbers to the 10th decimal place
                Field2 = Math.Round(20.0 + i * 0.1, 10),
                Field3 = Math.Round(30.0 + i * 0.1, 10)
            };
            records.Add(record);
        }

        return records;
    }
}

and finally I wanted some SOLID in there :smiley:

using the 'GenerateRecords' method give me a set of .net c# generic tests using xunit and linq to test for precision to the 7th decimal place using solid principles

To ensure the application of SOLID principles, particularly the Single Responsibility Principle (SRP), I’ll separate concerns by creating a utility class for record generation and another class for precision testing. Here’s how you can structure your code:
using System;
using System.Collections.Generic;
using System.Linq;
using Xunit;

public class Record
{
    public int Id { get; set; }
    public string Description { get; set; }
    public double Field1 { get; set; }
    public double Field2 { get; set; }
    public double Field3 { get; set; }
}

public static class RecordGenerator
{
    public static List<Record> GenerateRecords(int count)
    {
        List<Record> records = new List<Record>();

        for (int i = 1; i <= count; i++)
        {
            Record record = new Record
            {
                Id = i,
                Description = $"Description {i}",
                Field1 = Math.Round(10.0 + i * 0.1, 10), // Generating floating point numbers to the 10th decimal place
                Field2 = Math.Round(20.0 + i * 0.1, 10),
                Field3 = Math.Round(30.0 + i * 0.1, 10)
            };
            records.Add(record);
        }

        return records;
    }
}

public class PrecisionTests
{
    [Theory]
    [InlineData("Field1")]
    [InlineData("Field2")]
    [InlineData("Field3")]
    public void PrecisionTest(string fieldName)
    {
        // Arrange
        var records = RecordGenerator.GenerateRecords(20);
        var fieldInfo = typeof(Record).GetProperty(fieldName);

        // Act
        var inaccurateRecords = records.Where(r => Math.Round((double)fieldInfo.GetValue(r), 7) != (double)fieldInfo.GetValue(r));

        // Assert
        Assert.Empty(inaccurateRecords);
    }
}

Once again, very good at boilerplate but not so much on what I needed.
At this stage I would still say that what we put in is what we get out; in other words, if my prompts were more concise I am sure I would get better results.

However, that poses a question about the value of the time spent versus what we get out of it.

4 Likes

Hey all,

I chose ChatGPT to generate a few sets of test data in a specified JSON format. Even though the prompt was clear, with all the input fields elaborated and an explanation of how the page works, the output was not that satisfactory. The LLM tries to generate the sets in the given format, but for some fields that need to be tested it is not specific: for example, for the scenario of an invalid UK postcode, it doesn't give an actual invalid UK postcode and just puts a parameter called Invalid_Postcode in the JSON output.
The conversation I’ve shared here: ChatGPT

I would also welcome suggestions on this, like Adrian mentioned.
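
One way to catch this class of failure automatically is to scan the generated JSON for values that look like unfilled placeholders before using the data. Here is a rough sketch; the file name and the placeholder heuristic are assumptions:

import json
import re

# Heuristic: flag values like "Invalid_Postcode", "<postcode>" or "TODO ..."
PLACEHOLDER = re.compile(r"^(<[^>]*>|\{[^}]*\}|[A-Z][a-z]+(_[A-Z][a-z]+)+|TODO.*)$")

def find_placeholders(node, path="$"):
    # Recursively yield (path, value) for suspicious string values
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_placeholders(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_placeholders(value, f"{path}[{i}]")
    elif isinstance(node, str) and PLACEHOLDER.match(node):
        yield path, node

with open("generated_test_data.json", encoding="utf-8") as f:  # hypothetical file
    data = json.load(f)

for path, value in find_placeholders(data):
    print(f"possible unfilled placeholder at {path}: {value!r}")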

1 Like

I have the Cody plugin installed in VSCode. I asked it to generate test data for a Ruby spec file in our ATF codebase that tests creating a new course in our web-based app: "Please generate test data for this file create_course_spec.rb". It generated a little test script that ran the test with basically hard-coded data. Not what I expected.

Then I asked it, "Please generate a table of test data for creating courses". It generated a table all right, but just like a table in Word. Nothing I could easily use.

So then I asked, “How can I get a table of that test data in usable form?” (I’m not sure why I’m so polite to something that is not a human
) This worked well, it generated csv format for me, as well as factory data, and offered to generate a json fixture file if I wanted.

What about specific types of test data? I asked this: “Generate test data that includes format exploits for testing adding a course.”

The results:

invalid_course_names = [
  "Invalid Name",
  "<script>alert('XSS')</script>", # script injection
  "../../../../etc/passwd", # path traversal 
  "(); drop table courses; --", # SQL injection
  "àčÂŸÂ˜Â€" * 3000, # super long string
]

invalid_descriptions = [
  "Description" * 500, # too long
  "àčÂŸÂ˜Â€" * 5000, # emoji spam
] 

invalid_codes = [
  "CODEWITH<INVALID>CHARS",
  "123.456", # can't have .
  "àčÂŸÂšàžˆàčÂŸÂšàžˆàčÂŸÂšàžˆ", # only letters and numbers
  "(àč‚Â€àž†)àžąàž—àžąàž—àžąàž—-àč‚€“-àč‚€" # weird unicode
]

it "rejects invalid course names" do
  invalid_course_names.each do |invalid_name| 
    expect { 
      post courses_path, params: { 
        course: {
          name: invalid_name,
          description: "Desc",
          code: "CODE123" 
        }
      }
    }.not_to change(Course, :count)
  end
end

it "rejects invalid descriptions" do
  invalid_descriptions.each do |invalid_desc|
   expect {
     post courses_path, params: {
       course: {
         name: "Name",
         description: invalid_desc,
         code: "CODE123"
       } 
     }
   }.not_to change(Course, :count)
  end
end

# And so on for other fields...

I've had other good experiences having Cody, Tabnine, or ChatGPT generate test data; I think this is one of the big time-savers these tools can provide.

1 Like

That's a really good challenge for ChatGPT, and it highlights some interesting domain knowledge; I'm surprised it produced data for it. The names it invented for wells and samples are about as imaginative as I could have come up with :slight_smile:

I think GPT-3.5 can be a bit poor with its responses, and many people report better results with GPT-4.

I tried the same with Copilot and it produced

well name, well lat, well long, sample name, base md, top md, sample type
Well-1, 48.8566, 2.3522, Sample-001, 1500, 1300, Rock
Well-2, 48.8567, 2.3523, Sample-002, 1505, 1305, Sediment
...

so it did the same in terms of rows - the response said it could continue, and I asked it to continue, but it only displayed 5 rows and then said "Please replace the ellipsis (
) with additional rows as needed to reach 500." :slight_smile:

Assuming it would be a struggle to get Copilot to generate 500 rows, I then asked it to produce a Python function to generate the data using the patterns it suggested
 and it did!

import csv
import random


# Function to generate random data based on the pattern
def generate_geochemical_data(file_name):
    # Headers for the CSV file
    headers = ['well name', 'well lat', 'well long', 'sample name', 'base md', 'top md', 'sample type']

    # Sample types for random selection
    sample_types = ['Rock', 'Sediment', 'Mineral', 'Organic', 'Core']

    # Open the file in write mode
    with open(file_name, mode='w', newline='') as file:
        writer = csv.writer(file)

        # Write the headers
        writer.writerow(headers)

        # Generate and write data for 500 rows
        for i in range(1, 501):
            well_name = f"Well-{i}"
            well_lat = 48.8566 + (i * 0.0001)  # Increment latitude slightly for each well
            well_long = 2.3522 + (i * 0.0001)  # Increment longitude slightly for each well
            sample_name = f"Sample-{str(i).zfill(3)}"
            base_md = 1500 + (i * 5)  # Increment base measure depth
            top_md = base_md - 200  # Top measure depth is always 200 less than base
            sample_type = random.choice(sample_types)  # Randomly choose a sample type

            # Write the row data
            writer.writerow([well_name, well_lat, well_long, sample_name, base_md, top_md, sample_type])


# Call the function with the desired CSV file name
generate_geochemical_data('geochemical_dataset.csv')

I’m slightly impressed with that!

1 Like

By the way, I didn't notice until someone trying to book a meeting on my Calendly pointed it out: the calendar event that MoT sent me, and that I blindly added to my calendar, marked all my days as BUSY! Oops! I fixed it, but if someone else also accepted the event without looking, you might want to check.

1 Like

Hi
I decided to take a simpler route today with data. I wanted to see how an LLM (ChatGPT 3.5) could help me generate test data quickly for a known test scenario: testing the process of registering a new user and, in particular, selecting a postcode.

I started by asking for a sample set of 50 UK postcodes. That was generated nice and quickly, but at the end of the response, it stated that: “Please note that these are randomly generated and may not correspond to actual locations.”

This can be a real problem where you want to test the classic UI feature of inputting a postcode and then selecting an address matching it. I then asked which of the generated postcodes were real. The response was: "
to verify the existence of specific postcodes, you would need to consult an official postal service database or use a postcode lookup tool provided by Royal Mail or other reputable sources."

To me, this means that we have to be very careful about what we ask an LLM to generate data for. My current view is that where the data does not need to reflect "real world situations", this is a quick way to generate it. If you need data that is "real", then you still need to do the legwork of going to the originating data sources.

Note that this is not a bad thing, just a limitation to be aware of.
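
One practical compromise is to let the LLM generate postcodes quickly and then validate them yourself in two tiers: a cheap format check locally, and a lookup against an authoritative source only for the cases that genuinely must exist. Here is a sketch of the first tier in Python, using a simplified pattern rather than the full Royal Mail grammar:

import re

# Simplified UK postcode shape; format-valid does NOT mean the postcode exists.
# Existence still requires a lookup against Royal Mail data or a postcode API.
UK_POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$", re.IGNORECASE)

def looks_like_uk_postcode(value):
    return bool(UK_POSTCODE.match(value.strip()))

print(looks_like_uk_postcode("SW1A 1AA"))  # True: format-valid (and real)
print(looks_like_uk_postcode("ZZ99 9ZZ"))  # True: format-valid but not a real postcode
print(looks_like_uk_postcode("12345"))     # False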

5 Likes

Very interesting to see that Copilot also struggles with that problem. I wonder what the reason could be. Maybe trying to protect its resources?

Better performing and free is going to be challenging when it comes to Large Language Models (LLMs). The cost to train these models is huge (GPT-4 was reported to cost $100 million to train), and hosting them is also expensive, as it takes a lot of compute.

So newer, better models that are free are going to be rare, but they may become cheaper over time. On OpenAI, the difference in the cost of access between GPT-3.5 and GPT-4 is quite large (relatively speaking) and mounts up depending on how many tokens your queries take.

There is some hope for Open Source LLMs (see 5 Best Open Source LLMs (March 2024) - Unite.AI for some ideas), but to use them effectively you need to host the model somewhere (it would be very slow to run these on a personal machine
 assuming it even had enough memory to load and run the model). However, they do allow you to fine-tune the models for specific tasks, which often yields better results.
The Open Source models are also just the raw model, so you'd need to build an interface around them to access them.

Some of these are available on the awesome HuggingFace (https://huggingface.co/), but the better ones usually require a Pro subscription
 again, it's the cost of hosting the models that causes this. They have a leaderboard (https://huggingface.co/open-llm-leaderboard) that might be interesting to explore to find better-performing models (based on benchmarks).

Hope that helps

4 Likes

It could be - but I’m not sure.

However, paid LLMs will charge you based on input and output tokens, so perhaps it is a guard to stop you from unexpectedly racking up huge bills :slight_smile:

With systems such as ChatGPT you also have the issue of context length - there is a limit to the amount of data you can provide as input. The issue arises when you are in conversation with the model, since the output from the previous interactions is included as part of the context for your next input. This is part of what the ChatGPT app does automatically in an attempt to give the interaction that conversational feel. Since there is a finite input length, some forgetting occurs when the context is truncated.

So if the model generated a large amount of data and we then asked a follow-up question (such as "can you create a python function to generate this"), it would pass all the data that was generated as part of the prompt (truncating the earlier parts to fit the maximum context), which would probably end up producing weird responses over time. So perhaps it is a guard to minimise such weirdness.
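
To illustrate the mechanism, here is a toy sketch of that kind of context management. The token budget and the word-count approximation are stand-ins for a real tokenizer:

# Toy model of chat context management: append each turn, then drop the
# oldest turns once the (approximate) token budget is exceeded.
MAX_CONTEXT_TOKENS = 4096  # assumption; real limits vary by model

def build_context(history, new_message):
    history = history + [new_message]
    # Word count stands in for tokens here; real systems count actual tokens
    while sum(len(turn.split()) for turn in history) > MAX_CONTEXT_TOKENS:
        history.pop(0)  # the earliest turns are "forgotten" first
    return history

When a huge generated dataset sits in the history, it crowds out the earlier instructions, which is exactly the forgetting described above.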

1 Like

Day 11

Quick one today.

A little while ago I asked ChatGPT to generate me a bash script to generate contacts.vcf files to help with some device testing:

#!/bin/bash

# Function to generate a random 10-digit number
generate_phone_number() {
    echo $((7000000000 + RANDOM % 1000000000))
}

# Function to select a random item from a list
select_random_item() {
    local list=("$@")
    local num_items=${#list[@]}
    local random_index=$((RANDOM % num_items))
    echo "${list[random_index]}"
}

# List of first names
first_names=("Emma" "Liam" "Olivia" "Noah" "Ava" "William" "Sophia" "James" "Isabella" "Oliver" "Charlotte" "Benjamin" "Amelia" "Elijah" "Mia" "Lucas" "Harper" "Mason" "Evelyn" "Logan" "Abigail" "Alexander" "Emily" "Ethan" "Elizabeth" "Michael" "Avery" "Daniel" "Sofia")

# List of last names
last_names=("Smith" "Johnson" "Williams" "Jones" "Brown" "Davis" "Miller" "Wilson" "Moore" "Taylor" "Anderson" "Thomas" "Jackson" "White" "Harris" "Martin" "Thompson" "Garcia" "Martinez" "Robinson" "Clark" "Rodriguez" "Lewis" "Lee" "Walker" "Hall" "Allen" "Young" "Hernandez")

# Function to generate the contacts
generate_contacts() {
    local num_contacts=$1
    for ((i=1; i<=$num_contacts; i++)); do
        echo "BEGIN:VCARD"
        echo "VERSION:3.0"
        first_name=$(select_random_item "${first_names[@]}")
        last_name=$(select_random_item "${last_names[@]}")
        echo "FN:$first_name $last_name"
        echo "N:$last_name;$first_name;;;"
        echo "EMAIL;TYPE=INTERNET;TYPE=HOME:$first_name@example.com"
        echo "TEL;TYPE=CELL:$(generate_phone_number)"
        echo "END:VCARD"
    done
}

# Main script
num_contacts=$1
if [[ ! $num_contacts =~ ^[0-9]+$ ]]; then
    echo "Usage: $0 <number_of_contacts>"
    exit 1
fi

generate_contacts $num_contacts > contacts.vcf
echo "Generated $num_contacts contacts in contacts.vcf"

There were quite a few revisions before I arrived at the right prompt. I wish I had done some research into prompt engineering, although in fairness to me, I added an example of the VCF file format. :slight_smile:

In terms of evaluation



How easy was it to generate the data?

As long as you know how to make bash scripts executable, it was very easy. There was some assumed knowledge, though, but I liked that it was configurable in terms of the number of contacts and produced a file that could be shared to a device for testing.

How flexible is the data generation?

You could only request low numbers of contacts if you wanted many unique combinations of names. It was for a small job, though, so I was willing to sacrifice flexibility for expediency.

Did the generated data meet your needs? Was it realistic?

At first we went with just random strings, which is fine but poor-quality test data: you can miss issues like sorting. I suppose that's a limitation of bash scripts; it could have been done in Faker or something similar, as sketched below.
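
For comparison, here is roughly what the Faker route might look like for the same job; the locale and phone format are assumptions:

from faker import Faker  # pip install Faker

fake = Faker("en_GB")  # locale chosen to suit the devices under test (assumption)

def generate_contacts_vcf(count, path="contacts.vcf"):
    # Write `count` vCard 3.0 contacts with realistic names, emails and numbers
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(count):
            first, last = fake.first_name(), fake.last_name()
            f.write("BEGIN:VCARD\n")
            f.write("VERSION:3.0\n")
            f.write(f"FN:{first} {last}\n")
            f.write(f"N:{last};{first};;;\n")
            f.write(f"EMAIL;TYPE=INTERNET;TYPE=HOME:{fake.email()}\n")
            f.write(f"TEL;TYPE=CELL:{fake.msisdn()}\n")
            f.write("END:VCARD\n")

generate_contacts_vcf(100)

Realistic names also sort sensibly, which addresses the random-strings limitation mentioned above.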

2 Likes

Interestingly, I asked Copilot to generate 10 rows of data (previously it only generated 2 examples when asked for 500)
 and it generated all 10! I also tried 25 rows, and it produced them, but it only produced a sample when asked for 50.

So perhaps there is some guard in place to prevent responses that are either too long or take too long to generate.

3 Likes

Hello Fellow Contestants,

I used both ChatGPT and Gemini for today's exercise.

The problem statement that I created was a real one with some hypothetical twists: signing up to the Ministry of Testing platform.

Here is the prompt that I used:

I want you to act as an expert software tester who works on creating test data to provide comprehensive test data coverage.

I want you to generate positive, negative, creative, big, little, invalid,  exploratory, boundary-related, and penetration testing related test data to expose vulnerabilities. 

Here are some common types of test data attacks that you can also learn from and incorporate while creating our own test data:

Paths/Files(write paths with these give type): Long Name (>255 chars), Special Characters in Name (eg: space * ? / \ | < > , . ( ) [ ] { } ; : ‘ “ ! @ # $ % ^ & ƒ ), Non-Existent characters, Character with No Space.

Time and Date: Crossing Time Zones, Leap Days, Always Invalid Days (Feb 30, Sept 31), Feb 29 in Non-Leap Years, Different Formats (June 5, 2001; 06/05/2001; 06/05/01; 06-05-01; 6/5/2001 12:34), Internationalisation dd.mm.yyyy, mm/dd/yyyy, am/pm, Daylight Savings Changeover.

Numbers: 0, 32768 (2^15), 32769 (2^15 + 1), 65536 (2^16), 65537 (2^16 + 1), 2147483648 (2^31), 2147483649 (2^31 + 1), 4294967296 (2^32), 4294967297 (2^32 + 1), Scientific Notation (1E-16), Negative, Floating Point/Decimal (0.0001), With Commas (1,234,567), European Style (1.234.567,89).

Strings: Long (255, 256, 257, 1000, 1024, 2000, 2048 or more characters), Accented Chars (àåùãÀÄçÚéĂȘëÏíßðñĂČÎÔöö, etc.), Asian Characters

Common Delimiters and Special Characters ( “ ‘ ` | / \ , ; : & < > ^ * ? Tab ), Leave Blank, Single Space, Multiple Spaces, Leading Spaces, SQL Injection ( ‘select * from customer),  Emojis

Provide the results in tabular format.

I want you to generate {10} rows of test data for: Sign Up to Ministry of Testing Application

These are the variable names to create test data for: Username, Password, Email Address, First Name, Last Name, Company Name, Experience, etc.

Only Email Address, and First Name are Mandatory Fields.

In my observations from trying this and similar prompts (including previous versions of this prompt) multiple times on both GPT and Gemini, I found Gemini to be more creative and powerful with test data.

However, both often ignore the prompt examples in some instances and fail to get as creative as a human tester would have been with so many heuristics and data attacks as input.

Here is a detailed video on today's task: https://www.youtube.com/watch?v=0tp_YMgZSkk

Here are all the AI Prompts Created by Me Till Now: AI Prompt Repository for Testers - Rahul's Testing Titbits

Looking forward to tomorrow’s challenge.

Thanks,
Rahul

3 Likes

Task Steps

1. Select your tool of choice:

  • Review the tool lists compiled in earlier days and find one you want to try that generates test data. Some popular tools include:

    • Faker

    • Mockaroo

    • AutoFixture

    • Hypothesis

  • Try generating data using a Large Language Model (LLM) such as Gemini. LLMs are powerful AI models that can generate realistic text and data.

2. Find a Data Problem to solve:

  • Select a Test Data Generation problem or challenge. If you don’t have one, make one or ask the community for examples of their data challenges. Here are a few examples:

    • Generating realistic customer data for testing an e-commerce website

    • Generating test data for a machine learning model that predicts customer churn

    • Generating large amounts of data to populate a database for performance testing

3. Experiment with the tool:

  • Learn how the tool generates data and try to generate test data for your chosen scenario. Here is an example of how to generate fake customer data using Faker:
import faker

# Create a Faker instance
fake = faker.Faker()

# Generate 100 rows of fake customer data
customers = [
    {
        "name": fake.name(),
        "address": fake.address(),
        "email": fake.email(),
        "phone": fake.phone_number(),
    }
    for _ in range(100)
]

4. Evaluate the generated data:

  • Review the quality and completeness of the data generated. Some perspectives you might want to consider are:

    • How easy was it to generate the data?

    • How flexible is the data generation?

    • Did the generated data meet your needs? Was it realistic?

5. Share your findings:

  • Share your findings with the community so they can benefit from your insights. Consider sharing:

    • The data problem you were trying to solve and how well you think the tool performed.

    • Your perceptions about what was positive and negative about the tool and the data generated.

    • How the generated data might be improved.

    • How might the tool help with your day-to-day testing activities?

Code Examples

Here is an illustrative sketch of generating fake customer data with Gemini. Note that the gemini package and its generate call below are hypothetical; in practice you would access Gemini through Google's API and parse the text it returns:

import gemini  # hypothetical client library, not a real package

# Create a Gemini client (hypothetical API)
gemini = gemini.Gemini()

# Generate 100 rows of fake customer data (hypothetical API)
customers = gemini.generate("customer", 100)

Here is a code example of how to generate test data for a machine learning model that predicts customer churn using Hypothesis:

from hypothesis import given, strategies as st

@given(
    age=st.integers(min_value=18, max_value=65),
    gender=st.sampled_from(["male", "female"]),
    tenure=st.integers(min_value=0, max_value=10),
    contract=st.sampled_from(["monthly", "yearly"]),
    calls=st.integers(min_value=0, max_value=100),
    data=st.integers(min_value=0, max_value=1000),
    charges=st.floats(min_value=0, max_value=1000),
    churn=st.sampled_from([True, False]),
)
def test_customer_churn_model(age, gender, tenure, contract, calls, data, charges, churn):
    # Create a customer object
    customer = {
        "age": age,
        "gender": gender,
        "tenure": tenure,
        "contract": contract,
        "calls": calls,
        "data": data,
        "charges": charges,
    }

    # Train the machine learning model (train_customer_churn_model is assumed
    # to be defined elsewhere in the project)
    model = train_customer_churn_model()

    # Predict the customer's churn status
    predicted_churn = model.predict(customer)

    # Assert that the predicted churn status is equal to the actual churn status
    assert predicted_churn == churn
4 Likes