Does anyone use openly available LLMs to support testing?

objectivetester · 8 September 2024 15:35

Hi all,

I’ve been looking at the capabilities of both commercial models (e.g. GPT4, Gemini and Claude Sonnet) and openly available models (Llama3.1, Gemma2, Qwen2 and Phi3), and have written a blog post on how to get started with the openly available ones:

https://allthingstesting.com/running-ai-models-locally/

Has anyone been using these openly available models, and if so, what supporting platform have you been running them on?

parwalrahul · 8 September 2024 20:13

I have used llama / codellama.

Responses were slow; I needed a better system for faster responses.

ujjwal.singh · 9 September 2024 16:58

I use Phind, and it has been helpful to some extent. Its results were better, at least for automation testing.
For text-based content related to QA documentation, the GPT4 results were better.

objectivetester · 16 September 2024 20:17

I’m seeing some great results with openly available large language models running locally, using prompts to generate tests and test data.

Full results here: https://allthingstesting.com/local-ai-models-to-support-testing/
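For anyone wanting to try this, here is a minimal sketch of sending a test-generation prompt to a locally running model. It assumes Ollama is the supporting platform (this thread doesn't name one) and that it is listening on its default local endpoint. The model tag and the prompt wording are illustrative assumptions, not the ones used for the results below.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the model's text response."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Hypothetical prompt; adjust the columns to suit your own system under test.
EXAMPLE_PROMPT = (
    "Generate test cases in CSV format with the columns "
    "initial_state, action, input, expected_result, final_state. "
    "Include negative tests for invalid state transitions."
)
```

With the server running and a model pulled, `generate("llama3.1:8b", EXAMPLE_PROMPT)` would return the raw text to review. Smaller models may be slow on modest hardware, hence the generous timeout.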

Summary:

Gemma2 9b (Q4)

Refused to generate the NI numbers

The test cases cover almost all of the valid state transitions (6/7) and include the expected result. Useful negative tests aren’t listed.

Llama 3.1 8b (Q4)

Very good at data generation

The test cases cover all 7 of the valid state transitions, include negative cases, and list initial state, action and result. As a bonus, some synthetic user test data has been included.

Mistral-Nemo 12b (Q4)

Good sample and an explanation of the output

The test cases (although not in CSV) cover most of the valid state transitions (5/7) and the tests detail initial state, action, input, expected result and final state, but there are no negative tests. As a bonus, some synthetic user test data has been included.

Phi3 3b (Q4)

Needed repeated re-prompting before it produced usable test data output

After repeated re-prompting, the test cases cover most of the valid state transitions (5/7) and the tests detail initial state, action, input, expected result and final state. As a bonus, some synthetic user test data has been included.

Qwen2 7b (Q4)

The data looks good, but isn’t in CSV format

The test cases cover most of the valid state transitions (5/7) and the tests detail state, transition, input, and expected outcome, but there are no useful negative tests.
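The coverage figures above (e.g. 6/7) can be reproduced with a small check once the generated test cases have been parsed. The sketch below is one way to do that; the seven-transition state machine and the `Transition` and `transition_coverage` names are hypothetical stand-ins, not the model used in the blog post.

```python
from typing import Iterable, NamedTuple


class Transition(NamedTuple):
    start: str
    action: str
    end: str


def transition_coverage(valid: Iterable[Transition], tested: Iterable[Transition]):
    """Return (covered, missed, other) sets of transitions.

    'other' holds tested transitions that are not in the valid set,
    i.e. candidates for negative tests.
    """
    valid_set, tested_set = set(valid), set(tested)
    covered = valid_set & tested_set
    return covered, valid_set - covered, tested_set - valid_set


# Hypothetical state machine with 7 valid transitions.
VALID = [
    Transition("draft", "submit", "pending"),
    Transition("pending", "approve", "approved"),
    Transition("pending", "reject", "rejected"),
    Transition("rejected", "edit", "draft"),
    Transition("approved", "publish", "published"),
    Transition("published", "archive", "archived"),
    Transition("draft", "delete", "deleted"),
]

# Pretend a model produced tests for six valid transitions plus one invalid one.
generated = VALID[:6] + [Transition("archived", "publish", "published")]

covered, missed, other = transition_coverage(VALID, generated)
print(f"{len(covered)}/{len(VALID)} valid transitions covered")
print("Missed:", sorted(missed))
print("Possible negative tests:", sorted(other))
```

Comparing sets rather than counting rows means duplicate test cases from a model don't inflate the score.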
