Does anyone use openly available LLMs to support testing?

Hi all,

I’ve been looking at the capabilities of both commercial models (e.g. GPT4, Gemini and Claude Sonnet) and openly available models (Llama3.1, Gemma2, Qwen2 and Phi3), and have written a blog post on how to get started using the openly available models:

https://allthingstesting.com/running-ai-models-locally/

Has anyone been using these openly available models, and if so, what supporting platform have you been using?

I have used Llama and Code Llama.

Responses were slow; I needed a more capable system to get faster responses.

I use Phind, and it was helpful to some extent. Its results were better, at least for automation testing.
For text-based content such as QA documentation, GPT4 gave better results.

I’m seeing some great results from openly available large language models running locally, using prompts to generate tests and test data.

Full results here: https://allthingstesting.com/local-ai-models-to-support-testing/
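No platform is named in this thread, so the following is only a minimal sketch, assuming the models are served by Ollama on its default local port (11434) with a model such as `llama3.1:8b` already pulled. It sends one prompt to Ollama's `/api/generate` endpoint using only the Python standard library and returns the generated text. The endpoint, port and model tag are assumptions about that setup, not details taken from the results below.

```python
import json
import urllib.request

# Assumption: an Ollama server is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a single, non-streamed generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> str:
    """Send the prompt to the local server and return the generated text."""
    body = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False, the whole answer arrives in one JSON object
        # whose "response" field holds the generated text.
        return json.loads(resp.read())["response"]
```

Usage would be something like `generate("llama3.1:8b", "List test cases for ... as CSV")`. Setting `stream` to `False` asks for a single JSON reply rather than a stream of partial ones, which keeps the parsing simple.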

Summary:

Gemma2 9b (Q4)

Refused to generate the NI (National Insurance) numbers

The test cases cover almost all of the valid state transitions (6/7) and list the expected results, but no useful negative tests are included.

Llama 3.1 8b (Q4)

Very good at data generation

The test cases cover all 7 of the valid state transitions, include negative cases, and list the initial state, action and result. As a bonus, some synthetic user test data is included.
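The summary doesn't say how the generated data was checked. One quick way to spot-check generated NI numbers is a format check like the sketch below, which assumes the test data should follow the publicly documented HMRC format: two prefix letters, six digits and a suffix letter A to D, with some letters and prefixes never used. The function names are hypothetical, and a passing value only has the right shape; it says nothing about whether that number was ever issued.

```python
import re

# Assumed rules, following the published HMRC National Insurance number format:
#  - D, F, I, Q, U and V are never used as either prefix letter
#  - O is never used as the second prefix letter
#  - the prefixes BG, GB, KN, NK, NT, TN and ZZ are not allocated
#  - the suffix letter is A, B, C or D
_NINO_PATTERN = re.compile(r"[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z]\d{6}[A-D]")
_UNALLOCATED_PREFIXES = {"BG", "GB", "KN", "NK", "NT", "TN", "ZZ"}


def is_valid_nino_format(value: str) -> bool:
    """Check the format only; spaces are ignored and case is normalised."""
    candidate = value.replace(" ", "").upper()
    if not _NINO_PATTERN.fullmatch(candidate):
        return False
    return candidate[:2] not in _UNALLOCATED_PREFIXES


def invalid_ninos(values: list[str]) -> list[str]:
    """Return the generated values that fail the format check."""
    return [v for v in values if not is_valid_nino_format(v)]
```

Running `invalid_ninos` over a model's generated data makes it easy to see at a glance whether the values are usable as valid inputs, or to confirm that deliberately invalid ones really are invalid for negative tests.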

Mistral-Nemo 12b (Q4)

Good sample and an explanation of the output

The test cases (although not in CSV format) cover most of the valid state transitions (5/7), and the tests detail initial state, action, input, expected result and final state, but there are no negative tests. As a bonus, some synthetic user test data is included.

Phi3 3b (Q4)

Needs repeated re-prompting until it produces usable test data output

After repeated re-prompting, the test cases cover most of the valid state transitions (5/7), and the tests detail initial state, action, input, expected result and final state. As a bonus, some synthetic user test data is included.

Qwen2 7b (Q4)

The data looks good, but isn’t in CSV format

The test cases cover most of the valid state transitions (5/7), and the tests detail state, transition, input and expected outcome, but there are no useful negative tests.
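The summary scores each model on how many of the 7 valid state transitions its test cases cover and on whether the output is CSV. A rough way to automate that scoring is sketched below: it parses a model's reply as CSV and compares the rows against a set of valid transitions. The column names (`initial_state`, `action`, `final_state`) are hypothetical, and so is any state model you feed it, since the actual states and transitions aren't given in this thread.

```python
import csv
import io

Transition = tuple[str, str, str]  # (initial_state, action, final_state)

# Hypothetical column names; adjust to whatever the prompt asked for.
REQUIRED_COLUMNS = ("initial_state", "action", "final_state")


def parse_test_cases(csv_text: str) -> list[Transition]:
    """Parse model output as CSV, raising ValueError if the columns are missing."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"output is missing CSV columns: {missing}")
    cases: list[Transition] = []
    for row in reader:
        # Short rows give None for absent fields, so treat those as empty.
        initial, action, final = ((row[c] or "").strip() for c in REQUIRED_COLUMNS)
        cases.append((initial, action, final))
    return cases


def transition_coverage(
    valid: set[Transition], cases: list[Transition]
) -> tuple[int, int, set[Transition]]:
    """Return (covered count, total valid, uncovered transitions)."""
    covered = valid & set(cases)
    return len(covered), len(valid), valid - covered
```

Rows that don't match a valid transition, such as negative tests, simply don't count towards coverage, so a result like 5/7 comes straight from the first two values returned, and the third shows which transitions the model missed.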