What white papers are you reading about LLMs?

simon_tomes · 25 February 2025 17:35

During a This Week in Testing, @al8xr mentioned the following whitepaper:

Mutation-Guided LLM-based Test Generation at Meta

I hadn’t seen a whitepaper for a while and I wondered: are folks diving into the details of advances in Large Language Model (LLM) tech via whitepapers? If so, where do you get your LLM-related whitepapers from?

And feel free to comment on this particular whitepaper if you like. Have you seen it before reading this post? What did you make of it?

ujjwal.singh · 25 February 2025 20:27

I don’t read white papers much, but as I work in FinTech I was exploring the use cases of LLMs in finance and recently came across the white paper below, which focuses on exactly that:

I usually find such white papers in blogs and articles on different platforms.

al8xr · 26 February 2025 11:07

Why

Reading whitepapers is hard, but beneficial.

At some point in your career, you realize that “there are no new or interesting topics” in testing or automation (especially if you have not transitioned to a leadership role). This is where whitepapers come in.

Whitepapers can help you by providing up-to-date research on the given topic. Personally, I am reading and collecting whitepapers on blockchain testing, but I am also reading on LLMs and testing from time to time.

Where to start

General guidelines on how to read papers - here.
Or a good video on the topic.

A few tips

How to search for an interesting whitepaper: find one, go to its references, and find even more papers on the topic!
To search for a free whitepaper, use your Google-fu and add ‘filetype:pdf’ to your search phrase
Make a list of papers and read them one by one
Some papers are more “general”, so you can skip parts of them
Some papers contain math, so you might spend some time getting through it
Make notes from papers! (There are a bunch of note-taking techniques)
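
The ‘filetype:pdf’ tip can also be scripted if you search for papers often. Below is a minimal sketch, not a definitive tool: the function name pdf_search_url and the optional site filter are assumptions added for illustration, and it only builds the search URL rather than fetching any results.

```python
from typing import Optional
from urllib.parse import urlencode


def pdf_search_url(topic: str, site: Optional[str] = None) -> str:
    """Build a Google search URL restricted to PDF results.

    `site` is a hypothetical extra filter (e.g. "arxiv.org"), not part of the original tip.
    """
    # Quote the topic so it is searched as a phrase, then add the filetype operator.
    terms = [f'"{topic}"', "filetype:pdf"]
    if site:
        terms.append(f"site:{site}")
    # urlencode escapes spaces, quotes, and colons for safe use in a URL.
    return "https://www.google.com/search?" + urlencode({"q": " ".join(terms)})


if __name__ == "__main__":
    print(pdf_search_url("mutation testing LLM", site="arxiv.org"))
```

Paste the printed URL into a browser to run the search.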

Papers on LLMs

Papers that I added to my to-do list (after reading the paper I mentioned in the TWiT session):

Testing Web-Enabled Simulation at Scale Using Metamorphic Testing
Automated Unit Test Improvement using Large Language Models at Meta
Software Testing Research Challenges: An Industrial Perspective
Assured LLM-Based Software Engineering
Observation-based unit test generation at Meta
The Oracle Problem in Software Testing: A Survey
What It Would Take to Use Mutation Testing in Industry—A Study at Facebook
ChatUniTest: A Framework for LLM-Based Test Generation
Mutation-guided LLM-based Test Generation at Meta

I plan to start reviewing papers on my blog, because I am collecting a lot of information and not sharing it with the world. This should be fixed.

al8xr · 26 February 2025 11:14

Just forgot to mention.

For those of us who prefer listening to podcasts - you can use the NotebookLM service from Google and turn any whitepaper into a … podcast.

There are some limitations, though - it can’t work with graphs, diagrams, and other images.

But beware of relying solely on such an AI output. It may contain errors (like any software).

komalgc · 26 February 2025 12:23

I’m currently reading papers from arXiv, and have found them useful.

aleroux · 26 February 2025 15:14

I haven’t yet started reading on LLM-Based Test Generation. I would appreciate any feedback on the matter (articles, whitepapers…) as it’s at the top of my list of things to look out for this year.
About Meta’s Mutation-Guided LLM-Based Test Generation paper, it’s a hard and long read. I’ve found an easier article that gives the high-level view:

simon_tomes · 12 March 2025 12:04

The Google DeepMind Gemma Team released this on 12/03/2025:

Gemma 3 Technical Report

simon_tomes · 28 March 2025 10:52

Circuit Tracing: Revealing Computational Graphs in Language Models extends last year’s interpretable features into attribution graphs, which can “trace the chain of intermediate steps that a model uses to transform a specific input prompt into an output response”.
On the Biology of a Large Language Model uses that methodology to investigate Claude 3.5 Haiku in a bunch of different ways. Multilingual Circuits for example shows that the same prompt in three different languages uses similar circuits for each one, hinting at an intriguing level of generalization.

Source: Simon Willison’s Weblog.

dstekanov · 29 March 2025 22:45

Thank you for your advice, Oleksandr! Especially regarding the list of papers on LLMs.

simon_tomes · 2 May 2025 10:10

Generative AI Act II: Test Time Scaling Drives Cognition Engineering via Cornell University.
