I enjoyed this article:
LLM-as-a-Judge: A Practical Guide
I like how clearly it lays out how you might use one LLM to evaluate another LLM’s work.
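The core idea boils down to something like the sketch below: one model produces an answer, and a second call grades it against a rubric. This is just a minimal illustration, not the article's exact setup; it assumes the OpenAI Python SDK, and the model name, rubric, and example question are placeholders I picked.

```python
# Minimal LLM-as-a-judge sketch: a second LLM call grades an answer
# against a rubric. Model name and rubric are illustrative only.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are a strict evaluator.
Rate the ANSWER to the QUESTION on a 1-5 scale for factual accuracy
and relevance. Reply with a single integer only.

QUESTION: {question}
ANSWER: {answer}"""

def judge(question: str, answer: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4o-mini",        # judge model; any capable model works
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
        temperature=0,              # keep the grading as deterministic as possible
    )
    return int(response.choices[0].message.content.strip())

score = judge(
    "What year did Apollo 11 land on the Moon?",
    "Apollo 11 landed on the Moon in 1969.",
)
print(score)  # e.g. 5
```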
It also mentions some tools:
OpenAI Evals: A framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
DeepEval: A simple-to-use framework for evaluating and testing large language model systems (e.g., RAG pipelines, chatbots, AI agents). It is similar to Pytest but specialized for unit testing LLM outputs (see the sketch after this list).
TruLens: Systematically evaluate and track LLM experiments. Core functionality includes Feedback Functions, The RAG Triad, and Honest, Harmless and Helpful Evals.
Promptfoo: A developer-friendly local tool for testing LLM applications. Supports testing of prompts, agents, and RAG pipelines, plus red teaming, pentesting, and vulnerability scanning for LLMs.
LangSmith: Evaluation utilities provided by LangChain, a popular framework for building LLM applications. Supports LLM-as-a-judge evaluators for both offline and online evaluation.
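To give a flavour of how one of these slots into a test suite, here is roughly what a DeepEval unit test looks like, based on my reading of its docs. The metric choice, threshold, and example strings are just illustrative, and in a real test the output would come from your own pipeline.

```python
# Rough sketch of a DeepEval unit test (typically run via `deepeval test run <file>.py`).
# Metric, threshold, and strings are illustrative only.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What is your return policy?",
        # In practice this would be the output of your RAG pipeline / chatbot.
        actual_output="You can return any item within 30 days for a full refund.",
    )
    metric = AnswerRelevancyMetric(threshold=0.7)  # uses an LLM judge under the hood
    assert_test(test_case, [metric])
```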
Have you used any of them before? In what context? How did they help? What problems/issues did you encounter?