AI Testing Standard and Metrics

These past few weeks I have been studying AI, using vibe coding tools, attending conferences full of "how to use AI tools" talks, and researching AI-related material. So far, though, I haven't found any simplified information on testing AI: the strategies being used, test designs, and the metrics that define whether confidence is high enough to release an AI tool to production. If you have references, please send the links; I'm interested in giving input and learning from them.

I recently completed the ISTQB AI tester certification - that goes into a fair amount of detail on testing AI and what that entails. The focus is on ML but some of it is applicable to LLMs.

They talk about back-to-back testing and A/B testing, comparing the AI system under test (SUT) either with a pseudo-oracle (a system built to mirror the SUT's expected results, so you can validate its outputs) or with a previous version of the AI system.
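A minimal sketch of what back-to-back testing can look like in practice: run the SUT and the pseudo-oracle on the same inputs and measure how often they agree. The functions and the 0.5 thresholds here are made-up placeholders, not anything from the syllabus.

```python
# Back-to-back testing sketch: compare the AI SUT against a pseudo-oracle
# on identical inputs. Both predictors below are hypothetical stand-ins.

def sut_predict(x):
    # stand-in for the AI system under test
    return 1 if x >= 0.5 else 0

def oracle_predict(x):
    # stand-in for the pseudo-oracle (e.g. a previous model version)
    return 1 if x > 0.5 else 0

def agreement_rate(inputs):
    # fraction of inputs on which the SUT and the oracle agree
    matches = sum(sut_predict(x) == oracle_predict(x) for x in inputs)
    return matches / len(inputs)

inputs = [0.1, 0.3, 0.5, 0.7, 0.9]
print(f"agreement: {agreement_rate(inputs):.0%}")  # they disagree only at x == 0.5
```

In a real pipeline you would release-gate on the agreement rate (or on a statistical test over the disagreements) rather than expect 100% agreement.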

They also talk about data analysis (of the training data).

There are a few metrics used to quantify AI systems, such as Precision, Accuracy, Recall, F1 score, the Receiver Operating Characteristic curve and its associated Area Under the Curve (ROC/AUC), as well as inter- and intra-cluster metrics. Which ones apply depends on the type of model/algorithm: classification/regression (supervised learning) or clustering/association (unsupervised learning).
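For the classification metrics above, a small worked example computed by hand from a toy confusion matrix (the labels are invented for illustration):

```python
# Classification metrics from first principles: accuracy, precision,
# recall, and F1 computed from true/predicted binary labels.

def classification_metrics(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # ground truth
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # model output: 1 FN, 1 FP
print(classification_metrics(y_true, y_pred))
```

In practice you would use a library such as scikit-learn for these, but seeing the TP/FP/FN arithmetic makes it clear why precision and recall can diverge on imbalanced data.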

The syllabus is free: Certified Tester AI Testing (CT-AI) - International Software Testing Qualifications Board

Thanks Mark, I have just started reading through it as well.

@ross,

Exactly! AI testing is still a domain without clear, universally agreed guidelines. What we have today is scattered across research papers, case studies, and company-internal practices.

From my perspective, the testing of AI revolves around:

Data quality metrics—bias detection, representativeness, completeness

Model performance metrics—accuracy, precision, recall, F1 score

Robustness testing—adversarial inputs, edge cases

Ethical and fairness checks

Explainability—how transparent and understandable the decision-making process is
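To make the first item concrete, here is a hedged sketch of one data quality check: representativeness measured as the positive-label rate per group, with a gap flagged against a chosen threshold. The records, group names, and threshold are all invented for illustration.

```python
# Data quality / bias-detection sketch: compare positive-label rates
# across groups in the training data and report the largest gap.

from collections import Counter

def positive_rate_by_group(records):
    """records: list of (group, binary_label) pairs."""
    totals, positives = Counter(), Counter()
    for group, label in records:
        totals[group] += 1
        positives[group] += label
    return {g: positives[g] / totals[g] for g in totals}

records = [("A", 1), ("A", 0), ("A", 1), ("B", 0), ("B", 0), ("B", 1)]
rates = positive_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())
# flag the dataset for review if the gap exceeds an agreed threshold
print(rates, "gap:", round(gap, 2))
```

Real fairness tooling goes much further (disparate impact ratios, equalized odds), but even a simple per-group rate check like this catches gross imbalances before training.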

Two frameworks worth mentioning:

NIST AI Risk Management Framework

The NIST AI Risk Management Framework, published by the U.S. National Institute of Standards and Technology, presents a structured approach to identifying, assessing, and managing AI risks such as safety, bias, and trustworthiness.

ISO/IEC 24028

ISO/IEC 24028 is the international standard addressing AI trustworthiness in terms of security, privacy, reliability, and ethical considerations.