How is everyone testing LLM based applications?

Heya everyone :waving_hand: Been a while! But great to be back to the club.

AI based apps are everywhere, and companies have incorporated it into their existing offerings in one way or the other, in small and big ways.

Ignoring all the hyped apps, looking into just usecases that are truly a value add, even in the smallest of ways.. testing them have been such a precarious thing.

The aspect that an input drastically affects the output that comes out is now actually a feature and not a bug!

How is everyone looking at testing them ? The obvious ones that everyone have started adopting are Evals.

The non-determinism is just such contrasting that its close to impossible to cover even 80% of the cases.

So my question is… how have you changed your mindset in today’s world to test AI based features, that are non-deterministic, inherently biased and easily manipulated through prompt injections, etc.. ? how has your thought process changed ?

1 Like

One of my takes is that now more than ever, we testers will be increasingly required to evaluate these types of applications.I’ve seen some comments on Internet about people complaining on what AI is generating (talking about apps). So, in theory, testers should be needed more than ever and not replaced by AI.

One of the things that comes to my mind is to do a static analysis of the prompt that’s going to be executed. Does it meet what the stakeholder looks for? But, then we have to do another static analysis for the code generated to avoid security/leak breaches.

1 Like