What testing metrics can we use for determining the quality of an AI application
I would imagine you would first need to clarify what the reward function (or goal) of your AI app is, since that would influence which metrics you require.
Beyond that, I would say that as much data as possible on the state of your AI is useful as well. Knowing what state your AI is in, and what decisions it makes, ties back into the previous point of knowing what goals you have set for it.
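As a minimal sketch of that idea, you could record each decision the model makes along with its confidence, then pull out the low-confidence ones for review. The `DecisionLog` class and the 0.6 threshold here are illustrative, not from any particular library:

```python
from dataclasses import dataclass, field

@dataclass
class DecisionLog:
    """Collects (features, prediction, confidence) records for offline analysis."""
    records: list = field(default_factory=list)

    def log(self, features, prediction, confidence):
        self.records.append({"features": features,
                             "prediction": prediction,
                             "confidence": confidence})

    def low_confidence(self, threshold=0.6):
        """Return decisions the model was unsure about -- candidates for human review."""
        return [r for r in self.records if r["confidence"] < threshold]

log = DecisionLog()
log.log([1.0, 2.0], "spam", 0.95)
log.log([0.1, 0.2], "ham", 0.40)
print(len(log.low_confidence()))  # one decision fell under the 0.6 confidence threshold
```

Even a simple log like this gives you a metric (share of low-confidence decisions) that you can track over time against your stated goals.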
Coming up with metrics for determining the quality of an AI application is always a challenge. This is because an AI system is largely a black box: we do not fully control or understand how the algorithm forms relationships and makes decisions; we just provide different training datasets and monitor the learning/progress.
So, if there are metrics at all, they should probably revolve around:
The quality of the training dataset, which is crucial for training AI models. The greater the variation and mix in the dataset, the better the model learns, and the more its learning generalizes to a larger audience.
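One cheap proxy for "mix" in a labeled dataset is class balance. This sketch (the `is_imbalanced` helper and the 0.8 cutoff are assumptions for illustration) flags datasets where a single class dominates:

```python
from collections import Counter

def class_balance(labels):
    """Fraction of samples in each class -- a quick check on the dataset mix."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def is_imbalanced(labels, max_share=0.8):
    """Flag datasets where any one class exceeds max_share of the samples."""
    return max(class_balance(labels).values()) > max_share

labels = ["cat"] * 90 + ["dog"] * 10
print(is_imbalanced(labels))  # True: 90% of the samples belong to one class
```

Balance is only one dimension of dataset quality, but it is easy to compute and to gate on in a pipeline.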
How well the model performs against the validation dataset used to evaluate its learning. You could set a threshold derived from historical training of similar AI models and use it as the measuring factor for the quality of learning. Do the same with the test dataset, where you expose the model to unseen data it has never encountered before.
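The threshold idea above can be sketched as a simple quality gate: compute a metric (accuracy here) on the validation or test set and compare it to the historical baseline. The function names and the 0.80 threshold are illustrative assumptions:

```python
def accuracy(predictions, targets):
    """Share of predictions that match the ground-truth labels."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

def passes_quality_gate(predictions, targets, threshold):
    """Compare validation/test accuracy against a threshold taken from
    historical training runs of similar models."""
    return accuracy(predictions, targets) >= threshold

val_preds   = [1, 0, 1, 1, 0, 1, 1, 0]
val_targets = [1, 0, 1, 0, 0, 1, 1, 0]
print(passes_quality_gate(val_preds, val_targets, threshold=0.80))  # 7/8 = 0.875, gate passes
```

The same gate runs unchanged on the held-out test set; only the data changes, which keeps the validation and test checks comparable.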
I wrote about some of these aspects here - http://go.testim.io/ai-based-testing-the-future-of-test-automation