Liveblogging TB Brighton #8: Rise of the Guardians - Testing Machine Learning Algorithms 101 by Patrick Prill

Rise of the Guardians - Testing Machine Learning Algorithms 101

By Patrick Prill @testpappy

Our lovely neighbourhood Grumpy Tester takes the stage all the way from Germany. He's going to talk to us about Machine Learning. He got involved in this topic as his employer, QualityMinds, is participating in a government-financed research project for autonomous driving. As this is a government-funded project, it has not actually started yet, but Patrick wanted to ramp up his knowledge already.

Patrick starts with some Machine Learning Basics:

'a method for finding patterns in data that are usefully predictive of future events but which do not necessarily provide an explanatory theory.'

There are three types of Machine Learning:

Supervised learning

Unsupervised learning

Reinforcement learning: learning through rewards and punishments.

Patrick dives straight into the different types and possibilities and shows a bunch of graphs and tables and things. The basics of Machine Learning: neural networks, perceptrons, and tensors (tensor fields, aka multidimensional arrays in certain contexts).

Machine Learning 101:

You put in a lot of data/perceptron/tensors/etc.

{Magic Happens Here}

A probability rolls out: in machine learning things are not black and white. It gives you back a probability that something is the answer.
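To make the "a probability rolls out" step concrete, here is a minimal sketch of softmax, one common way to turn a network's raw scores into probabilities. This is an assumption for illustration: the write-up does not say which function the model in the talk uses, and the scores below are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the digits 0-9 (invented for this example).
raw_scores = [0.1, 0.3, 0.2, 2.5, 0.0, 0.4, 0.1, 3.1, 0.2, 0.6]

probabilities = softmax(raw_scores)
best_digit = max(range(10), key=lambda d: probabilities[d])

print(f"most likely digit: {best_digit}, probability: {probabilities[best_digit]:.2f}")
```

Note that the answer is never "it is a 7", only "7 is the most probable option", and the runner-up may not be far behind.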

Patrick now leads us further into the 'Magic Happens Here'. I am not going to attempt to write up all the concepts/mathematical concepts he is discussing: do watch this talk!

The basic principle (as I currently understand it) for Image Recognition:

A classic network: the image is divided up into (tiny) squares and folded over and over again, going through a filtering step between folds. This leads to an indication of what the likely digit in each tiny square is.

Patrick tries to explain it using a persona of a scientist:

By folding (convolving) the input layer, the network recognizes recurring elements of an image. It sums up the small parts and then takes a guess at which digit it most probably is.
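The "folding" step can be sketched in a few lines of plain Python. This is a toy illustration, not the network from the talk: the 5x5 image and the 3x3 filter are invented, and like most ML libraries the code slides the filter without flipping it (strictly speaking a cross-correlation).

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply each small square of the image with the filter and sum it up.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        output.append(row)
    return output

# Hypothetical 5x5 image: a vertical stroke, like a simple handwritten '1'.
image = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Hypothetical filter that responds strongly to vertical lines.
kernel = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The middle column of the result lights up where the filter sits exactly on the stroke, which is the "recognizing recurring elements" idea in miniature.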

Classical ML testing approach: use test data the network hasn't seen before and calculate the correctness. Happy? Release it! Not happy? Tweak it further!
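As a rough sketch of that classical loop: score the model on data it was not trained on and compare the result against a threshold. Everything here is hypothetical, including the stand-in `predict` function, the five held-out samples, and the 0.75 threshold.

```python
def accuracy(predict, samples):
    """Fraction of samples for which the prediction matches the label."""
    correct = sum(1 for features, label in samples if predict(features) == label)
    return correct / len(samples)

def predict(features):
    # Hypothetical stand-in for a trained model.
    return 1 if sum(features) > 1.0 else 0

# Hypothetical held-out data: (features, expected label), never used for training.
held_out = [
    ([0.9, 0.8], 1),
    ([0.1, 0.2], 0),
    ([0.6, 0.7], 1),
    ([0.4, 0.5], 1),  # the stand-in model gets this one wrong
    ([0.2, 0.3], 0),
]

REQUIRED_ACCURACY = 0.75  # assumption: someone has to decide what "happy" means

score = accuracy(predict, held_out)
verdict = "release" if score >= REQUIRED_ACCURACY else "tweak further"
print(f"accuracy on unseen data: {score:.2f} -> {verdict}")
```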

Patrick sighs: 'If this is all that testing Machine Learning is... we are not needed as testers. So... let's look a little deeper.'

Next up: a challenge for the audience! Patrick shows us a pixelated image. What number is shown here? What was the computer's answer? What was the label given? What was the probability based on our machine learning?

The first few digits are fairly easy, but quickly Patrick shows us some pixels that are a bit contentious: what is it? He next goes into detail on how the model arrives at its probabilities and chooses based on them. Understanding how the algorithm chooses is important, as it is exactly what allows people to manipulate or fool the algorithm. At the end of the talk, Patrick shows how a panda can be considered a gibbon with a 93% probability by adding a few pixels hardly detectable to the human eye.
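The idea behind fooling a model with tiny pixel changes can be sketched with a toy linear classifier. This is not the panda attack itself, which targets a deep network; the weights, the pixel values, and the step size `epsilon` are all invented here. The trick shown is to nudge every pixel a little in the direction that lowers the score.

```python
def score(weights, pixels):
    """Linear score: positive means 'panda', otherwise 'gibbon'."""
    return sum(w * p for w, p in zip(weights, pixels))

def classify(weights, pixels):
    return "panda" if score(weights, pixels) > 0 else "gibbon"

def sign(x):
    return (x > 0) - (x < 0)

def perturb(weights, pixels, epsilon):
    # For a linear model the gradient of the score with respect to each pixel
    # is its weight, so stepping against sign(weight) lowers the score fastest.
    return [p - epsilon * sign(w) for w, p in zip(weights, pixels)]

# Hypothetical model weights and an 8-pixel "image" (values between 0 and 1).
weights = [0.5, -0.4, 0.3, -0.2, 0.6, -0.5, 0.4, -0.3]
pixels = [0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

adversarial = perturb(weights, pixels, epsilon=0.1)
largest_change = max(abs(a - p) for a, p in zip(adversarial, pixels))

print(classify(weights, pixels), "->", classify(weights, adversarial))
print(f"largest change to any pixel: {largest_change:.2f}")
```

No single pixel moves by more than `epsilon`, yet the many small nudges add up to a different label.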

Considerations regarding diversity and biases are essential! Patrick learned to write his '1' in school not as a straight line, but with a line underneath and a roof on top. Turns out: the dataset for the model he is using is the American census and American high school students...

Overfitting: your model is aligned too closely to the training data and will most probably have problems with production data.

Underfitting: your model does not even fit the training data well, so it cannot pick up the patterns in the data at all.
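One way to spot both problems is to compare accuracy on the training data with accuracy on unseen data. The sketch below uses three deliberately simple, hypothetical "models" and made-up data: one memorizes the training set, one always gives the same answer, and one uses a sensible rule.

```python
def accuracy(model, samples):
    return sum(model(x) == y for x, y in samples) / len(samples)

# Hypothetical data: the label is 1 when the feature is above 0.5.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
         (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
test = [(0.15, 0), (0.35, 0), (0.65, 1), (0.85, 1)]

# Overfitting: remembers every training example, guesses 0 for anything new.
lookup = dict(train)
def memorizer(x):
    return lookup.get(x, 0)

# Underfitting: ignores the input completely.
def constant(x):
    return 0

# A reasonable fit: one simple rule that generalizes.
def threshold(x):
    return 1 if x > 0.5 else 0

results = {
    name: (accuracy(model, train), accuracy(model, test))
    for name, model in [("memorizer", memorizer),
                        ("constant", constant),
                        ("threshold", threshold)]
}

for name, (train_acc, test_acc) in results.items():
    print(f"{name:10s} train={train_acc:.2f} test={test_acc:.2f}")
```

A large gap between training and test accuracy hints at overfitting; poor accuracy on both hints at underfitting.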

Ethics is a huge consideration as well: think carefully about the wider context in which an algorithm is used, such as autonomous driving, medical decisions, the justice system, finance, etc.

And what happens if someone wants to fool the algorithm? (Like the panda above or more nefarious examples.)

There are challenges for testers:

Mass data

There is no clear 'test passed': what is good enough?

Systems are non-deterministic

Focus on data or statistics

Long 'development/test' cycles
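Two of these challenges, non-determinism and the missing 'test passed', can be illustrated with a statistical check: instead of asserting one exact result, run the system several times and compare the average against an agreed threshold. This is only a sketch under assumptions; the simulated model, its 10% error rate, the seeds, and the 0.8 threshold are all hypothetical.

```python
import random
import statistics

def noisy_model_accuracy(seed, n_samples=200, error_rate=0.1):
    """Simulate one evaluation run of a model that is wrong about 10% of the time."""
    rng = random.Random(seed)  # a fixed seed makes each individual run repeatable
    correct = sum(1 for _ in range(n_samples) if rng.random() >= error_rate)
    return correct / n_samples

SEEDS = [1, 2, 3, 4, 5]   # assumption: five runs are enough for this sketch
GOOD_ENOUGH = 0.8         # assumption: the team has agreed on this threshold

accuracies = [noisy_model_accuracy(seed) for seed in SEEDS]
mean_accuracy = statistics.mean(accuracies)
spread = max(accuracies) - min(accuracies)
test_passed = mean_accuracy >= GOOD_ENOUGH

print(f"mean accuracy: {mean_accuracy:.3f}, spread between runs: {spread:.3f}")
print("good enough" if test_passed else "not good enough")
```

The spread between runs is worth reporting alongside the mean: a model that averages well but swings wildly from run to run is a finding in itself.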

Fortunately, Patrick points us to a number of resources so we can do our own research and expand our knowledge on testing machine learning!