Ideas for testing in an ML and devops context


A team in my company is trying to fit testing into a project around ML deliveries and DevOps. I want to point them in a more up-to-date direction.
First of all, that testing happens everywhere:

Secondly, something more related to Modern Testing Principles by Melissa Eaden, regarding the role of the data analyst: Inspired by Alan Page’s TestBash Brighton talk and his AMA on Modern Testing, Melissa gave us her interpretation of Alan Page and Brent Jensen’s Modern Testing Principles. In this article, there is also a handy graphic of the principles for you to download, print off and stick up.

They have shown me this drawing:

Perhaps something on testing ML applications too? What can you recommend?

Hmm, interesting question. We have tons of great content, both in the form of guest blog posts and links to resources, on, but not specifically about ML. I’ll think about this and look around!

Oh, that diagram is super helpful!

Just found this: Has a nice section on testing & quality.

I got some information from one of our engineers who has done a lot of the ML areas of our product. Since we don’t have separate teams for ML engineering or data science, we don’t have the handoffs depicted in the model you posted. We do use Jupyter notebooks for some of the initial model design and other data analysis. In the past, we’ve used GCP’s ML Engine services to train and serve TensorFlow models in prod.

Our product is different in that the ML is trained on customer data for each individual customer’s automated regression tests, as opposed to being trained on data before a trained version is released.

And here’s some more from my teammates. The team does a lot of testing and validation, including stability testing, of the modeling approach and algorithm on data, including our own prod data (since we dogfood the product), before releasing it to prod. Our models aren’t black boxes; we are able to make them fairly transparent to users, so they know what to expect.
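
To make the stability-testing idea a bit more concrete, here is a minimal sketch of one way it could look. It is not the team's actual pipeline: the toy "model" (`fit_threshold`), the helper `stability_spread`, the synthetic `response_times_ms` data, and the 5% tolerance are all assumptions chosen for illustration. The idea is to refit the model on bootstrap resamples of the same data and check that what it learns does not move around much.

```python
import random
import statistics


def fit_threshold(samples, k=3.0):
    """Toy 'model' (hypothetical): an upper bound of mean + k standard deviations."""
    return statistics.mean(samples) + k * statistics.stdev(samples)


def stability_spread(samples, runs=20, seed=0):
    """Refit on bootstrap resamples; return the relative spread of the fitted bounds."""
    rng = random.Random(seed)
    fitted = []
    for _ in range(runs):
        # Resample with replacement, keeping the original sample size.
        resample = [rng.choice(samples) for _ in samples]
        fitted.append(fit_threshold(resample))
    return statistics.stdev(fitted) / statistics.mean(fitted)


# Synthetic stand-in for production measurements (assumed, not real data).
data_rng = random.Random(42)
response_times_ms = [data_rng.gauss(200, 15) for _ in range(500)]

spread = stability_spread(response_times_ms)
# The 5% tolerance is an arbitrary example value; pick one that fits your model.
assert spread < 0.05, f"model is unstable across resamples: {spread:.3f}"
```

A check like this can run in CI next to ordinary unit tests, so an unstable modeling change fails the build before it reaches prod.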

For example, our visual checking feature shows the areas of a web page that won’t be checked for visual changes, and shows thresholds for determining performance anomalies. Users can get a feel for how the boundaries change over time, and they know what to expect. If they see something unexpected, we get feedback from them.
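
As a rough illustration of thresholds whose boundaries shift over time, the sketch below flags a measurement as anomalous when it exceeds the mean plus `k` standard deviations of a trailing window. This is an assumed, simplified technique, not a description of how the product computes its thresholds; `rolling_anomalies`, the window size, `k`, and the sample `load_times_ms` values are all made up for the example.

```python
import statistics
from collections import deque


def rolling_anomalies(values, window=5, k=3.0):
    """Return (index, value, upper_bound, is_anomaly) for each point that has a full trailing window."""
    history = deque(maxlen=window)
    results = []
    for i, value in enumerate(values):
        if len(history) == window:
            # The bound is recomputed from recent history, so it moves over time.
            bound = statistics.mean(history) + k * statistics.stdev(history)
            results.append((i, value, bound, value > bound))
        history.append(value)
    return results


# Hypothetical page load times in milliseconds, with one obvious spike.
load_times_ms = [100, 102, 98, 101, 99, 100, 250, 101]

for i, value, bound, is_anomaly in rolling_anomalies(load_times_ms):
    print(f"run {i}: {value} ms (bound {bound:.1f} ms) anomaly={is_anomaly}")

flagged = [i for i, _, _, is_anomaly in rolling_anomalies(load_times_ms) if is_anomaly]
```

Only run 6 (the 250 ms spike) is flagged here. Note that the spike then enters the window and widens the bound for the following runs. Showing users the computed bound alongside each result is one way to keep this kind of model transparent.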

I feel a bit silly that I’m the only one replying to this, but this question really got me thinking!
