Are Quality Engineers embedded into the data engineering team or do they sit outside as a separate team?

@jprescott presented a fantastic talk on how to thrive in Quality + Data Engineering at TestBash Autumn and had a super insightful Q&A afterwards.

One of the questions we unfortunately didn’t get to ask live was from @faisal251 :

Are quality engineers embedded into the data engineering team or do they sit outside as a separate team?

And you, what do you think?


A lot of data scientists will say that you cannot test an ML model, but in fact you can.
I’ve worked on some projects where I used a metamorphic testing technique to test a machine learning model (and also simple probability theory testing).

How, you may ask? After all, there is no expected result, right?

Very good question! But there is…

Let’s say you are building a model to predict X. In a proper project this will not be your only acceptance criterion. What else should be an acceptance criterion? The success rate. We are building a model to predict X with an 80% success rate. There you go, an expected outcome (just a bit different).

How do we approach this from QA?

Well, a data scientist often gets training material (on which the model is trained) and splits it up into multiple parts, for example one to train the model on and one to verify it on.
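
To make the split idea concrete, here is a minimal sketch in Python. The function name `split_cases`, the 20% hold-out fraction and the fixed seed are my own assumptions for illustration, not something a particular project prescribes.

```python
import random

def split_cases(cases, holdout_fraction=0.2, seed=42):
    """Shuffle labelled cases and split them into a training part and a held-out part."""
    shuffled = list(cases)
    # A fixed seed keeps the split reproducible between test runs.
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * holdout_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder cases; real ones would carry the house parameters and known worth.
cases = [{"id": i} for i in range(10)]
train, holdout = split_cases(cases)
print(len(train), len(holdout))
```

The held-out part is never shown to the model during training, so it can serve as an independent check.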

This is what we want to do as well. As a QA engineer, and this is not going to be simple, you’ll have to make 10,000-100,000 cases to validate this.

Let’s say you are testing the value of a house. (In reality there are 50+ parameters, but in the following example I’ll use 1.)

House A: built in 2020 is worth 500K
House B: built in 2015 is worth 350K
House C: built in 2010 is worth 250K

You won’t know the exact number it’s going to spit out, which makes validating it so much harder, BUT you know the worth of each house and can compare them.

Therefore House A > House B > House C
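
This kind of relation can be checked without knowing the exact predicted number, which is the core of a metamorphic test. Below is a rough sketch; `violates_newer_is_worth_more`, the `build_year` field and `toy_predict` are all made up for illustration, and the relation "newer is worth more when everything else is equal" is an assumption taken from the one-parameter example above.

```python
def violates_newer_is_worth_more(predict, house, newer_year):
    """Return True if making the house newer lowers the predicted value."""
    # Change only the build year; every other parameter stays the same.
    newer = dict(house, build_year=newer_year)
    return predict(newer) < predict(house)

# Stand-in model for illustration only; a real test would call the trained model.
def toy_predict(house):
    return 250_000 + 25_000 * (house["build_year"] - 2010)

house_c = {"build_year": 2010}
print(violates_newer_is_worth_more(toy_predict, house_c, 2020))
```

With real models you would run this over many generated houses and count how often the relation is violated.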

This is pretty easy. But what if you have 50 parameters and houses from the same year that are maybe super similar to each other?

You’ll want the model to give you a list of values (estimated guesses of the worth of each house), and you as the QA engineer will compare those values against your test cases. If the order is correct for over 80%, then you’ve achieved your acceptance criteria. So basically: you’ll create an order of house values in your test set and compare it to the order the model is giving.
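
One possible way to turn "the order is correct for over 80%" into a number is to count the pairs of houses that the model ranks the same way as the test set. This is only a sketch of that interpretation; `order_agreement` is a hypothetical helper and the predicted values are invented.

```python
from itertools import combinations

def order_agreement(expected, predicted):
    """Fraction of case pairs that the model ranks in the same order as the test set."""
    agree = total = 0
    for a, b in combinations(expected, 2):
        if expected[a] == expected[b]:
            continue  # the test set gives no expected order for ties
        total += 1
        # Same sign of both differences means the pair is ordered the same way.
        agree += (expected[a] - expected[b]) * (predicted[a] - predicted[b]) > 0
    return agree / total if total else 0.0

expected = {"A": 500_000, "B": 350_000, "C": 250_000}   # known worth from the test set
predicted = {"A": 480_000, "B": 390_000, "C": 400_000}  # made-up model output

rate = order_agreement(expected, predicted)
print(f"{rate:.0%} of pairs in the expected order; passes 80% bar: {rate >= 0.80}")
```

Comparing every pair grows quadratically, so with tens of thousands of cases you would probably sample pairs instead of checking them all.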

Do not underestimate the work of making test cases for this. Again, the above example is super simplified; in reality there are 50+ parameters, and making a test set takes some time and analysis.

DISCLAIMER: Every model is different, so for every model you’ll have to use a different approach. Not every technique will work for each model, so start thinking outside of the box :wink:

Now don’t let data scientists say “you can’t test it”.
There is a way! :slight_smile:

I would still recommend the use of to decide when a skill is in the team, requested by the team, or used more as a service. Any talk from Susanne Kaiser, eg explains how to design organizations for better flow. It’s not only a testing problem…

At my place of work, Quality Engineers are embedded into the teams, although this is still relatively new to my company, so we are trying to figure out the best balance and how to effectively utilize a Quality Engineer.

We have them embedded so they can be closer to the teams and the code base and really advocate for quality from within a team structure instead of being an outside force. This way we can help teams shift left and try to find bugs or defects earlier in the process. At least that’s the initial thought we have. There is talk of a potential shake-up where QEs will no longer be embedded and will instead act as a shared service, but we shall see what comes of that.