Data Warehouses and Automation: Success stories (if any!)

I’m currently working on ETL Testing. You know the deal: Data goes from A to B with some logic in the middle and voilá, testing needs to be done.

So far, the solutions I’ve seen to automate this have been in-house built frameworks. Sometimes with Python, sometimes with Java…

As someone who has been automating APIs and UI for more than 10 years, I find it curious the lack of standards and frameworks there are out there to take care of this automation area.

So…for all of you testers in ETL doing automation: What have been your challenges, failures and successes attempting automation in this not-so-explored area.

Cheers!

2 Likes

Ha ha, I noticed no one answered.

Yeah it’s a toughy… basically as you described.

Its been a couple of years since I did a project like this and can’t give an definitive answer but can provide some strategies that might help.

  1. Tests are usually examples. Given data X I expect answer Y. So first strategy is to start with known (synthetic data) that covers the various test partitions and and boundaries. Hopefully these are clear from the acceptance criteria before the Dev was done! :slight_smile: so you just populate the data and check the result.

  2. In theory you covered every possible scenario with synthetic data above. Customers and the real world love to proof you wrong. So now take some actual data from production and watch your etl jobs crash and burn. Try popular reports and ensure there are no obvious errors. There the ones customers get cross about.

  3. If ther is an existing or legacy report, compare some reports with that. Customers hate changes in numbers even if it’s an improvement or correction.

  4. Having said tests are are examples, etl can be tricky to have known outcomes for. Some times you can write some code to validate the repot results. This works because some times your code is higher level, can focus on individual calculations and ignore edge cases so is simpler. Also it doesn’t need to scale or be distributed. But sometimes it’s just a second opinion. So ok the reports don’t match but 50% you are right and 50% Dev was right so you investigate and fix the wrong one.

2 Likes

Ah I didn’t really read the question In full…

Regards frameworks. Not sure it matters. It totally depends on the technical architecture. I have had success with big oracle databases, pentaho, Aws shift and snowflake. It’s not much different to testing any other service or application.

1 Like

Thanks for replying! Regarding the frameworks, I think that the lighter the better in this case, as it shouldn’t be needed much complexity to test ETL (in the end, it would be a glorified SQL executor in my case, which needs to checks that a count is X or Y).

I don’t think a lot of people do Data Warehouse testing and especially automate it. :frowning:

A friend of mine did it and all I know is that he build his own framework to test it.

So true, what vendor would make a tool for this, since people would only need it once every hopefully 25-30 years :stuck_out_tongue: