Any tips, courses, things we can go away and look at for performance testing within a data world / data team

A lot of courses for performance testing are aimed at front end / back end ‘systems’, and it’s hard to find material on performance testing within a data team: from ingestion of data from sources, through distillation and processing, to insights / consumption, and what tools to use along the way. Any suggestions would be hugely appreciated.

I don’t know of blogs or courses on performance testing in the areas you mention; that’s probably a more specialized area/field of testing than the norm, so there isn’t much public coverage of it.

Because these areas are specialized, the tooling needed for the tests is often custom and developed in-house rather than open source and off the shelf for the general public to use. Even where some of the tooling is generic and publicly available (open source, etc.), the other components/tools of the testing tend to be customized to the specific organization/team doing the testing.

As for generic tools that would apply more broadly, think of load/stress test tools built around Kafka, Redis, or Elasticsearch, for example. For Kafka, there are existing tools like kafkameter for JMeter (and variants of it), and I believe there was a Kafka integration for Gatling too. If you search the web, you’ll find articles and YouTube videos around kafkameter. Using that example, you can code/customize the kafkameter testing around what your organization does with Kafka, though that only covers one part of the pipeline; you then need the glue code/tools to pair it up with testing the rest of the pipeline (see the sketch below for the sort of thing I mean).
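As an illustration, here is a minimal sketch of a Kafka producer throughput check in Python using the kafka-python library. The broker address, topic name, message count, and payload are all assumptions for the example; swap in whatever matches your setup.

```python
import time
from kafka import KafkaProducer  # pip install kafka-python

# Assumed broker address and topic name -- replace with your own.
BROKER = "localhost:9092"
TOPIC = "ingest-events"
NUM_MESSAGES = 10_000
PAYLOAD = b'{"sensor_id": 42, "value": 3.14}'  # example payload

producer = KafkaProducer(bootstrap_servers=BROKER)

start = time.perf_counter()
for _ in range(NUM_MESSAGES):
    producer.send(TOPIC, PAYLOAD)
producer.flush()  # block until all buffered messages are delivered
elapsed = time.perf_counter() - start

print(f"Produced {NUM_MESSAGES} messages in {elapsed:.2f}s "
      f"({NUM_MESSAGES / elapsed:.0f} msg/s)")
```

A standalone script like this only exercises the ingestion side; the glue work is wiring measurements like this together with checks further down the pipeline.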

We could discuss this further if you provide specific technology/vendor examples of your data sources/systems, data processing tools, and how the data is consumed or insights are derived. Based on the tools used, one can then research what test tools and strategies exist around them.

Another good source of material for this area is to research and follow the articles, forums / discussion groups, mailing lists, Slack channels, etc. of the projects or groups behind the data source systems or tools. There might already be discussions there, or you can initiate one to find out more. In some rare situations you might find yourself being the first person to bring up the topic (i.e. spearheading the effort)!


Also, performance testing the front end, or parts of the back end like APIs, often indirectly tests the data ingestion & processing, because requests trickle down from the front end (or the entry point of the back end) through the data pipeline before the resulting observations route back to the front end, etc.

Thus it can be simpler to cover the testing that way: essentially end-to-end performance testing, analogous to the end-to-end version of (UI/API) functional test automation.
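To make that concrete, here is a minimal sketch (in Python, using the requests library) of a single end-to-end latency measurement: push a record in at the front of the pipeline, then poll the consumption end until the processed result appears. The endpoints and payload are hypothetical; in a real test you would run many of these concurrently (e.g. driven by a load test tool) rather than one at a time.

```python
import time
import uuid
import requests  # pip install requests

# Hypothetical endpoints -- replace with your own ingestion and query APIs.
INGEST_URL = "http://localhost:8080/ingest"
QUERY_URL = "http://localhost:8080/insights/{record_id}"
TIMEOUT_S = 60.0
POLL_INTERVAL_S = 0.5

record_id = str(uuid.uuid4())
start = time.perf_counter()

# Push a record in at the front of the pipeline...
requests.post(INGEST_URL, json={"id": record_id, "value": 3.14}).raise_for_status()

# ...then poll the consumption end until the processed result shows up.
while time.perf_counter() - start < TIMEOUT_S:
    resp = requests.get(QUERY_URL.format(record_id=record_id))
    if resp.status_code == 200:
        print(f"End-to-end latency: {time.perf_counter() - start:.2f}s")
        break
    time.sleep(POLL_INTERVAL_S)
else:
    print(f"Record not observed downstream within {TIMEOUT_S}s")
```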

What I previously posted covers more the componentized version of performance testing the system, along the lines of an integration or unit test approach.


Validating parts of the data system for performance is similar to the norm of performance testing, except you focus on monitoring tools and metrics relating to the data pipeline rather than end-to-end system or user/API level metrics. To judge whether things are performing properly, check for correct output, response times, and CPU/memory usage of the parts of the data pipeline, rather than of the front end API server or web server, etc. (a minimal sketch follows below).
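For instance, here is a minimal sketch of instrumenting a single pipeline stage in Python with psutil. The transform function and batch size are placeholders for whichever stage you want to profile; note that it checks the output as well as the timing.

```python
import time
import psutil  # pip install psutil

def transform(batch):
    """Placeholder for a real pipeline stage -- replace with your own."""
    return [row * 2 for row in batch]

proc = psutil.Process()
batch = list(range(1_000_000))  # assumed batch size for the example

cpu_before = proc.cpu_times()
rss_before = proc.memory_info().rss
start = time.perf_counter()

result = transform(batch)

elapsed = time.perf_counter() - start
rss_after = proc.memory_info().rss
cpu_after = proc.cpu_times()

# Check the output as well as the timing: a fast stage that produces
# wrong results is still a failure.
assert len(result) == len(batch)

cpu_used = (cpu_after.user - cpu_before.user) + (cpu_after.system - cpu_before.system)
print(f"Stage: {elapsed:.3f}s wall, {cpu_used:.3f}s CPU, "
      f"RSS grew {(rss_after - rss_before) / 2**20:.1f} MiB")
```

For distributed stages (Spark jobs, Kafka consumers, etc.) the same idea applies, but you would pull the equivalent metrics from the platform’s own monitoring rather than from the local process.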
