Has anyone used synthetic monitoring for production smoke tests?

Hi all :waving_hand:

I’m looking into ways to validate critical production endpoints safely, especially when we can’t create or modify data due to downstream or financial impact.

Has anyone used synthetic monitoring as a way to run post-deployment smoke tests in production?

I’d love to hear:

  • What tools you’ve used (e.g., Grafana, Prometheus, Datadog)
  • What types of tests or flows you cover
  • How you manage safe, synthetic data
  • Any lessons learned or anything to avoid

Any examples or insights would be really helpful! Thanks in advance :folded_hands:

Very valid point!
We use Datadog synthetic monitors (€€€) to check certain endpoints that are particularly important for the company. The goal is to verify service health in production. Personally, I am against this approach: since we deploy our services on a managed cluster, we could test these “important” endpoints from within the cluster and reduce costs. Ensuring connectivity from outside the cluster should be the responsibility of the cloud provider, as defined by their SLAs. So what is the point of synthetic monitors in production in a managed environment?
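
As an illustration of the in-cluster alternative, here is a minimal sketch of a probe that could run as a scheduled job inside the cluster, assuming Node 18+ for the global `fetch`; the service DNS name and `/healthz` path are hypothetical:

```typescript
// Hypothetical in-cluster probe: hit a service via its cluster-internal DNS
// name instead of paying for an external synthetic monitor. A non-zero exit
// marks the scheduled run as failed, which alerting can pick up.
const TARGET = "http://orders-api.payments.svc.cluster.local/healthz"; // placeholder

async function probe(): Promise<void> {
  const started = Date.now();
  const res = await fetch(TARGET, { signal: AbortSignal.timeout(5_000) });
  const elapsedMs = Date.now() - started;

  if (!res.ok) {
    console.error(`Probe failed: HTTP ${res.status} in ${elapsedMs}ms`);
    process.exit(1);
  }
  console.log(`Probe OK: HTTP ${res.status} in ${elapsedMs}ms`);
}

probe().catch((err) => {
  console.error("Probe error:", err);
  process.exit(1);
});
```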

I have explored synthetic monitoring in production using a combination of Playwright, Prometheus, and Grafana; a minimal sketch follows the list.

  • Playwright: For browser-based automation to simulate user journeys.
  • Kubernetes CronJobs: To schedule and run Playwright tests at regular intervals.
  • Prometheus: For collecting custom metrics like flow success/failure, response time, and errors.
  • Grafana: For visualizing the synthetic test trends and alerting on failures.
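
To make that concrete, here is a minimal sketch of one such check, assuming the `prom-client` npm package and a Prometheus Pushgateway (short-lived CronJob runs can't be scraped directly, so they push); the target URL, selector, and Pushgateway address are placeholders:

```typescript
// Sketch of a scheduled synthetic check: drive a short journey with
// Playwright, then push outcome metrics to a Prometheus Pushgateway
// so Grafana can chart and alert on them.
import { chromium } from "playwright";
import client from "prom-client";

const registry = new client.Registry();
const success = new client.Gauge({
  name: "synthetic_flow_success",
  help: "1 if the synthetic flow passed, 0 otherwise",
  registers: [registry],
});
const durationMs = new client.Gauge({
  name: "synthetic_flow_duration_ms",
  help: "End-to-end duration of the synthetic flow",
  registers: [registry],
});

async function run(): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const started = Date.now();
  try {
    // Keep the journey short and stable: load a page, assert one key element.
    await page.goto("https://example.com/", { timeout: 15_000 });
    await page.waitForSelector("text=Example Domain", { timeout: 5_000 });
    success.set(1);
  } catch {
    success.set(0);
  } finally {
    durationMs.set(Date.now() - started);
    await browser.close();
    // Pushgateway address is a placeholder for wherever yours runs.
    const gateway = new client.Pushgateway("http://pushgateway:9091", {}, registry);
    await gateway.pushAdd({ jobName: "synthetic-smoke" });
  }
}

run().catch((err) => {
  console.error("Synthetic run error:", err);
  process.exit(1);
});
```

A Kubernetes CronJob then runs this script on whatever interval makes sense, and the Grafana alert fires when `synthetic_flow_success` drops to 0.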

A few things to consider:

  • Avoid flaky tests: Keep the synthetic flows short, stable, and fast.
  • Use tagging and labels in Grafana to distinguish synthetic failures from backend spikes.

I’m late to the party here and I don’t have specific tool suggestions. At one company where I worked, our product was SaaS, and we used a service that ran a simple UI smoke test script at regular intervals during the day - no updating, just navigating around - so that the requests came from different parts of the world. It must have used a VPN service? Anyway, it was quite a surprise to learn that our website could be inaccessible from one part of the world while it was working fine for us. And as our customers were global, it was critical that the site was always available.

I have used New Relic for this. We had a couple of things running regularly, but the one that stands out was a UI test (using Playwright) for the log-in and log-out functionality. It proved a useful early indicator of issues more than once.
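
For flavour, a minimal sketch of that kind of log-in/log-out check as a standalone Playwright script; the URL, selectors, and environment variables are hypothetical placeholders:

```typescript
// Hypothetical log-in/log-out smoke test, runnable on a schedule by any
// synthetic runner. Uses a dedicated synthetic account so no real user
// data is created or modified.
import { chromium } from "playwright";

async function loginLogoutSmoke(): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  try {
    await page.goto("https://app.example.com/login", { timeout: 15_000 });
    await page.fill("#username", process.env.SYNTH_USER ?? "");
    await page.fill("#password", process.env.SYNTH_PASS ?? "");
    await page.click("button[type=submit]");
    await page.waitForSelector("text=Dashboard", { timeout: 10_000 });

    await page.click("text=Log out");
    await page.waitForSelector("#username", { timeout: 10_000 });
    console.log("Log-in/log-out smoke passed");
  } finally {
    await browser.close();
  }
}

loginLogoutSmoke().catch((err) => {
  console.error("Log-in/log-out smoke failed:", err);
  process.exit(1);
});
```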

Where I am currently, we have a combination of a Postman collection run on a schedule and some checks in Datadog. These all monitor APIs.
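
A scheduled Postman run can also be scripted with the `newman` npm package (Postman's collection runner); a minimal sketch, with a placeholder collection file:

```typescript
// Sketch of running a Postman collection programmatically via newman.
// A scheduler (cron, CI, Kubernetes CronJob) would invoke this script.
import newman from "newman";

newman.run(
  {
    collection: "./production-smoke.postman_collection.json", // placeholder path
    reporters: ["cli"],
    timeoutRequest: 10_000, // fail fast on slow endpoints
  },
  (err, summary) => {
    if (err || summary.run.failures.length > 0) {
      console.error("API smoke collection failed");
      process.exit(1);
    }
    console.log("API smoke collection passed");
  }
);
```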