How Do You Monitor?

chris.adams · 18 September 2018 20:07

Hi,

When I first started in my current test role, responsibility for managing deployments on two non-prod environments and, in time, one prod environment was passed on to me. Whilst I was primarily testing one API, I had to check that several other web apps, apps, APIs, databases etc. were all available, and coordinate deployments across several teams globally. This was incredibly time-consuming, and working out which component had gone down when there were issues was a pain.

With that, I built a healthcheck monitor. The backend was Java, with an HTML and JavaScript frontend. It served its purpose, but I hadn’t factored in any way to quickly reconfigure which components it monitored.

During the last month or so, I’ve been building a new monitor. This one has a Node backend and a React frontend.

I recently posted about it here.

It’s not a sophisticated tool, and it’s not meant to be. But if your patch is a distributed system and you want a simple dashboard to see what is running and what isn’t, you might find it useful.
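To give a feel for the idea, here is a minimal sketch in Python. It is not the actual tool (that one is Node and React), and the component names, URLs and `/health` paths are made up for illustration. The point it tries to show is keeping the list of components as plain data, so the monitor can be reconfigured without touching the checking code.

```python
import http.client
import json
import urllib.request

# Hypothetical component list. In practice this could be read from a JSON
# file so it can be edited without changing any code.
CONFIG = json.loads("""
[
  {"name": "orders-api", "url": "http://localhost:8081/health"},
  {"name": "stock-service", "url": "http://localhost:8082/health"}
]
""")


def http_probe(url, timeout=3.0):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses
        # and malformed URLs or responses.
        return False


def check_all(components, probe=http_probe):
    """Probe every configured component and report UP or DOWN for each."""
    return {c["name"]: "UP" if probe(c["url"]) else "DOWN" for c in components}
```

Calling `check_all(CONFIG)` returns a dictionary of component name to `UP` or `DOWN`, which a dashboard could poll and render. The `probe` argument is only there so the check can be swapped out, for example with a fake in tests or with a database ping for components that have no HTTP endpoint.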

Regarding logging: effective application logging, by which I mean meaningful and not overly verbose, is essential when investigating why an exception or issue occurred. Quite often, I find logs to be very noisy (as noted elsewhere in this thread). I guess there is a tendency to log absolutely everything to combat the fear of not logging something that might be useful for some reason (unknown unknowns!).

We have also recently introduced decision logging. This isn’t to do with errors or exceptions, but instead focuses on why the system made a particular decision. By that I mean a separate log that explains why the system selected a product at a particular warehouse, e.g. because of a shorter shipping time, or because it combines items from the same order into a single package.
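As a rough illustration of what a decision log entry might look like, here is a small Python sketch. It is not our real code: the selection rule (shortest shipping time wins), the field names and the `decisions` logger name are all assumptions made for the example.

```python
import json
import logging

# A dedicated logger keeps decision records apart from error/debug output.
decision_log = logging.getLogger("decisions")
decision_log.setLevel(logging.INFO)


def choose_warehouse(order_id, sku, candidates):
    """Pick the candidate with the shortest shipping time and log why.

    `candidates` is a list of dicts with the hypothetical keys
    "warehouse" and "shipping_days".
    """
    chosen = min(candidates, key=lambda c: c["shipping_days"])
    record = {
        "order_id": order_id,
        "sku": sku,
        "chosen": chosen["warehouse"],
        "reason": "shortest shipping time",
        "considered": candidates,
    }
    # One JSON object per line makes the log easy to search later.
    decision_log.info(json.dumps(record))
    return chosen["warehouse"]
```

Attaching a handler such as `logging.FileHandler` to the `decisions` logger would send these records to their own file, separate from the main application log. Recording the alternatives that were considered, and not just the winner, is what makes it possible to answer "why this warehouse?" after the fact.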

We tend not to worry about CPU or memory as a) the servers are managed by other groups and b) we wouldn’t be granted such low-level access.
