30 Days of DevOps Day 8: Production Alerts

heather_reid · 8 May 2020 11:54

On Day 8 of 30 Days of DevOps we’re asked to:

Find out how your team/company are alerted for production problems, and how they respond. Do they have a runbook or have “game days” to practice responding to prod failures?

What did you learn about your alert process?

lisacrispin · 9 May 2020 14:53

The front line for dealing with prod outages at our company is customer support, and they are the most knowledgeable people in the whole company about logging, monitoring and observability. The people on the R&D side join in to help, so it’s collaborative. They don’t really use runbooks; they rely on their own exploratory and problem-solving skills. They don’t have “game days” to practice. The top priority for R&D (with help from the ops people in support) is to improve logging and monitoring and to start building observability so they can quickly diagnose customer problems. I’m excited to see proofs of concept and different initiatives using new industry standards like OpenTelemetry and OpenTracing.

heather_reid · 13 May 2020 13:23

From Twitter we have
