The front line for dealing with prod outages at our company is customer support, and the people there are the most knowledgeable in the whole company about logging, monitoring and observability. People on the R&D side join in to help, so it’s collaborative. They don’t really use runbooks; they rely on their own exploratory and problem-solving skills. They don’t have “game days” to practice. The top priority for R&D (with help from the ops people in support) is to improve logging and monitoring and to start having observability so they can quickly diagnose customer problems. I’m excited to see proofs of concept and different initiatives using new industry standards like OpenTelemetry and OpenTracing.