How To Write Meaningful Retrospectives

The incident retrospective is a key component of incident management in SRE practice. Let’s learn the 7 main elements to produce more meaningful retrospectives.
One of the foundations of incident management in SRE practice is the incident retrospective. It documents all the learnings from an incident and serves as a checklist for follow-up actions. If we step back, there are 7 main elements to a retrospective. When done right, these elements help you better understand an incident, what it reveals about the system as a whole, and how to build lasting solutions. In this article, we’ll break down how to elevate these 7 elements to produce more meaningful retrospectives.

Incident retrospectives can be the core of your communication with customers and other stakeholders after an incident. We talk a lot about how retrospectives function best when they involve input and feedback from all relevant stakeholders. That doesn’t necessarily mean squeezing tons of folks into one meeting or sending one long PDF to a large group without careful thought. The best example of this is distinguishing between customer stakeholders and internal team stakeholders. Customers should be kept in the loop and assured that a resolution is imminent or already in place, but they probably don’t need to know (or shouldn’t know) the minutiae.

Communicating retrospectives to stakeholders requires empathizing with how they use your services. Describe the incident in the context of what matters most to them. But don’t beat around the bush, either: you don’t want to come across as hiding or downplaying the impact. Simple, factual statements such as “if you use service x to do y, you lost that ability for 12 hours” are enough to convey your understanding.

Once you’ve established the impact, start to regain trust. Reassure stakeholders about relevant things that didn’t go wrong. In the aftermath of an incident, stakeholders may worry that there are other problems that weren’t reported. Explicitly state that no data was lost, that no private information was made public, and address any other relevant concerns.

Share your action plans with stakeholders too. They may not have the context to understand the details of your solution, but you can explain the impact your plan will have. Be direct to convey your confidence. Again, simple statements work great: “the outage was caused by insufficient server bandwidth. A new process will automatically expand bandwidth in response to increased load. This will alleviate an incident like this in the future.” This is the language of scientific research, which removes personal pronouns from the prose. It’s a great way to keep statements simple, avoid finger-pointing, and remain factual and ideally data-driven. When you expand your message to stakeholders in this way, they’ll see that their pain has been understood and addressed systematically and enduringly.

In more technical retrospectives, generally written for study by internal development teams, it’s useful to include any monitoring data your system captured at the time of the incident.