Develop a Daily Reporting System for Chaos Mesh To Improve System Resilience

May 29, 2022

This post describes how to develop a daily reporting system to automatically collect logs and generate reports to document your chaos experiments.
Chaos Mesh is a cloud-native chaos engineering platform that orchestrates chaos experiments on Kubernetes environments. It allows you to test the resilience of your system by simulating problems such as network faults, file system faults, and Pod faults. After each chaos experiment, you can review the testing results by checking the logs. But this approach is neither direct nor efficient. Therefore, I decided to develop a daily reporting system that automatically analyzes logs and generates reports. This makes it easy to examine the logs and identify the issues.

In this article, I’ll explain how chaos engineering helps us improve our system resilience and why we need a daily reporting system to complement Chaos Mesh. I’ll also share some insights about how to build a daily reporting system, as well as the problems I encountered during the process and how I fixed them.

Because Chaos Mesh orchestrates faults in Kubernetes, we can conveniently simulate extreme cases in our business and test whether our system remains intact. At my company, we combine Chaos Mesh with our DevOps platform to provide a one-click CI/CD process. Every time a developer submits a piece of code, it triggers the CI/CD process. In this process, the system builds the code and performs unit tests and a SonarQube quality check. It then packages the image and releases it to Kubernetes. At the end of the day, our daily reporting system pulls the latest images of each project and runs chaos experiments on them.

The simulation doesn’t require any application code change; Chaos Mesh takes care of the hard work. It injects all kinds of physical node failures into the system, such as network latency, packet loss, and packet duplication. It also injects Kubernetes failures, such as Pod or container faults.
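To make the fault injection concrete, here is a minimal sketch of how a network latency experiment could be described for Chaos Mesh. It builds a `NetworkChaos` manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The helper name `network_delay_manifest`, the namespace `demo`, the label `app: web-show`, and the latency and duration values are all assumptions chosen for illustration, not settings from our pipeline.

```python
import json


def network_delay_manifest(name, namespace, app_label, latency="100ms", duration="60s"):
    """Build a Chaos Mesh NetworkChaos manifest that delays traffic for one Pod.

    All argument values are illustrative; adjust them to your own cluster.
    """
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "NetworkChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "action": "delay",  # inject network latency
            "mode": "one",      # pick one matching Pod at random
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": {"app": app_label},
            },
            "delay": {"latency": latency},
            "duration": duration,  # how long the fault stays active
        },
    }


if __name__ == "__main__":
    # Hypothetical experiment name, namespace, and label.
    manifest = network_delay_manifest("daily-delay", "demo", "web-show")
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl apply -f` would start the experiment, assuming Chaos Mesh is already installed in the cluster. Note that the application under test is untouched; only the manifest changes from one experiment to the next.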
These faults may reveal vulnerabilities in our application code or the system architecture. When the loopholes surface, we can fix them before they can do real damage in production.

Spotting these vulnerabilities isn’t easy, however: the logs must be carefully read and analyzed. This can be a difficult job for both the application developer and the Kubernetes specialist. The developer may not be familiar with Kubernetes; a Kubernetes specialist, on the other hand, may not understand the application logic.

This is where the Chaos Mesh daily reporting system comes in. After the daily chaos experiments, the reporting system collects logs, draws a plot, and provides a web UI for analyzing the possible loopholes in the system.

In the following sections, I’ll explain how to run Chaos Mesh on Kubernetes, how to generate daily reports, and how to build a web application for daily reporting.
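Before getting into those sections, the log-collection idea can be pictured with a small sketch. It assumes each log line has the shape `<timestamp> <LEVEL> <message>` and that the logs have already been fetched per Pod; both the format and the function name `summarize_logs` are assumptions for illustration, not the actual implementation of the reporting system.

```python
import re
from collections import Counter

# Assumed log line shape: "<timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def summarize_logs(logs_by_pod):
    """Count log lines per level for each Pod.

    logs_by_pod maps a Pod name to an iterable of raw log lines.
    Lines that do not match the assumed format are counted as "UNPARSED".
    """
    summary = {}
    for pod, lines in logs_by_pod.items():
        counts = Counter()
        for line in lines:
            match = LINE_RE.match(line.strip())
            counts[match.group("level") if match else "UNPARSED"] += 1
        summary[pod] = dict(counts)
    return summary
```

A per-Pod count like this is the kind of data a daily plot could be drawn from: a spike in `ERROR` lines on the day a fault was injected points both the developer and the Kubernetes specialist to the same place in the logs.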