In our last article, we learned how to implement a component control loop in TiDB Operator. This time, I'll move on to a new but important topic: backup and restore.

Backup and restore are two of the most important and frequently used operations when you maintain a database. To ensure data safety, database maintainers usually need a set of scripts that automatically back up the data and restore it when the data is corrupted. A well-designed backup and restore platform must meet a number of requirements, and TiDB Operator provides CustomResourceDefinitions (CRDs) for all of them. In this post, I'll walk you through the core design logic of TiDB Operator's backup and restore features, leaving out the trivial implementation details. Let's get started.

TiDB Operator performs ad-hoc backup, restore, and scheduled backup via custom resources (CRs): Backup, Restore, and BackupSchedule. Accordingly, we implement three controllers to run the corresponding control loops. When a user needs to start a backup job, they can create a YAML file like the following and submit it to Kubernetes.

When the backup controller receives an event that creates the Backup resource, it creates a Kubernetes job to perform the configured backup operation. In the case above, it backs up data from the mycluster cluster in the test1 namespace and stores the data in the GCS bucket specified in the gcs field. In the following sections, I'll explain the internal logic of the three controllers.

The backup controller manages the Backup CR. Based on the configuration in the spec field, the controller uses BR or Dumpling to perform the backup task, and it deletes the corresponding backup files when the user deletes the Backup CR. Like other controllers, the core of the backup controller is a control loop that listens to Backup CR events (create, update, and delete) and runs the required operations.
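A minimal Backup CR might look like the following sketch. The field values (names, namespace, bucket, and secret) are illustrative assumptions, not taken from the article; consult the TiDB Operator documentation for the full spec:

```yaml
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: demo1-backup-gcs   # hypothetical name
  namespace: test1
spec:
  br:
    cluster: mycluster     # the TiDB cluster to back up
    clusterNamespace: test1
  gcs:
    projectId: my-project  # hypothetical GCP project
    secretName: gcs-secret # Secret holding the GCS service account key
    bucket: my-bucket
    prefix: my-folder
```

Because the br field is set here, the controller will use BR for the backup; omitting it would make the controller fall back to Dumpling.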
In this section, I'll skip the generic control loop logic and focus on the core backup logic, which is implemented in the syncBackupJob function in the pkg/backup/backup/backup_manager.go file. The actual code handles many corner cases; to make it easier to follow, the snippets below omit unimportant details, so some function signatures may differ from the real source. The core logic is as follows:

In the code above, backup is the Go struct converted from the Backup YAML file that the user created. We use a ValidateBackup function to check the validity of the fields in backup.

Because the backup task is executed as a Kubernetes-native Job, and because the controller must ensure idempotency (duplicated executions don't affect the end result), it is possible that a job already exists. Therefore, we first check whether there is an existing backup job in the same namespace:

Next, the controller decides whether to use BR or Dumpling to perform the backup task and calls the corresponding function to create the Job spec. In this step, if you have configured the br field, the controller chooses BR; otherwise, it goes with Dumpling.
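The flow described above can be sketched in simplified Go. The types and function names below are illustrative stand-ins, not the real TiDB Operator definitions; the point is to show the three steps: validate the CR, skip creation if the job already exists (idempotency), and pick BR or Dumpling based on the br field:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, heavily simplified versions of the Backup CR types; the real
// definitions in TiDB Operator carry many more fields.
type BRConfig struct {
	Cluster string
}

type BackupSpec struct {
	BR *BRConfig // if set, back up with BR; otherwise fall back to Dumpling
}

type Backup struct {
	Namespace string
	Name      string
	Spec      BackupSpec
}

// validateBackup is a stand-in for ValidateBackup: it checks that required
// fields are present before any job is created.
func validateBackup(b *Backup) error {
	if b.Spec.BR != nil && b.Spec.BR.Cluster == "" {
		return errors.New("spec.br.cluster must not be empty")
	}
	return nil
}

// chooseTool mirrors the controller's decision: BR when the br field is
// configured, Dumpling otherwise.
func chooseTool(b *Backup) string {
	if b.Spec.BR != nil {
		return "BR"
	}
	return "Dumpling"
}

// syncBackupJob sketches the idempotent sync. existingJobs stands in for a
// lookup against the Kubernetes API in the Backup's namespace: if the job is
// already there (e.g., a duplicated event), the sync is a no-op.
func syncBackupJob(b *Backup, existingJobs map[string]bool) (string, error) {
	if err := validateBackup(b); err != nil {
		return "", err
	}
	jobName := fmt.Sprintf("backup-%s", b.Name)
	if existingJobs[jobName] {
		return jobName, nil // job already created; nothing to do
	}
	existingJobs[jobName] = true
	fmt.Printf("created job %s using %s\n", jobName, chooseTool(b))
	return jobName, nil
}

func main() {
	jobs := map[string]bool{}
	b := &Backup{
		Namespace: "test1",
		Name:      "demo1",
		Spec:      BackupSpec{BR: &BRConfig{Cluster: "mycluster"}},
	}
	syncBackupJob(b, jobs) // creates the job
	syncBackupJob(b, jobs) // duplicated sync: no-op
}
```

Running the sketch twice with the same Backup creates the job only once, which is the idempotency property the controller relies on when it receives duplicated events.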