
How to Mutate Data in a System Designed for Immutable Data


A look at how we came to understand which database system was right for us and how we adapted our approach when we encountered some unexpected challenges.
In a post published on our blog earlier this year, we described some of the decision-making that went into the design and architecture of Snuba, the primary storage and query service for Sentry’s event data. This project started out of necessity; months earlier, we discovered that the time and effort required to continuously scale our existing PostgreSQL-based solution for indexing event data was becoming an unsustainable burden.
Sentry’s growth led to increased write and read load on our databases, and, even after countless rounds of query and index optimizations, we felt that our databases were always a hair’s breadth from the next performance tipping point or query planner meltdown. Increased write load also led to increased storage requirements (if you’re doing more writes, you’re going to need more places to put them), and we were running what felt like an inordinate number of servers with a lot of disks for the data they were responsible for storing. We knew that something had to change.
Here’s a look at how we attempted to understand which database system was right for us and how we adapted our approach when we encountered some unexpected challenges.
We knew that PostgreSQL wasn’t the right tool for this job, and many of the features that it provides — such as ACID transactions, MVCC semantics, and even row-based mutations — were ultimately unnecessary for the kinds of data we were storing in it and the types of queries we were running. In fact, not only were they unnecessary: they caused performance issues at best and, at worst, had played a major role in our most severe outage to date.
We can’t say that PostgreSQL was the problem — it served us well for years, and we still happily use it in many parts of our application and infrastructure today with no intention of removing it — it simply was no longer the right solution for the problems we were facing.
We realized that we needed a system oriented around fast aggregations over a large number of rows, one optimized for bulk insertion of large amounts of data rather than piecemeal insertion and mutation of individual rows.
Ultimately, after evaluating several options, we settled on ClickHouse, which is the database that currently underpins Snuba, our service for storing and searching event data. ClickHouse and PostgreSQL have very different architectures (some of which we’ll dive into more detail about a bit later), and these differences cause ClickHouse to perform extremely well for many of our needs: queries are fast, performance is predictable, and we’re able to filter and aggregate on more event attributes than we were able to before. Even more amazingly, we can do it with fewer machines and smaller disks due to the shockingly good compression that can be achieved with columnar data layouts.
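To make that concrete, here’s a minimal sketch of what a columnar, aggregation-friendly event table can look like in ClickHouse. The schema, column names, and keys below are illustrative assumptions, not Sentry’s actual design: the point is that a MergeTree table keeps rows sorted by its ORDER BY key and stores each column in its own compressed file, which is what makes large scans, cheap aggregations, and that kind of compression possible.

```sql
-- Hypothetical, simplified event table; not Sentry's actual schema.
-- MergeTree stores each column separately (and compresses it well)
-- and keeps rows sorted by the ORDER BY key, enabling fast scans.
CREATE TABLE events
(
    event_id        UUID,
    project_id      UInt64,
    issue_id        UInt64,
    timestamp       DateTime,
    browser_name    LowCardinality(String),
    browser_version String
)
ENGINE = MergeTree()
PARTITION BY toMonday(timestamp)
ORDER BY (project_id, timestamp);

-- An aggregation of the kind described above: scan many rows,
-- group, and count, reading only the columns the query needs.
SELECT browser_name, count() AS event_count
FROM events
WHERE project_id = 1
  AND timestamp >= now() - INTERVAL 90 DAY
GROUP BY browser_name
ORDER BY event_count DESC;
```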
ClickHouse can make many of these performance improvements because data that has been written is largely considered to be immutable: not subject to change, or even deletion. Immutability plays a large role in database design, especially with large volumes of data — if you’re able to posit that data is immutable, DML statements like UPDATE and DELETE are no longer necessary.
If you’re just inserting data that never changes, the necessity for transactions is reduced (or removed completely), and a whole class of problems in database architecture goes away. This strategy works well for us — in general, we consider events that are sent to Sentry immutable once they have been processed.
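Under this model, the only write path is INSERT, ideally in large batches rather than one row at a time. A sketch, continuing with the hypothetical table above:

```sql
-- With immutable data, the only DML you need is INSERT. ClickHouse
-- strongly favors large batched writes over row-at-a-time inserts,
-- so processed events would be buffered and flushed in bulk:
INSERT INTO events (event_id, project_id, issue_id, timestamp,
                    browser_name, browser_version)
VALUES
    (generateUUIDv4(), 1, 42, now(), 'Firefox', '68.0'),
    (generateUUIDv4(), 1, 42, now(), 'Chrome',  '76.0');
```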
This decision is mostly a practical one: for example, the browser version that a user was using when they encountered an error is effectively “frozen in time” when that event occurs. If that user later upgrades their browser version, the event that we recorded earlier doesn’t need to be rewritten to account for whatever version they’re using now.
But wait — while we do treat the event data that is sent to Sentry as immutable, the issues those events belong to can be deleted in Sentry, and those deletions should cause the events associated with those issues to be deleted as well. Similarly, while you can’t update the attributes of an event, you can modify its association with an issue through merging and unmerging. While these operations are infrequent, they are possible, and we needed to find a way to perform them in a database that wasn’t designed to support them.
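For reference, ClickHouse does expose a native mechanism for exactly these rare cases: asynchronous “mutations” issued through ALTER TABLE, which rewrite the affected data parts in the background rather than updating individual rows. They are far heavier than a row-level UPDATE or DELETE in PostgreSQL, which is why they only make sense for infrequent operations. The sketch below, again using the hypothetical table from earlier, shows the mechanism; it isn’t necessarily the approach described in the rest of this guide.

```sql
-- ClickHouse "mutations": asynchronous statements that rewrite
-- affected data parts in the background. Heavyweight, but usable
-- for rare operations like issue deletion or merging.

-- Deleting all events that belonged to a deleted issue:
ALTER TABLE events DELETE WHERE issue_id = 42;

-- Re-pointing events at another issue after a merge (issue_id is
-- not part of the table's ORDER BY key, so it can be updated):
ALTER TABLE events UPDATE issue_id = 99 WHERE issue_id = 42;
```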
Unfortunately, all of the massive improvements for the common cases were also drawbacks for several of the uncommon cases that exist in Sentry — but uncommon doesn’t mean unsupported. In the remainder of this field guide, we’ll explore how mutability affects database design and performance, and how we deal with mutating data in a database architecture that was primarily designed for storing immutable data: in this case, specifically ClickHouse.
