Домой United States USA — software The Ultimate Guide on Client-Generated IDs in JPA Entities

The Ultimate Guide on Client-Generated IDs in JPA Entities

По

February 24, 2022

175

ID generation on the client-side is not as simple as it seems. In JPA and Hibernate, we can use UUIDs, custom strategies, and dedicated ID generation servers.
Join the DZone community and get the full member experience. ID generation in the client instead of the database is the only option for distributed apps. But generating unique IDs in such apps is hard. And it’s essential to generate them properly because JPA will use IDs to define entity states. The safest option is to use UUIDs and Hibernate’s generators, but there are more options starting from custom generators to dedicated ID generation servers. In the previous article, we discussed server-generated IDs for JPA entities. All the ID generation strategies described in the article are based on one fundamental principle: there is a single point that is responsible for generating IDs: a database. This principle might become a challenge: we depend on a particular storage system, so switching to another (e.g., from PostgreSQL to Cassandra) might be a problem. Also, this approach does not work for distributed applications where we can have several DB instances deployed on several data centers in several time zones. Those are the cases where client-based ID generation (or, rather, non-DB-based) comes into a stage. This strategy gives us more flexibility in terms of ID generation algorithm and format and allows batch operations by its nature: ID values are known before they are stored in a DB. In this article, we will discuss two fundamental topics for client-generated ID strategy: how to generate a unique ID value and when to assign it. When it comes to ID generation in distributed applications, we need to decide which algorithm to use to guarantee uniqueness and sound generation performance. Let’s have a look at some options here. This is a straightforward and naïve implementation for decentralized ID generation. Let every application instance generate a unique ID using a random number generator, and that’s it! To make it better, we might think of using a composite structure — let’s append timestamp (in milliseconds) to the beginning of the random number to make our IDs sortable. For example, to create a 64-bit ID, we can use the first 32 bits of the timestamp and the last 32 bits of the random number. The problem with this approach is that it does not guarantee uniqueness. We can only hope that our generated IDs won’t clash. For big, distributed data-intensive systems, this approach is not acceptable. We cannot rely on probability laws unless we’re a casino. We should not reinvent the wheel for globally unique ID generation algorithms. It will take a lot of time, effort, and a couple of PhDs. Some existing solutions solve this problem and can be utilized in our applications. UUID generation – is a well-known and widely used approach for ID generation in distributed applications. This data type is supported by standard libraries in almost all programming languages. We can generate ID value right in the application code, and this value will be globally unique (by the design of the generation algorithm). UUIDs has some advantages over “traditional” numeric IDs: UUIDs are not sortable however sorting data by surrogate ID value is usually not required; we should use a business key for that. But if we absolutely need sorting, we can use the UUID subtype – ULID, which stands for “universally unique lexicographically sortable identifier”. The performance of the random UUID generator in Java is also sufficient for most cases. On my computer (Apple M1 max), it took about 500ns per operation, which gives us about two million UUIDs per second. UUID is almost the perfect choice for the ID value, but a few things might prevent you from using it. First, UUID values consume more storage space compared to 64-bit long IDs. Twice the space, if we need to be exact. Extra 64 bits might not look like a significant addition, but it might be a problem when talking about billions of records. Also, we should remember about foreign keys where we need to duplicate ID values. Therefore, we might double the ID storage consumption. The second issue is performance. Two factors are affecting this: It means that when we insert a new record into a table, the RDBMS writes its ID value into a random b-tree node of an index or a table structure. Since most of the index or table data is stored on the disk, the probability of random disk reads increases. It means further delays in the data storage process. You can find more on this topic in this article. And finally, some databases just do not support UUID as the data type, so we’ll have to store ID value as varchar or byte array, which may not be great for queries performance and will require some extra encoding on the ORM side. UUID is a good choice for surrogate IDs if we don’t want or cannot use a database for ID generation.