Milvus 2.0: Redefining Vector Database

August 1, 2021

This article recaps the challenges facing an open-source vector database that Milvus 2.0, a refactored, cloud-native version, is expected to address.
As a popular open-source vector database, Milvus 1.0 solved some fundamental problems in vector management, such as CRUD operations and data persistence. However, as new scenarios and requirements emerged, it became clear that many more issues remained unresolved. This article recaps the observations and challenges that Milvus 2.0 is expected to address, and explains why Milvus 2.0 is considered a better solution to them.

Challenges Facing Milvus 1.x

Data Silo

Milvus 1.0 can only handle vector embeddings generated from unstructured data and offers little support for scalar queries. The disaggregation of data storage in its design results in duplicate data and makes application development more complex. Hybrid search across vector and scalar data is unsatisfactory because there is no unified optimizer.

The Dilemma Between Timeliness and Efficiency

Milvus 1.0 is a near real-time system that relies on regular flushes or forced flushes to make data visible. This approach adds complexity and uncertainty to stream data processing at several levels. Besides, although batch insertion is said to improve processing efficiency, it still consumes plenty of resources, which calls for a bulk-load approach.

Lacking Scalability and Elasticity

Milvus 1.0 relies on Mishards, a sharding middleware solution, for scalability, and on network-attached storage (NAS) for data persistence. This classical architecture built on shared storage does not contribute much to overall scalability for the following reasons:

Lacking High Availability

One observation we’ve made is that most Milvus users favor availability over consistency, yet Milvus 1.x lacks capabilities such as in-memory replicas and disaster recovery, and falls short on high availability.
Therefore, we are exploring the possibility of sacrificing a certain degree of accuracy to achieve higher availability.

Prohibitively High Costs

Milvus 1.0 relies on NAS for data persistence, which usually costs about ten times as much as local or object storage. Since vector search relies heavily on computing resources and memory, the high costs it incurs could well become a hurdle to further exploration of large-scale datasets or complex business scenarios.

Unintuitive User Experience

Whether to keep patching the existing system or to start from scratch is a big question. Charles Xie, the father of Milvus, believes that, just as many traditional automakers could never turn into Tesla through incremental change, Milvus has to become a game-changer in the field of unstructured data processing and analytics in order to thrive. It is this conviction that spurred us to kick-start Milvus 2.0, a refactored cloud-native vector database.

The Making of Milvus 2.0

Design Principles

As the next-generation cloud-native vector database, Milvus 2.0 is built around the following three principles:

Cloud-Native First

We believe that only architectures that separate storage from computing can scale on demand and take full advantage of the cloud’s elasticity. The microservice design of Milvus 2.0 features read and write separation, incremental and historical data separation, and the separation of CPU-intensive, memory-intensive, and IO-intensive tasks.
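
To make the idea of read and write separation and incremental and historical data separation more concrete, here is a minimal toy sketch in Python. It is not Milvus code and does not reflect how Milvus 2.0 is implemented. The names IncrementalStore, HistoricalStore, seal, load, and search are hypothetical and were chosen only for this illustration, and a brute-force squared L2 scan stands in for a real vector index.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalStore:
    """Write path (hypothetical): receives newly inserted vectors."""
    rows: dict = field(default_factory=dict)

    def insert(self, row_id: int, vector: list) -> None:
        self.rows[row_id] = vector

    def seal(self) -> dict:
        # Hand the accumulated rows over and start a fresh, empty buffer.
        sealed, self.rows = self.rows, {}
        return sealed


@dataclass
class HistoricalStore:
    """Read-only path (hypothetical): holds rows that have been sealed."""
    rows: dict = field(default_factory=dict)

    def load(self, sealed: dict) -> None:
        self.rows.update(sealed)


def squared_l2(a: list, b: list) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(query: list, stores: list, top_k: int) -> list:
    """Brute-force search across every store, merged into one top-k list."""
    candidates = [
        (squared_l2(query, vector), row_id)
        for store in stores
        for row_id, vector in store.rows.items()
    ]
    return [row_id for _, row_id in sorted(candidates)[:top_k]]


if __name__ == "__main__":
    incremental, historical = IncrementalStore(), HistoricalStore()
    incremental.insert(1, [0.0, 0.0])
    incremental.insert(2, [1.0, 1.0])
    historical.load(incremental.seal())  # rows 1 and 2 become historical
    incremental.insert(3, [0.1, 0.0])    # row 3 is still incremental
    print(search([0.0, 0.0], [historical, incremental], top_k=2))  # [1, 3]
```

In this sketch, writes only ever touch the incremental store, while a query fans out to both stores and merges the candidates, so the two paths could in principle be scaled independently.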