Why Disintegration of Apache Zookeeper From Kafka Is in the Pipeline

In this article, see why the disintegration of Apache Zookeeper from Kafka is in the pipeline.
The main objective of this article is to explain why the bridge between Apache Zookeeper and Kafka is being cut, which is an upcoming project from the Apache Software Foundation. The proposed architecture aims to make Kafka completely independent, so that it delivers on its own all the functionality it currently offers with Zookeeper. This article has been segmented into 4 parts.

Zookeeper handles management and coordination in a distributed environment, where large sets of hosts have to be managed. This is tricky in a large cluster where many nodes are connected and the cluster needs to scale horizontally on demand. For example, in a Hadoop cluster, new DataNodes are plugged in as data volume grows, data blocks need more replication, and so on. Apart from Hadoop, Zookeeper is used in other Apache projects such as HBase, Solr, and CXF DOSGi.

Zookeeper plays a key role as a distributed coordination service and has been adopted for use cases like storing shared configuration and electing the master node. Through synchronization, serialization, and coordination, Zookeeper keeps a distributed system functioning together as a single unit. It takes care of race conditions, deadlocks, and partial failures, all of which are very common in distributed applications. Its serialization property eliminates race conditions in the cluster, and its synchronization property handles deadlocks.

Kafka is an enterprise messaging system capable of building data pipelines for real-time streaming. Apache Kafka originated at LinkedIn and later became an open-source Apache project in 2011. Kafka also stores streams of records in a fault-tolerant way.
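To make the master-election use case mentioned above for Zookeeper more concrete, here is a minimal in-memory sketch in Python. It is not the Zookeeper client API and it is not taken from the original article: `ToyCoordinator`, `join`, `leave`, and `leader` are hypothetical names invented for illustration. The sketch assumes the commonly described pattern in which each host registers an ordered, session-bound node and the host holding the lowest live sequence number acts as master.

```python
# Toy, in-memory model of master election via ordered, session-bound nodes.
# NOT the Zookeeper API: every name below is hypothetical and for illustration only.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyCoordinator:
    """Stands in for a coordination service that hands out ordered nodes."""
    next_seq: int = 0
    nodes: Dict[int, str] = field(default_factory=dict)  # sequence number -> host name

    def join(self, host: str) -> int:
        """Register a host and return the sequence number assigned to it."""
        seq = self.next_seq
        self.next_seq += 1
        self.nodes[seq] = host
        return seq

    def leave(self, seq: int) -> None:
        """Drop a host's node, as would happen when its session ends."""
        self.nodes.pop(seq, None)

    def leader(self) -> Optional[str]:
        """Return the host with the lowest live sequence number, if any."""
        if not self.nodes:
            return None
        return self.nodes[min(self.nodes)]


coordinator = ToyCoordinator()
a = coordinator.join("host-a")
b = coordinator.join("host-b")
c = coordinator.join("host-c")

first_leader = coordinator.leader()   # host-a registered first
coordinator.leave(a)                  # simulate host-a failing
second_leader = coordinator.leader()  # the next-lowest node takes over
```

Because every host derives the master from the same ordered view, no two hosts can both conclude they are the master, which is the kind of race a coordination service is meant to rule out. A real deployment would also need failure detection and change notifications, which this sketch leaves out.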
