Third-Generation Data Platforms: The Lakehouse

The lakehouse represents the next evolution of data platforms, aiming to combine the best of both data warehouses and data lakes.
Data Platform Evolution
Initially, data warehouses served as first-generation platforms primarily focused on processing structured data. However, as the demand for analyzing large volumes of semi-structured and unstructured data grew, second-generation platforms shifted their attention toward data lakes. This resulted in two-tier architectures with problematic side effects: the complexity of maintaining and synchronizing the two tiers, data duplication, and an increased risk of failure due to data movement between warehouses and data lakes.
Data lakehouses are third-generation platforms created to address the above limitations. Lakehouses are open, cost-efficient architectures combining key benefits of data lakes and data warehouses. They achieve this by implementing a metadata layer on top of data lakes.
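As a rough illustration of what a metadata layer on top of plain data files can look like, here is a minimal Python sketch. It is not how any particular lakehouse product works: `ToyTable`, `commit`, and `snapshot` are hypothetical names invented for this example, and the "data files" are just in-memory tuples. The sketch only assumes the general idea of an append-only commit log over immutable files, which is enough to mimic schema enforcement, all-or-nothing commits, and reading an earlier version of the table.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a lakehouse table (illustration only)."""
    schema: dict                                  # column name -> expected Python type
    files: dict = field(default_factory=dict)     # file name -> immutable tuple of rows
    log: list = field(default_factory=list)       # log[i] = files visible at version i

    def commit(self, rows):
        # Schema enforcement: validate the whole batch before anything is written,
        # so a bad row rejects the entire commit.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"unexpected columns: {sorted(row)}")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise ValueError(f"{col!r} must be {typ.__name__}")
        # Write a new immutable "file", then publish it with a single log append.
        version = len(self.log)
        name = f"part-{version:05d}"
        self.files[name] = tuple(dict(r) for r in rows)
        previous = self.log[-1] if self.log else ()
        self.log.append(previous + (name,))
        return version

    def snapshot(self, version=None):
        # Time travel: read the file list recorded for a given version
        # (the latest version when none is given).
        if not self.log:
            return []
        visible = self.log[-1] if version is None else self.log[version]
        return [row for name in visible for row in self.files[name]]


table = ToyTable(schema={"id": int, "city": str})
v0 = table.commit([{"id": 1, "city": "Oslo"}])
v1 = table.commit([{"id": 2, "city": "Lima"}])

try:
    # Wrong type for "id": the batch is rejected and no new version is created.
    table.commit([{"id": "3", "city": "Rome"}])
except ValueError as err:
    rejected = str(err)

latest = table.snapshot()        # rows from both commits
as_of_v0 = table.snapshot(v0)    # only the rows visible at the first version
```

The design choice worth noting is that data files are never modified in place: each commit adds a file and a log entry, so older versions stay readable and a failed commit leaves the table untouched.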
The metadata layer is the defining element of the lakehouse. It brings structure and management capabilities similar to those of traditional warehouses into the data lake: transactional support (ACID), time travel, schema enforcement and evolution, data governance, access controls, and auditing. The lakehouse also enables real-time analytics, business intelligence (BI), data science, and machine learning (ML) by providing APIs for data processing activities and allowing the use of a vast array of languages and libraries.
Lakehouse Platforms
Although it is theoretically possible to design one’s own lakehouse architecture, the general recommendation is to use an existing solution to save time, money, and headaches.
The various technologies competing in the lakehouse market can be classified into two main categories: 
Cloud-agnostic platforms, e.
