
Data Lakes, Warehouses and Lakehouses. Which is Best?


Everything you need to know about the foundation of your data infrastructure: data warehouses, data lakes, and data lakehouses.
Twenty years ago, your data warehouse probably wouldn’t have been voted hottest technology on the block. These bastions of the office basement were long associated with siloed data workflows, on-premises computing clusters, and a limited set of business-related tasks (e.g., processing payroll and storing internal documents).

Now, with the rise of data-driven analytics, cross-functional data teams, and most importantly, the cloud, the phrase “cloud data warehouse” is nearly synonymous with agility and innovation. In many ways, the cloud makes data easier to manage, more accessible to a wider variety of users, and far faster to process. Most companies can’t use data in a meaningful way without a cloud data warehousing solution (or two or three… or more).

When it comes to selecting the right cloud data warehouse for your data platform, however, the answer isn’t so straightforward. With the release of Amazon Redshift in 2013, followed by Snowflake, Google BigQuery, and others in the subsequent years, the market has become increasingly competitive. Add data lakes to the mix, and the decision becomes that much harder. Whether you’re just getting started or are re-assessing your existing solution, here’s everything you need to know to choose the right data warehouse (or lake) for your data stack:
Data warehouses and lakes are the foundation of your data infrastructure, providing the storage, compute power, and contextual information about the data in your ecosystem. Like the engine of a car, these technologies are the workhorses of the data platform. Data warehouses and lakes incorporate the following four main components:
Metadata: Warehouses and lakes typically offer a way to manage and track all the databases, schemas, and tables that you create. These objects are often accompanied by additional information such as schema, data types, user-generated descriptions, or even freshness and other statistics about the data.

Storage: Storage refers to the way in which the warehouse/lake physically stores all the records that exist across all tables. By leveraging various kinds of storage technologies and data formats, warehouses/lakes can serve a wide range of use cases with desired cost/performance characteristics.

Compute: Compute refers to the way in which the warehouse/lake performs calculations on the data records it stores. This is the engine that allows users to “query” data, ingest data, transform it, and more broadly, extract value from it.
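To make the split between metadata, storage, and compute concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database purely as a stand-in for a warehouse; that choice is an assumption for illustration, not something the article prescribes. The `orders` table, its columns, and the `demo_components` function are hypothetical names. A cloud warehouse or lake separates these layers at a very different scale, but the roles are similar.

```python
import sqlite3


def demo_components():
    # In-memory SQLite database standing in for a warehouse (illustrative assumption).
    conn = sqlite3.connect(":memory:")

    # Storage: records are physically kept in a table (hypothetical "orders" table).
    conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "east", 20.0), (2, "west", 35.5), (3, "east", 14.5)],
    )

    # Metadata: the catalog records which tables exist and each column's declared type.
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(orders)")]

    # Compute: the query engine performs calculations over the stored records.
    totals = dict(
        conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
    )

    conn.close()
    return tables, columns, totals


if __name__ == "__main__":
    tables, columns, totals = demo_components()
    print("tables:", tables)
    print("columns:", columns)
    print("totals by region:", totals)
```

In this sketch all three roles live in one process. Part of what distinguishes warehouse and lake offerings is how far they pull these layers apart, for example by storing records in open file formats while a separate engine handles the queries.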
