AWS re:Invent: Amazon S3 Expands Capabilities with Managed Apache Iceberg Tables for Faster Data Lake Analytics and Automatic Metadata Generation

At AWS re:Invent, Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company, announced new Amazon Simple Storage Service (Amazon S3) features that make S3 the first cloud object store with fully managed support for Apache Iceberg, enabling faster analytics and offering the easiest way to store and manage tabular data at any scale.
These features also include the ability to automatically generate queryable metadata, simplifying data discovery and understanding to help customers unlock the value of their data in S3.
Amazon S3 Tables makes S3 the first cloud object store with built-in Apache Iceberg table support. It introduces a new bucket type that optimizes storage and querying of tabular data as Iceberg tables, delivering up to 3x faster query performance, up to 10x higher transactions per second (TPS), and automated table maintenance for analytics workloads.
Amazon S3 Metadata streamlines data discovery in near real time by automatically capturing queryable object metadata, as well as custom metadata supplied through object tags, and storing it in S3 Tables to accelerate analytics across data lakes.
“As the leading object store in the world with more than 400 trillion objects, S3 is used by millions of customers, and we continue to innovate to remove the complexity of working with data at an unprecedented scale,” said Andy Warfield, VP, storage, and distinguished engineer, AWS. “We have seen the rapid rise of tabular data and, increasingly, customers want to query across tables, improve query performance, and understand and organize troves of data so they can easily find exactly what they need. S3 Tables and S3 Metadata remove the overhead of organizing and operating table and metadata stores on top of objects, so customers can shift their focus back to building with their data.”
S3 Tables and S3 Metadata are compatible with Apache Iceberg tables, so customers can query their data using AWS analytics services and open source tools, including Amazon Athena, Amazon QuickSight, and Apache Spark.
Amazon S3 Tables – easiest and fastest way to perform analytics on Apache Iceberg tables in S3
Many customers today organize the data they use for analytics as tabular data, most often stored in Apache Parquet, a file format optimized for data queries. Parquet has become one of the fastest-growing data types in S3, and customers increasingly want to query these growing tabular data sets, often turning to open table formats (OTFs), open source standards for storing data in tables, because they help organize, update, and track changes to large amounts of data. Iceberg has become the most popular OTF for managing Parquet files, with customers using Iceberg to query across billions of files containing petabytes or even exabytes of data. However, Iceberg can be challenging for customers to manage as they scale, often requiring dedicated teams to build and maintain systems that handle table maintenance and data compaction and that manage access control.
