Amazon S3 now supports compaction of Apache Avro and ORC formats for Apache Iceberg tables

Amazon S3 has expanded compaction support to include Apache Avro and ORC formats for Apache Iceberg tables, complementing existing Parquet format capabilities. This enhancement applies both to S3 Tables and to general purpose S3 buckets that use AWS Glue Data Catalog optimization.

While Parquet is the default format for Iceberg tables, you can also write data in Avro or ORC formats for specific workloads. For example, you can use Avro to improve write performance for data ingestion and streaming use cases like daily purchase transactions, streaming sensor data, or collecting ad impressions. S3 Tables automatically compact small files into larger ones to minimize scanned data, improve query performance, and reduce costs. By default, compaction converts Avro and ORC files to Parquet for optimal read performance, but you can specify your preferred target format in your table properties.
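As a rough illustration of specifying a format in table properties, the sketch below builds a Spark SQL statement that sets the standard Iceberg table property `write.format.default`. It assumes that this is the property compaction consults for the target format, which the announcement does not state, so confirm the property name in the S3 Tables maintenance documentation before relying on it. The helper `build_set_format_sql` and the table identifier are hypothetical names used only for this example.

```python
# Minimal sketch: build an Iceberg DDL statement that sets the table's file format.
# Assumption: compaction reads the standard Iceberg property 'write.format.default';
# the announcement only says "table properties" without naming one.

SUPPORTED_FORMATS = {"parquet", "avro", "orc"}


def build_set_format_sql(table_identifier: str, file_format: str) -> str:
    """Return a Spark SQL ALTER TABLE statement for the given format.

    table_identifier is a hypothetical catalog.namespace.table name.
    """
    fmt = file_format.strip().lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {file_format!r}; expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    return (
        f"ALTER TABLE {table_identifier} "
        f"SET TBLPROPERTIES ('write.format.default'='{fmt}')"
    )


if __name__ == "__main__":
    # Hypothetical table name, for illustration only.
    print(build_set_format_sql("my_catalog.sales.daily_purchases", "avro"))
```

The resulting string could be passed to a Spark session configured with an Iceberg catalog, for example through `spark.sql(...)`. Validating the format name up front keeps a typo from silently writing an unexpected property value.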

Compaction support for Apache Avro and ORC formats is now available in all AWS Regions where S3 Tables or AWS Glue Data Catalog optimization is available. To learn more about S3 Tables compaction, see the S3 Tables maintenance documentation. For general purpose bucket optimization, see the AWS Glue Data Catalog optimization documentation.

Source: Amazon AWS