Apache Iceberg
Open-source table format for massive data lakes, providing ACID transactions and efficient metadata management for analytical storage.
Updated on January 29, 2026
Apache Iceberg is a high-performance open-source table format designed to handle petabyte-scale datasets in data lake architectures. Unlike traditional formats like Parquet or ORC that only manage file storage, Iceberg provides a table management layer with transactional guarantees, scalable schema tracking, and time travel capabilities. Created at Netflix and now an Apache top-level project, it solves critical consistency and performance issues in modern data architectures.
Fundamentals of Apache Iceberg
- Three-layer architecture: data files (Parquet/ORC/Avro), manifest files (file lists), and table metadata (snapshots, schemas, partitioning)
- Full ACID transactions with serializable isolation, enabling consistent reads even during concurrent writes
- Schema evolution without rewrites: add, drop, and rename columns without affecting existing data
- Hidden partitioning: partition transformations applied automatically without exposing structure to queries
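The three metadata layers above are not opaque: Iceberg exposes each of them as a queryable system table alongside the data table itself. A minimal sketch, assuming a table named `my_catalog.db.events` (illustrative name):

```sql
-- Snapshot history (one row per commit)
SELECT snapshot_id, committed_at, operation FROM my_catalog.db.events.snapshots;

-- Manifest files referenced by the current snapshot
SELECT path, length FROM my_catalog.db.events.manifests;

-- Individual data files with their statistics
SELECT file_path, record_count FROM my_catalog.db.events.files;
```

These metadata tables are what query planners use for partition and file pruning; inspecting them directly is a common way to debug small-file or metadata-bloat issues.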
Strategic Benefits
- Optimal performance: partition and file pruning based on detailed statistics, drastically reducing scanned data volume
- Built-in time travel: access any historical snapshot of the table for audits, reproductions, or rollbacks
- Multi-engine compatibility: works natively with Spark, Flink, Trino, Hive, and Presto, with no dependence on a proprietary ecosystem
- Metadata scalability: optimized structure to handle millions of partitions without planning performance degradation
- Advanced atomic operations: MERGE, UPDATE, DELETE performed transactionally on distributed tables
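The time travel and rollback capabilities listed above can be exercised directly in Spark SQL. A hedged sketch, with an illustrative snapshot ID and the same hypothetical `my_catalog.db.events` table:

```sql
-- Read an older state of the table by snapshot ID or timestamp
SELECT * FROM my_catalog.db.events VERSION AS OF 8744736658442914487;
SELECT * FROM my_catalog.db.events TIMESTAMP AS OF '2024-01-15 10:00:00';

-- Restore the table to a known-good snapshot after a bad write
CALL my_catalog.system.rollback_to_snapshot('db.events', 8744736658442914487);
```

Because a rollback only swaps which snapshot the metadata points at, it is a fast metadata-only operation: no data files are rewritten.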
Practical Example: Iceberg Table Architecture
from pyspark.sql import SparkSession
# Spark configuration for Iceberg
spark = SparkSession.builder \
    .appName("IcebergDemo") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.4.0") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.my_catalog.type", "hadoop") \
    .config("spark.sql.catalog.my_catalog.warehouse", "s3://my-bucket/warehouse") \
    .getOrCreate()
# Create Iceberg table with hidden partitioning
spark.sql("""
    CREATE TABLE my_catalog.db.events (
        event_id STRING,
        user_id LONG,
        event_type STRING,
        event_timestamp TIMESTAMP,
        metadata MAP<STRING, STRING>
    )
    USING iceberg
    PARTITIONED BY (days(event_timestamp), event_type)
    TBLPROPERTIES (
        'write.format.default' = 'parquet',
        'write.metadata.compression-codec' = 'gzip'
    )
""")
# Insert data
df = spark.read.json("s3://source/events/*.json")
df.writeTo("my_catalog.db.events").append()
# Time travel: read the table as of a past instant
# (the as-of-timestamp option expects epoch milliseconds, not a date string)
spark.read \
    .option("as-of-timestamp", "1705312800000") \
    .table("my_catalog.db.events") \
    .show()  # 1705312800000 = 2024-01-15 10:00:00 UTC
# Atomic MERGE operation ("updates" must be a registered temp view or table)
spark.sql("""
    MERGE INTO my_catalog.db.events t
    USING updates s
    ON t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
# Schema evolution without downtime
spark.sql("""
    ALTER TABLE my_catalog.db.events
    ADD COLUMN device_type STRING AFTER event_type
""")
Implementation Steps
- Catalog selection: choose between Hive Metastore, AWS Glue, Nessie, or JDBC based on existing infrastructure
- Storage configuration: define warehouse on S3, ADLS, GCS, or HDFS with appropriate permissions
- Compute engine integration: configure Spark, Flink, or Trino with Iceberg extensions and connectors
- Partitioning strategy: define hidden transformations (days, hours, bucket) based on query patterns
- Retention policy: configure snapshot expiration and cleanup to optimize storage costs
- Progressive migration: use migration procedures to convert existing tables (Hive, Delta) to Iceberg
- Monitoring: implement tracking for compaction metrics, snapshot count, and table size
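The progressive-migration step can use Iceberg's built-in Spark procedures. A sketch, assuming a Hive table `db.legacy_events` (hypothetical name):

```sql
-- Create a temporary Iceberg table over the existing files to validate first,
-- leaving the source table untouched
CALL my_catalog.system.snapshot('db.legacy_events', 'db.legacy_events_iceberg');

-- Then migrate in place once validated (replaces the source table's metadata)
CALL my_catalog.system.migrate('db.legacy_events');
```

Both procedures reuse the existing data files, so the migration itself moves no data; only table metadata is written.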
Performance Optimization
Regularly execute maintenance operations: REWRITE DATA FILES to optimize file sizes (avoid small files), EXPIRE SNAPSHOTS to remove obsolete history, and REWRITE MANIFESTS to consolidate fragmented metadata. These operations maintain optimal long-term performance.
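These maintenance operations map to Iceberg's Spark stored procedures. A minimal sketch, with illustrative table name, file-size target, and retention window:

```sql
-- Compact small files (target-file-size-bytes here is 512 MB, illustrative)
CALL my_catalog.system.rewrite_data_files(
    table => 'db.events',
    options => map('target-file-size-bytes', '536870912'));

-- Expire snapshots older than a cutoff, keeping at least the last 10
CALL my_catalog.system.expire_snapshots(
    table => 'db.events',
    older_than => TIMESTAMP '2024-01-08 00:00:00',
    retain_last => 10);

-- Consolidate fragmented manifest files
CALL my_catalog.system.rewrite_manifests('db.events');
```

In production these are typically scheduled as periodic jobs; note that expiring snapshots permanently limits how far back time travel can reach.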
Tools and Ecosystem
- Apache Spark: primary engine for batch and streaming operations on Iceberg tables
- Apache Flink: real-time streaming with native support for ACID Iceberg writes
- Trino/Presto: high-performance interactive SQL querying on Iceberg data lakes
- Nessie: Git-like catalog providing branches, tags, and versioning for Iceberg tables
- AWS Glue/Azure Purview: managed catalogs with Iceberg support for centralized metadata
- dbt: SQL transformations with incremental materialization support on Iceberg
- Tableau/Looker: visualization and BI directly on Iceberg tables via JDBC/ODBC connectors
Apache Iceberg represents a major evolution in data lake architecture, bringing database-style guarantees to cloud-storage scale. Its ability to provide ACID transactions, time travel, and schema evolution without sacrificing performance makes it a preferred choice for modern data architectures that require reliability and flexibility. By unifying batch and streaming workloads under a standardized, vendor-neutral format, Iceberg reduces operational complexity while improving data governance and infrastructure cost efficiency.

