A vendor-neutral .NET reader for Apache Iceberg tables.
IcebergSharp lets .NET applications read Apache Iceberg tables directly from a REST catalog (Apache Polaris, Project Nessie, Snowflake Open Catalog, AWS Glue, lakekeeper, Unity Catalog) without going through Spark, Trino, or any JVM service.
It targets the parts of Iceberg that are interesting and hard to get right: field-id resolution for schema evolution, partition pruning at the manifest level, column-stats pruning at the data-file level, and time travel via snapshots.
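Field-id resolution keeps schema evolution safe. Iceberg identifies every column by an integer field id that never changes, so a reader matches a data file's columns to the current schema by id rather than by name. The sketch below illustrates the idea, using plain dictionaries in place of real schemas. The column names and ids are made up, and `ResolveProjection` is a hypothetical helper, not IcebergSharp's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical example: a file written before "amt" was renamed to "amount"
// and before "currency" (field id 4) was added to the table.
var fileSchema = new Dictionary<int, string> { [1] = "order_id", [2] = "customer_id", [3] = "amt" };
var currentSchema = new Dictionary<int, string> { [1] = "order_id", [3] = "amount", [4] = "currency" };

var plan = ResolveProjection(currentSchema, fileSchema, new[] { "order_id", "amount", "currency" });
foreach (var (column, fileColumn) in plan)
    Console.WriteLine($"{column} <- {fileColumn ?? "(null: column not in this file)"}");

// Maps each requested column (by its current name) to the physical column in the
// data file, matching on field id. Yields null for columns added after the file
// was written; a reader would fill those with nulls.
static List<(string Column, string? FileColumn)> ResolveProjection(
    IReadOnlyDictionary<int, string> current,
    IReadOnlyDictionary<int, string> file,
    IEnumerable<string> requested)
{
    var idByName = current.ToDictionary(kv => kv.Value, kv => kv.Key);
    var result = new List<(string Column, string? FileColumn)>();
    foreach (var name in requested)
    {
        if (!idByName.TryGetValue(name, out var id))
            throw new ArgumentException($"Unknown column '{name}'");
        result.Add((name, file.TryGetValue(id, out var physical) ? physical : null));
    }
    return result;
}
```

A rename only changes the name attached to an id, so `amount` still finds the old `amt` column, while a column added later resolves to nothing and is read as nulls.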
Today, .NET teams that want to read Iceberg either spin up a JVM service (Spark
Connect, Trino) and pay the latency / ops cost, or go through a query engine that
doesn't expose Iceberg's metadata to them. There's no native client that gives a
.NET app the same first-class access that pyiceberg gives Python or
iceberg-rust gives Rust. IcebergSharp aims to be that client: read-only, no JVM,
no embedded query engine — just metadata and Arrow batches you can hand to
DuckDB.NET, ML.NET, or Power BI.
Status: Phase 3 development. Core Iceberg metadata parsing, stream-based Avro manifest / manifest-list readers, and the read-only REST catalog client are implemented and covered by unit tests. Scan planning, file IO, live catalog validation, and Parquet data reads are still on the roadmap.
In scope for v1 — read-only Iceberg, focused on what analytical workloads actually need:
- Spec-compliant REST Catalog client with dynamic endpoint discovery; AWS Glue SigV4.
- File IO for local, S3, and ADLS Gen2.
- Parquet data files with field-id resolution.
- Partition pruning, column-stats pruning, projection pushdown.
- Time travel by snapshot id or timestamp.
- Schema evolution: add / drop / rename column, type promotion.
- Apache Arrow output for zero-copy interop with DuckDB.NET / Polars.NET / ML.NET.
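Type promotion is deliberately narrow in the Iceberg v1/v2 spec. It allows `int` to `long`, `float` to `double`, and `decimal(P, S)` to `decimal(P', S)` when the precision grows and the scale stays the same. A minimal check might look like the following. Representing types as strings such as `"decimal(9,2)"` is an assumption made for brevity, and `IsAllowedPromotion` is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(IsAllowedPromotion("int", "long"));                   // True
Console.WriteLine(IsAllowedPromotion("decimal(9,2)", "decimal(18,2)")); // True
Console.WriteLine(IsAllowedPromotion("decimal(9,2)", "decimal(18,4)")); // False: scale changed
Console.WriteLine(IsAllowedPromotion("long", "int"));                   // False: narrowing

// True when a column written as `fromType` may be read as `toType` under the
// v1/v2 rules: int -> long, float -> double, decimal(P, S) -> decimal(P2, S) with P2 > P.
static bool IsAllowedPromotion(string fromType, string toType)
{
    if (fromType == toType) return true;
    if (fromType == "int" && toType == "long") return true;
    if (fromType == "float" && toType == "double") return true;
    var f = ParseDecimal(fromType);
    var t = ParseDecimal(toType);
    return f is not null && t is not null
        && f.Value.Scale == t.Value.Scale
        && t.Value.Precision > f.Value.Precision;
}

// Parses "decimal(P,S)" (whitespace tolerant); returns null for any other type string.
static (int Precision, int Scale)? ParseDecimal(string type)
{
    var m = Regex.Match(type, @"^decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)$");
    if (!m.Success) return null;
    return (int.Parse(m.Groups[1].Value), int.Parse(m.Groups[2].Value));
}
```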
Out of scope for v1 (load-bearing boundaries — re-discussing them risks turning the project into a half-finished engine instead of a focused reader):
- ❌ No write path. No `CREATE TABLE`, `INSERT`, commits, or manifest writing. Writing Iceberg correctly is ~70% of the engineering effort and ~90% of the bugs in existing implementations. v1 is "read-only, done well."
- ❌ No merge-on-read / delete files. Copy-on-write (COW) tables only; delete files are skipped with a warning. Most analytical Iceberg workloads remain COW-dominant.
- ❌ No Hive Metastore. REST catalogs only — point an Iceberg REST adapter at HMS.
- ❌ No bundled SQL engine. You get `IAsyncEnumerable<RecordBatch>` and Arrow streams; bring your own query layer.
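Skipping delete files amounts to filtering manifest entries on the `content` field of `data_file`. The v2 spec codes this field as 0 for data, 1 for position deletes, and 2 for equality deletes. The sketch below makes several simplifications. The paths are made up, a tuple stands in for a real manifest entry type, `SelectDataFiles` is a hypothetical helper, and the warning text is illustrative.

```csharp
using System;
using System.Collections.Generic;

// Made-up manifest entries: (path, content) using the v2 data_file.content codes.
var entries = new List<(string Path, int Content)>
{
    ("s3://bucket/orders/data/00000.parquet", 0),
    ("s3://bucket/orders/data/00001-deletes.parquet", 1),
    ("s3://bucket/orders/data/00002.parquet", 0),
};

var dataFiles = SelectDataFiles(entries, msg => Console.Error.WriteLine($"warning: {msg}"));
Console.WriteLine(string.Join(Environment.NewLine, dataFiles));

// Keeps content == 0 (DATA) entries. Position (1) and equality (2) delete files are
// reported through `warn` instead of being applied: the COW-only behavior.
static List<string> SelectDataFiles(IEnumerable<(string Path, int Content)> files, Action<string> warn)
{
    var result = new List<string>();
    foreach (var (path, content) in files)
    {
        switch (content)
        {
            case 0: result.Add(path); break;
            case 1: warn($"skipping position delete file {path}; rows it deletes will still be returned"); break;
            case 2: warn($"skipping equality delete file {path}; rows it deletes will still be returned"); break;
            default: throw new InvalidOperationException($"Unknown data_file.content value {content} for {path}");
        }
    }
    return result;
}
```

Because skipped delete files are never applied, rows they would have removed still appear in the output. That is why COW tables are the supported case.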
Requirements:
- .NET 9 SDK (recommended) or .NET 8 SDK.
- For integration tests: Docker (for Polaris + MinIO containers).
The shipped packages multi-target net9.0 and net8.0. net8.0 is supported until
its LTS end-of-life (November 2026).
```shell
dotnet add package IcebergSharp.Core
dotnet add package IcebergSharp.Catalog
dotnet add package IcebergSharp.Reader
dotnet add package IcebergSharp.IO      # local + S3 + ADLS
dotnet add package IcebergSharp.Arrow   # optional: Apache Arrow output
```

```csharp
using System;
using IcebergSharp;
using IcebergSharp.Catalog;
using IcebergSharp.Catalog.Rest;
using IcebergSharp.Catalog.Rest.Authentication;
using IcebergSharp.Expressions;

// Target v1 API. Steps 3-5 depend on scan planning and Parquet reads, which are still on the roadmap.

// OAuth2 client credentials; the environment variable names are only an example.
var clientId = Environment.GetEnvironmentVariable("ICEBERG_CLIENT_ID")!;
var clientSecret = Environment.GetEnvironmentVariable("ICEBERG_CLIENT_SECRET")!;

// 1. Connect to a REST catalog (Polaris, Nessie, Glue, Snowflake Open Catalog, ...).
var catalog = new RestCatalog(new RestCatalogOptions
{
    Uri = new Uri("https://polaris.example.com/api/catalog"),
    Warehouse = "analytics",
    Authentication = new OAuth2ClientCredentialsAuthentication(clientId, clientSecret),
});

// 2. Load a table.
var table = await catalog.LoadTableAsync(
    TableIdentifier.From(CatalogNamespace.From("sales"), "orders"));

// 3. Plan a scan with predicate + projection pushdown.
var scan = table.NewScan()
    .Filter(Expressions.And(
        Expressions.GreaterThan("order_date", new DateOnly(2024, 1, 1)),
        Expressions.Equal("region", "EU")))
    .Select("order_id", "customer_id", "amount");

// 4. Stream the results; each task is one data file with an optional residual filter.
await foreach (var task in scan.PlanFilesAsync())
{
    using var reader = await task.OpenReaderAsync();
    await foreach (var record in reader.ReadAsync())
    {
        Console.WriteLine($"{record["order_id"]}: {record["amount"]}");
    }
}

// 5. Time travel: same API, historical snapshot.
var lastWeek = table.NewScan()
    .UseSnapshot(snapshotId: 1234567890L)
    .Select("*");
// Consume lastWeek.PlanFilesAsync() exactly as in step 4.
```

See docs/compatibility-matrix.md for the up-to-date matrix of supported catalogs, table-format versions, and storage backends.
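The quickstart pins a snapshot id directly. Time travel by timestamp first has to map a point in time to a snapshot id. The table metadata's `snapshot-log` records when each snapshot became current, so the answer is the last entry at or before the requested time. This sketch reads an abbreviated, made-up metadata document with `System.Text.Json`; `SnapshotIdAsOf` is a hypothetical helper.

```csharp
using System;
using System.Text.Json;

// Abbreviated, made-up table metadata: only the snapshot-log field is shown.
const string metadataJson = """
{
  "snapshot-log": [
    { "timestamp-ms": 1704067200000, "snapshot-id": 111 },
    { "timestamp-ms": 1704153600000, "snapshot-id": 222 },
    { "timestamp-ms": 1704240000000, "snapshot-id": 333 }
  ]
}
""";

using var doc = JsonDocument.Parse(metadataJson);
var pointInTime = DateTimeOffset.Parse("2024-01-02T12:00:00Z");
Console.WriteLine(SnapshotIdAsOf(doc.RootElement, pointInTime)); // 222

// Returns the id of the snapshot that was current at `asOf`: the last snapshot-log
// entry whose timestamp-ms is <= asOf, or null if no snapshot existed yet.
// Entries are scanned rather than assumed to be sorted.
static long? SnapshotIdAsOf(JsonElement metadata, DateTimeOffset asOf)
{
    var target = asOf.ToUnixTimeMilliseconds();
    long? bestId = null;
    long bestTs = long.MinValue;
    if (!metadata.TryGetProperty("snapshot-log", out var log)) return null;
    foreach (var entry in log.EnumerateArray())
    {
        var ts = entry.GetProperty("timestamp-ms").GetInt64();
        if (ts <= target && ts >= bestTs)
        {
            bestTs = ts;
            bestId = entry.GetProperty("snapshot-id").GetInt64();
        }
    }
    return bestId;
}
```

Using the snapshot log rather than every snapshot's own timestamp matters after a rollback. In that case a newer snapshot may exist that was never the table's current state at the requested time.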
Current implemented surface:
- `IcebergSharp.Core`: Iceberg v1/v2 table metadata, schemas, partition specs, sort orders, snapshots, and manifest domain models.
- `IcebergSharp.Avro`: stream-based Avro OCF readers for Iceberg manifest lists and manifests, including the `null` and `deflate` codecs.
- `IcebergSharp.Catalog`: read-only REST catalog client for config discovery, namespace/table listing, table metadata loading, and bearer/OAuth2/SigV4 auth.
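Inside an Avro object container file, each data block starts with a record count and a byte size. Both are encoded as Avro longs (zigzag plus base-128 varint) and are followed by the block bytes and the file's 16-byte sync marker. With the `deflate` codec the block bytes are raw DEFLATE (RFC 1951, no zlib wrapper), which .NET's `DeflateStream` reads directly. The round trip below shows only those two pieces. The payload is made up, the file header and sync marker are omitted, and the helper names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Round-trip demo with a made-up payload: write one block header + deflate body, then read it back.
var payload = Encoding.UTF8.GetBytes("three serialized records would go here");
var compressed = Deflate(payload);
var block = new MemoryStream();
WriteLong(block, 3);                  // record count
WriteLong(block, compressed.Length);  // byte size of the compressed block
block.Write(compressed);
block.Position = 0;

long count = ReadLong(block);
long size = ReadLong(block);
var body = new byte[size];
block.ReadExactly(body);
Console.WriteLine($"{count} records: {Encoding.UTF8.GetString(Inflate(body))}");

// Avro long: zigzag-encoded, then written as a little-endian base-128 varint.
static long ReadLong(Stream s)
{
    ulong raw = 0;
    for (int shift = 0; shift < 64; shift += 7)
    {
        int b = s.ReadByte();
        if (b < 0) throw new EndOfStreamException();
        raw |= (ulong)(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return (long)(raw >> 1) ^ -(long)(raw & 1);
    }
    throw new InvalidDataException("Avro long longer than 10 bytes");
}

static void WriteLong(Stream s, long value)
{
    ulong raw = (ulong)((value << 1) ^ (value >> 63)); // zigzag
    while (raw >= 0x80) { s.WriteByte((byte)(raw | 0x80)); raw >>= 7; }
    s.WriteByte((byte)raw);
}

// Avro's "deflate" codec is raw DEFLATE, exactly what DeflateStream decompresses.
static byte[] Inflate(byte[] compressedBlock)
{
    using var input = new DeflateStream(new MemoryStream(compressedBlock), CompressionMode.Decompress);
    using var output = new MemoryStream();
    input.CopyTo(output);
    return output.ToArray();
}

static byte[] Deflate(byte[] data)
{
    var output = new MemoryStream();
    using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
        deflate.Write(data);
    return output.ToArray();
}
```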
Target servers for v1:
- Apache Polaris (reference implementation)
- Project Nessie
- AWS Glue Iceberg REST
- Snowflake Open Catalog
- lakekeeper
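All of these servers expose the same spec-defined bootstrap call, `GET /v1/config`. Its response can carry `overrides` (often including a `prefix`) and an optional `endpoints` list. When `endpoints` is missing, clients assume the spec's default set. The following sketches that discovery step and builds the load-table route. The response body is made up, treating an absent list as "everything supported" is a simplification, and `Supports`/`LoadTablePath` are hypothetical helpers. Multi-level namespaces are joined with the spec's default 0x1F separator, encoded as `%1F`.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Made-up /v1/config response; real servers return their own defaults/overrides.
const string configJson = """
{
  "defaults": {},
  "overrides": { "prefix": "analytics" },
  "endpoints": [
    "GET /v1/{prefix}/namespaces",
    "GET /v1/{prefix}/namespaces/{namespace}/tables/{table}"
  ]
}
""";

using var configDoc = JsonDocument.Parse(configJson);
const string loadTableEndpoint = "GET /v1/{prefix}/namespaces/{namespace}/tables/{table}";
Console.WriteLine(Supports(configDoc.RootElement, loadTableEndpoint));
Console.WriteLine(LoadTablePath(configDoc.RootElement, new[] { "sales", "eu" }, "orders")); // /v1/analytics/namespaces/sales%1Feu/tables/orders

// An advertised "endpoints" list is authoritative. When it is absent, this sketch
// simply treats every endpoint as supported (a simplification of the default set).
static bool Supports(JsonElement config, string endpoint) =>
    !config.TryGetProperty("endpoints", out var endpoints)
    || endpoints.EnumerateArray().Any(e => e.GetString() == endpoint);

// Builds the load-table route. Namespace levels are escaped individually and
// joined with %1F (the URL-encoded 0x1F unit separator).
static string LoadTablePath(JsonElement config, string[] ns, string table)
{
    var prefix = config.TryGetProperty("overrides", out var o) && o.TryGetProperty("prefix", out var p)
        ? "/" + Uri.EscapeDataString(p.GetString()!)
        : "";
    var nsPart = string.Join("%1F", ns.Select(part => Uri.EscapeDataString(part)));
    return $"/v1{prefix}/namespaces/{nsPart}/tables/{Uri.EscapeDataString(table)}";
}
```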
| Phase | Status / weeks | Deliverable |
|---|---|---|
| 0. Scaffolding | done | Repo, CI, license, solution layout |
| 1. Core types & metadata | done | Schema, TableMetadata, JSON parser |
| 2. Avro manifest reader | done | Custom mini Avro OCF reader for manifests |
| 3. REST catalog client | in progress | OAuth2 / Bearer / SigV4, dynamic discovery |
| 4. Scan planning & pruning | 7-8 | Partition + stats pruning, residual filters |
| 5. Parquet + schema evolution | 9-10 | Field-id resolution, add/drop/rename column |
| 6. Polish & release | 11-12 | NuGet 1.0.0, Power BI sample, blog post |
```text
Catalog
└─ Namespace (e.g. "analytics.sales")
   └─ Table
      └─ TableMetadata (pointer held by the catalog)
         ├─ Schema (with field-ids)
         ├─ PartitionSpec[]   ← evolves over time
         ├─ SortOrder[]
         └─ Snapshot[]
            └─ Manifest list file (Avro)
               └─ ManifestFile[]
                  └─ DataFile[] (Parquet w/ field-id annotations)
```
Starting from the table returned by `catalog.LoadTableAsync(...)`, a scan walks this tree. It prunes manifests by partition and data files by column stats, then streams Parquet rows with field-id resolution.
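At both levels the pruning test is the same: given a lower and upper bound, could any value in that range satisfy the predicate? Manifest-list entries carry per-partition-field summaries, and manifest entries carry per-column `lower_bounds`/`upper_bounds`, so the functions below could serve either level. The sketch applies the quickstart's filter to made-up, already-decoded file stats, and the helper names are hypothetical. It leaves out missing bounds, null counts, and truncated string bounds, which a real reader must treat conservatively. Ordinal string comparison matches Iceberg's string ordering for plain ASCII values like these.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up data-file stats for the quickstart's orders table (bounds already decoded).
var files = new[]
{
    (Path: "a.parquet", MinDate: new DateOnly(2023, 11, 1), MaxDate: new DateOnly(2023, 12, 31), MinRegion: "APAC", MaxRegion: "US"),
    (Path: "b.parquet", MinDate: new DateOnly(2024, 1, 1),  MaxDate: new DateOnly(2024, 3, 31),  MinRegion: "APAC", MaxRegion: "APAC"),
    (Path: "c.parquet", MinDate: new DateOnly(2024, 2, 1),  MaxDate: new DateOnly(2024, 2, 29),  MinRegion: "EU",   MaxRegion: "US"),
};

// order_date > 2024-01-01 AND region = 'EU'
var cutoff = new DateOnly(2024, 1, 1);
var survivors = files
    .Where(f => MightMatchGreaterThan(f.MaxDate, cutoff)
             && MightMatchEqual(f.MinRegion, f.MaxRegion, "EU", StringComparer.Ordinal))
    .Select(f => f.Path);
Console.WriteLine(string.Join(", ", survivors)); // c.parquet

// "col > value" can only hold in a file whose upper bound exceeds value.
static bool MightMatchGreaterThan<T>(T upper, T value) where T : IComparable<T> =>
    upper.CompareTo(value) > 0;

// "col = value" can only hold if value lies inside [lower, upper].
static bool MightMatchEqual<T>(T lower, T upper, T value, IComparer<T> cmp) =>
    cmp.Compare(lower, value) <= 0 && cmp.Compare(value, upper) <= 0;
```

A file that fails either test cannot contain a matching row and is never opened. Files that pass may still contain non-matching rows, which is why each scan task carries a residual filter.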
See CONTRIBUTING.md. The project is in early development; the fastest way to help right now is to try the metadata and manifest readers against real Iceberg tables and report incompatible schemas, codecs, or manifest shapes.
Dual licensed under your choice of:
This product is not affiliated with, endorsed by, or sponsored by the Apache Software Foundation. Apache, Apache Iceberg, Iceberg, and the Apache feather logo are trademarks of the Apache Software Foundation.
