AndreaBozzo/IcebergSharp

IcebergSharp

A vendor-neutral .NET reader for Apache Iceberg tables.

License: MIT OR Apache-2.0

IcebergSharp lets .NET applications read Apache Iceberg tables directly from a REST catalog (Apache Polaris, Project Nessie, Snowflake Open Catalog, AWS Glue, lakekeeper, Unity Catalog) without going through Spark, Trino, or any JVM service.

It handles the things that make Iceberg interesting and hard to get right: field-id resolution for schema evolution, partition pruning at the manifest level, column-stats pruning at the data-file level, and time travel via snapshots.

My idea behind IcebergSharp

Today, .NET teams that want to read Iceberg either spin up a JVM service (Spark Connect, Trino) and pay the latency / ops cost, or go through a query engine that doesn't expose Iceberg's metadata to them. There's no native client that gives a .NET app the same first-class access that pyiceberg gives Python or iceberg-rust gives Rust. IcebergSharp aims to be that client: read-only, no JVM, no embedded query engine — just metadata and Arrow batches you can hand to DuckDB.NET, ML.NET, or Power BI.

Status: Phase 3 development. Core Iceberg metadata parsing, stream-based Avro manifest / manifest-list readers, and the read-only REST catalog client are implemented and covered by unit tests. Scan planning, file IO, live catalog validation, and Parquet data reads are still on the roadmap.


Scope (what this library does and does NOT do)

In scope for v1 — read-only Iceberg, focused on what analytical workloads actually need:

  • Spec-compliant REST Catalog client with dynamic endpoint discovery and SigV4 request signing for AWS Glue.
  • File IO for local, S3, and ADLS Gen2.
  • Parquet data files with field-id resolution.
  • Partition pruning, column-stats pruning, projection pushdown.
  • Time travel by snapshot id or timestamp.
  • Schema evolution: add / drop / rename column, type promotion.
  • Apache Arrow output for zero-copy interop with DuckDB.NET / Polars.NET / ML.NET.
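Field-id resolution is what lets renamed and added columns read correctly from older data files. Below is a minimal sketch of the idea, not IcebergSharp's implementation: the helper `ResolveByFieldId` and both schemas are hypothetical and use only plain dictionaries.

```csharp
using System;
using System.Collections.Generic;

// Table schema as of the current snapshot: field id -> current column name.
var tableSchema = new Dictionary<int, string>
{
    [1] = "order_id",
    [2] = "customer_id", // renamed from "cust_id"; the field id did not change
    [3] = "amount",      // added after the data file below was written
};

// Columns physically present in an older data file: field id -> name at write time.
var fileSchema = new Dictionary<int, string>
{
    [1] = "order_id",
    [2] = "cust_id",
};

// Hypothetical helper: maps each table column to the file column with the same field id.
// A null value means the file predates the column, so a reader would fill in nulls.
static Dictionary<string, string?> ResolveByFieldId(
    IReadOnlyDictionary<int, string> table, IReadOnlyDictionary<int, string> file)
{
    var mapping = new Dictionary<string, string?>();
    foreach (var (fieldId, currentName) in table)
    {
        mapping[currentName] = file.TryGetValue(fieldId, out var fileName) ? fileName : null;
    }
    return mapping;
}

foreach (var (column, source) in ResolveByFieldId(tableSchema, fileSchema))
{
    Console.WriteLine($"{column} <- {source ?? "(not in file, read as null)"}");
}
```

Matching by name instead would lose `customer_id` in this file, which is why the ids, not the names, are the stable key.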

Out of scope for v1 (load-bearing boundaries — re-discussing them risks turning the project into a half-finished engine instead of a focused reader):

  • No write path. No CREATE TABLE, INSERT, commits, or manifest writing. Writing Iceberg correctly is ~70% of the engineering effort and ~90% of the bugs in existing implementations. v1 is "read-only, done well."
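  • No merge-on-read / delete files. Copy-on-write (COW) tables only; delete files are skipped with a warning, so rows they mark as deleted still appear in results. Most analytical Iceberg workloads remain COW-dominant.
  • No Hive Metastore. REST catalogs only; to read tables tracked in a Hive Metastore (HMS), put an Iceberg REST adapter in front of it.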
  • No bundled SQL engine. You get IAsyncEnumerable<RecordBatch> and Arrow streams; bring your own query layer.
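To make the "delete files are skipped with a warning" boundary concrete, here is a rough sketch under stated assumptions. The Iceberg v2 spec tags each manifest entry's file with a content code (0 = data, 1 = position deletes, 2 = equality deletes). The helper `DataFilesOnly`, the tuple shape, and the file paths are hypothetical and are not IcebergSharp types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps data files and reports each delete file through `warn`
// instead of applying it.
static List<string> DataFilesOnly(
    IEnumerable<(string Path, int Content)> manifestEntries, Action<string> warn)
{
    const int Data = 0; // 1 = position deletes, 2 = equality deletes
    var dataFiles = new List<string>();
    foreach (var (path, content) in manifestEntries)
    {
        if (content == Data)
        {
            dataFiles.Add(path);
        }
        else
        {
            warn($"Skipping delete file {path} (content={content}); deleted rows may still be returned.");
        }
    }
    return dataFiles;
}

// Made-up manifest entries for illustration.
var sampleEntries = new List<(string Path, int Content)>
{
    ("data/part-0.parquet", 0),
    ("data/delete-0.parquet", 1),
    ("data/part-1.parquet", 0),
};

var kept = DataFilesOnly(sampleEntries, Console.Error.WriteLine);
Console.WriteLine(string.Join(", ", kept)); // prints data/part-0.parquet, data/part-1.parquet
```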

Requirements

  • .NET 9 SDK (recommended) or .NET 8 SDK.
  • For integration tests: Docker (for Polaris + MinIO containers).

The shipped packages multi-target net9.0 and net8.0. net8.0 is supported until its LTS end-of-life (November 2026).


Installation (planned, not yet published)

dotnet add package IcebergSharp.Core
dotnet add package IcebergSharp.Catalog
dotnet add package IcebergSharp.Reader
dotnet add package IcebergSharp.IO          # local + S3 + ADLS
dotnet add package IcebergSharp.Arrow       # optional: Apache Arrow output

Quick start (target API for v1.0)

using IcebergSharp;
using IcebergSharp.Catalog;
using IcebergSharp.Catalog.Rest;
using IcebergSharp.Catalog.Rest.Authentication;
using IcebergSharp.Expressions;

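// 1. Connect to a REST catalog (Polaris, Nessie, Glue, Snowflake Open Catalog, ...).
// Credentials are read from the environment here; the variable names are placeholders.
var clientId     = System.Environment.GetEnvironmentVariable("ICEBERG_CLIENT_ID")!;
var clientSecret = System.Environment.GetEnvironmentVariable("ICEBERG_CLIENT_SECRET")!;

var catalog = new RestCatalog(new RestCatalogOptions
{
    Uri            = new Uri("https://polaris.example.com/api/catalog"),
    Warehouse      = "analytics",
    Authentication = new OAuth2ClientCredentialsAuthentication(clientId, clientSecret),
});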

// 2. Load a table.
var table = await catalog.LoadTableAsync(
    TableIdentifier.From(CatalogNamespace.From("sales"), "orders"));

// 3. Plan a scan with predicate + projection pushdown.
var scan = table.NewScan()
    .Filter(Expressions.And(
        Expressions.GreaterThan("order_date", new DateOnly(2024, 1, 1)),
        Expressions.Equal("region", "EU")))
    .Select("order_id", "customer_id", "amount");

// 4. Stream the results — each task is one data file with an optional residual filter.
await foreach (var task in scan.PlanFilesAsync())
{
    using var reader = await task.OpenReaderAsync();
    await foreach (var record in reader.ReadAsync())
    {
        Console.WriteLine($"{record["order_id"]}: {record["amount"]}");
    }
}

// 5. Time travel — same API, historical snapshot.
var lastWeek = table.NewScan()
    .UseSnapshot(snapshotId: 1234567890L)
    .Select("*");
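Time travel by timestamp comes down to choosing the right snapshot id before the scan starts. The sketch below shows one way to do that selection, assuming the table's snapshot log is available as (snapshot id, commit time) pairs. `SnapshotIdAsOf` and the sample log are hypothetical and not part of the target API above.

```csharp
using System;
using System.Collections.Generic;

// Snapshot log entries as (snapshot id, commit time); values are made up for illustration.
var snapshotLog = new List<(long SnapshotId, DateTimeOffset CommittedAt)>
{
    (1001L, new DateTimeOffset(2024, 1, 5, 0, 0, 0, TimeSpan.Zero)),
    (1002L, new DateTimeOffset(2024, 1, 12, 0, 0, 0, TimeSpan.Zero)),
    (1003L, new DateTimeOffset(2024, 1, 19, 0, 0, 0, TimeSpan.Zero)),
};

// Hypothetical helper: returns the id of the latest snapshot committed at or before `asOf`,
// or null when no snapshot existed yet at that time.
static long? SnapshotIdAsOf(
    IEnumerable<(long SnapshotId, DateTimeOffset CommittedAt)> log, DateTimeOffset asOf)
{
    long? best = null;
    var bestTime = DateTimeOffset.MinValue;
    foreach (var (id, committedAt) in log)
    {
        if (committedAt <= asOf && committedAt >= bestTime)
        {
            best = id;
            bestTime = committedAt;
        }
    }
    return best;
}

var target = new DateTimeOffset(2024, 1, 15, 0, 0, 0, TimeSpan.Zero);
Console.WriteLine(SnapshotIdAsOf(snapshotLog, target)); // prints 1002
```

The resulting id is the kind of value that would be passed to `UseSnapshot(snapshotId: ...)` in step 5.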

Compatibility

See docs/compatibility-matrix.md for the up-to-date matrix of supported catalogs, table-format versions, and storage backends.

Current implemented surface:

  • IcebergSharp.Core: Iceberg v1/v2 table metadata, schemas, partition specs, sort orders, snapshots, and manifest domain models.
  • IcebergSharp.Avro: stream-based Avro OCF readers for Iceberg manifest lists and manifests, including null and deflate codecs.
  • IcebergSharp.Catalog: read-only REST catalog client for config discovery, namespace/table listing, table metadata loading, and bearer/OAuth2/SigV4 auth.
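For readers curious what a stream-based Avro reader has to do at the lowest level: the Avro specification encodes `int` and `long` values as variable-length zig-zag integers, and manifest records are built from them. The following is an illustrative sketch of that decoding step only. `ReadAvroLong` is a hypothetical name and does not describe the IcebergSharp.Avro API.

```csharp
using System;
using System.IO;

// Hypothetical helper: reads one Avro-encoded long (variable-length, zig-zag) from a stream.
static long ReadAvroLong(Stream stream)
{
    ulong raw = 0;
    int shift = 0;
    while (true)
    {
        int b = stream.ReadByte();
        if (b < 0) throw new EndOfStreamException("Truncated Avro varint.");
        raw |= (ulong)(b & 0x7F) << shift;   // low 7 bits carry the payload
        if ((b & 0x80) == 0) break;          // high bit clear marks the last byte
        shift += 7;
        if (shift > 63) throw new InvalidDataException("Avro varint is longer than 10 bytes.");
    }
    // Zig-zag decode: even raw values are non-negative, odd raw values are negative.
    return (long)(raw >> 1) ^ -(long)(raw & 1);
}

using var input = new MemoryStream(new byte[] { 0x80, 0x01, 0x7F });
Console.WriteLine(ReadAvroLong(input)); // prints 64
Console.WriteLine(ReadAvroLong(input)); // prints -64
```

Reading byte by byte from a `Stream` keeps the decoder usable on large manifests without buffering the whole file first.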

Target servers for v1:

  • Apache Polaris (reference implementation)
  • Project Nessie
  • AWS Glue Iceberg REST
  • Snowflake Open Catalog
  • lakekeeper

Roadmap

| Phase | Weeks | Deliverable |
|---|---|---|
| 0. Scaffolding | done | Repo, CI, license, solution layout |
| 1. Core types & metadata | done | Schema, TableMetadata, JSON parser |
| 2. Avro manifest reader | done | Custom mini Avro OCF reader for manifests |
| 3. REST catalog client | in progress | OAuth2 / Bearer / SigV4, dynamic discovery |
| 4. Scan planning & pruning | 7-8 | Partition + stats pruning, residual filters |
| 5. Parquet + schema evolution | 9-10 | Field-id resolution, add/drop/rename column |
| 6. Polish & release | 11-12 | NuGet 1.0.0, Power BI sample, blog post |

Architecture overview

Catalog
  └─ Namespace (e.g. "analytics.sales")
       └─ Table
            └─ TableMetadata (pointer held by the catalog)
                 ├─ Schema (with field-ids)
                 ├─ PartitionSpec[]   ← evolves over time
                 ├─ SortOrder[]
                 └─ Snapshot[]
                      └─ Manifest list file (Avro)
                           └─ ManifestFile[]
                                └─ DataFile[] (Parquet w/ field-id annotations)

Given catalog.LoadTableAsync(...), the reader walks this tree, prunes by partition and column stats, and streams Parquet rows with field-id resolution.
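Column-stats pruning at the data-file level can be pictured as a bounds check per file. The sketch below assumes each manifest entry exposes a lower and upper bound for one numeric column. The tuple shape, the paths, and `MightContainGreaterThan` are hypothetical and only illustrate the technique.

```csharp
using System;
using System.Collections.Generic;

// Per-file bounds for a single numeric column; paths and values are made up.
var files = new List<(string Path, long Lower, long Upper)>
{
    ("data/part-0.parquet", 10, 90),
    ("data/part-1.parquet", 100, 250),
    ("data/part-2.parquet", 300, 900),
};

// Hypothetical helper: a file can hold rows with value > threshold
// only if its upper bound exceeds the threshold.
static bool MightContainGreaterThan(
    (string Path, long Lower, long Upper) candidate, long threshold)
    => candidate.Upper > threshold;

foreach (var file in files)
{
    var verdict = MightContainGreaterThan(file, 200) ? "scan" : "skip";
    Console.WriteLine($"{file.Path}: {verdict}");
}
// part-0 is skipped; part-1 and part-2 are scanned.
```

A file that survives pruning is only a candidate: `part-1` spans 100 to 250, so the predicate still has to be applied to its rows as a residual filter.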


Contributing

See CONTRIBUTING.md. The project is in early development; the fastest way to help right now is to try the metadata and manifest readers against real Iceberg tables and report incompatible schemas, codecs, or manifest shapes.


License

Dual licensed under your choice of:

  • MIT License (LICENSE-MIT)
  • Apache License 2.0 (LICENSE-APACHE)

This product is not affiliated with, endorsed by, or sponsored by the Apache Software Foundation. Apache, Apache Iceberg, Iceberg, and the Apache feather logo are trademarks of the Apache Software Foundation.
