GitHub - AndreaBozzo/IcebergSharp: Vendor-neutral .NET reader for Apache Iceberg tables

A vendor-neutral .NET reader for Apache Iceberg tables.

IcebergSharp lets .NET applications read Apache Iceberg tables directly from a REST catalog (Apache Polaris, Project Nessie, Snowflake Open Catalog, AWS Glue, lakekeeper, Unity Catalog) without going through Spark, Trino, or any JVM service.

It handles the things that make Iceberg interesting and hard to get right: field-id resolution for schema evolution, partition pruning at the manifest level, column-stats pruning at the data-file level, and time travel via snapshots.
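
To make the first of these concrete, here is a minimal sketch of field-id resolution. It is written for illustration and is not taken from the IcebergSharp codebase: `SchemaField`, `FileColumn`, and `FieldIdResolver` are hypothetical names. It shows why a reader matches columns by id rather than by name: a column renamed after a file was written still resolves, and a column added later simply has no counterpart in the older file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A data file written when field 2 was still called "total".
var fileColumns = new[]
{
    new FileColumn(1, "order_id"),
    new FileColumn(2, "total"),
};

// The current table schema: field 2 was renamed, field 3 was added later.
var tableSchema = new[]
{
    new SchemaField(1, "order_id"),
    new SchemaField(2, "amount"),
    new SchemaField(3, "currency"),
};

foreach (var (field, column) in FieldIdResolver.Resolve(tableSchema, fileColumns))
{
    Console.WriteLine(column is null
        ? $"{field.Name} (id {field.Id}) -> missing in file, read as null"
        : $"{field.Name} (id {field.Id}) -> file column '{column.Name}'");
}

// Hypothetical types, for illustration only.
public record SchemaField(int Id, string Name);
public record FileColumn(int FieldId, string Name);

public static class FieldIdResolver
{
    // Pairs each table field with the file column carrying the same field id,
    // or with null when the file predates that field.
    public static IEnumerable<(SchemaField Field, FileColumn? Column)> Resolve(
        IEnumerable<SchemaField> tableSchema, IEnumerable<FileColumn> fileColumns)
    {
        var byId = fileColumns.ToDictionary(c => c.FieldId);
        foreach (var field in tableSchema)
        {
            byId.TryGetValue(field.Id, out var column);
            yield return (field, column);
        }
    }
}
```

Matching by name instead would lose `amount` in every file written before the rename, which is the failure mode field ids exist to prevent.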

My idea behind IcebergSharp

Today, .NET teams that want to read Iceberg either spin up a JVM service (Spark Connect, Trino) and pay the latency / ops cost, or go through a query engine that doesn't expose Iceberg's metadata to them. There's no native client that gives a .NET app the same first-class access that pyiceberg gives Python or iceberg-rust gives Rust. IcebergSharp aims to be that client: read-only, no JVM, no embedded query engine — just metadata and Arrow batches you can hand to DuckDB.NET, ML.NET, or Power BI.

Status: Phase 3 development. Core Iceberg metadata parsing, stream-based Avro manifest / manifest-list readers, and the read-only REST catalog client are implemented and covered by unit tests. Scan planning, file IO, live catalog validation, and Parquet data reads are still on the roadmap.

Scope (what this library does and does NOT do)

In scope for v1 — read-only Iceberg, focused on what analytical workloads actually need:

Spec-compliant REST Catalog client with dynamic endpoint discovery, plus SigV4 request signing for AWS Glue.
File IO for local, S3, and ADLS Gen2.
Parquet data files with field-id resolution.
Partition pruning, column-stats pruning, projection pushdown.
Time travel by snapshot id or timestamp.
Schema evolution: add / drop / rename column, type promotion.
Apache Arrow output for zero-copy interop with DuckDB.NET / Polars.NET / ML.NET.
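
As an illustration of the column-stats pruning item above, the following is a simplified sketch and not IcebergSharp's planner. `DataFileStats` and `StatsPruner` are hypothetical names, and bounds are modelled as plain `long` values keyed by field id, whereas real Iceberg manifests store them as serialized bytes and also track null counts. The idea is that a data file can be skipped when its recorded upper bound proves no row can satisfy the predicate.

```csharp
using System;
using System.Collections.Generic;

var files = new[]
{
    new DataFileStats("a.parquet", new Dictionary<int, (long Lower, long Upper)> { [4] = (10, 99) }),
    new DataFileStats("b.parquet", new Dictionary<int, (long Lower, long Upper)> { [4] = (100, 250) }),
    new DataFileStats("c.parquet", new Dictionary<int, (long Lower, long Upper)>()), // no stats recorded
};

// Predicate: field 4 > 99. Only files that might contain a match are scanned.
foreach (var file in files)
{
    if (StatsPruner.MightMatchGreaterThan(file, fieldId: 4, value: 99))
        Console.WriteLine($"scan {file.Path}");
}

// Hypothetical types, for illustration only.
public record DataFileStats(string Path, IReadOnlyDictionary<int, (long Lower, long Upper)> Bounds);

public static class StatsPruner
{
    // Returns false only when the stats prove that no row can satisfy "field > value".
    public static bool MightMatchGreaterThan(DataFileStats file, int fieldId, long value)
    {
        // Without stats nothing can be proven, so the file has to be kept.
        if (!file.Bounds.TryGetValue(fieldId, out var bounds))
            return true;
        return bounds.Upper > value;
    }
}
```

Here `a.parquet` is skipped because its upper bound is not greater than 99, while `c.parquet` is kept because missing stats can never justify a skip. Rows in the surviving files still need the residual filter applied.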

Out of scope for v1 (load-bearing boundaries — re-discussing them risks turning the project into a half-finished engine instead of a focused reader):

❌ No write path. No CREATE TABLE, INSERT, commits, or manifest writing. Writing Iceberg correctly is ~70% of the engineering effort and ~90% of the bugs in existing implementations. v1 is "read-only, done well."
❌ No merge-on-read / delete files. Copy-on-write (COW) tables only; delete files are skipped with a warning. Most analytical Iceberg workloads remain COW-dominant.
❌ No Hive Metastore. REST catalogs only — point an Iceberg REST adapter at HMS.
❌ No bundled SQL engine. You get IAsyncEnumerable<RecordBatch> and Arrow streams; bring your own query layer.

Requirements

.NET 9 SDK (recommended) or .NET 8 SDK.
For integration tests: Docker (for Polaris + MinIO containers).

The shipped packages multi-target net9.0 and net8.0. net8.0 is supported until its LTS end-of-life (November 2026).

Installation (planned, not yet published)

dotnet add package IcebergSharp.Core
dotnet add package IcebergSharp.Catalog
dotnet add package IcebergSharp.Reader
dotnet add package IcebergSharp.IO          # local + S3 + ADLS
dotnet add package IcebergSharp.Arrow       # optional: Apache Arrow output

Quick start (target API for v1.0)

using IcebergSharp;
using IcebergSharp.Catalog;
using IcebergSharp.Catalog.Rest;
using IcebergSharp.Catalog.Rest.Authentication;
using IcebergSharp.Expressions;

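// Credentials: load them from your own secret store. The environment variable names below are placeholders.
var clientId     = Environment.GetEnvironmentVariable("ICEBERG_CLIENT_ID")!;
var clientSecret = Environment.GetEnvironmentVariable("ICEBERG_CLIENT_SECRET")!;

// 1. Connect to a REST catalog (Polaris, Nessie, Glue, Snowflake Open Catalog, ...).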
var catalog = new RestCatalog(new RestCatalogOptions
{
    Uri            = new Uri("https://polaris.example.com/api/catalog"),
    Warehouse      = "analytics",
    Authentication = new OAuth2ClientCredentialsAuthentication(clientId, clientSecret),
});

// 2. Load a table.
var table = await catalog.LoadTableAsync(
    TableIdentifier.From(CatalogNamespace.From("sales"), "orders"));

// 3. Plan a scan with predicate + projection pushdown.
var scan = table.NewScan()
    .Filter(Expressions.And(
        Expressions.GreaterThan("order_date", new DateOnly(2024, 1, 1)),
        Expressions.Equal("region", "EU")))
    .Select("order_id", "customer_id", "amount");

// 4. Stream the results — each task is one data file with an optional residual filter.
await foreach (var task in scan.PlanFilesAsync())
{
    using var reader = await task.OpenReaderAsync();
    await foreach (var record in reader.ReadAsync())
    {
        Console.WriteLine($"{record["order_id"]}: {record["amount"]}");
    }
}

// 5. Time travel — same API, historical snapshot.
var lastWeek = table.NewScan()
    .UseSnapshot(snapshotId: 1234567890L)
    .Select("*");
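
Step 5 pins a snapshot by id. Time travel by timestamp, which the scope list also names, comes down to picking the snapshot that was current at the requested instant. The sketch below shows that selection under stated assumptions: `SnapshotLogEntry` and `SnapshotSelector` are hypothetical names, not the target API, and the input is assumed to be the table's snapshot log ordered oldest to newest.

```csharp
using System;
using System.Collections.Generic;

var log = new[]
{
    new SnapshotLogEntry(SnapshotId: 101, TimestampMs: 1_704_067_200_000), // 2024-01-01T00:00:00Z
    new SnapshotLogEntry(SnapshotId: 102, TimestampMs: 1_706_745_600_000), // 2024-02-01T00:00:00Z
};

var asOf = new DateTimeOffset(2024, 1, 15, 0, 0, 0, TimeSpan.Zero);
Console.WriteLine(SnapshotSelector.AsOf(log, asOf)); // prints 101

// Hypothetical types, for illustration only.
public record SnapshotLogEntry(long SnapshotId, long TimestampMs);

public static class SnapshotSelector
{
    // Returns the id of the snapshot that was current at `asOf`,
    // or null if the table had no snapshot yet at that time.
    public static long? AsOf(IEnumerable<SnapshotLogEntry> log, DateTimeOffset asOf)
    {
        long target = asOf.ToUnixTimeMilliseconds();
        long? result = null;
        foreach (var entry in log)
        {
            if (entry.TimestampMs > target) break; // later entries are newer still
            result = entry.SnapshotId;
        }
        return result;
    }
}
```

The resolved id can then be handed to the same snapshot-pinning call that step 5 uses.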

Compatibility

See docs/compatibility-matrix.md for the up-to-date matrix of supported catalogs, table-format versions, and storage backends.

Current implemented surface:

IcebergSharp.Core: Iceberg v1/v2 table metadata, schemas, partition specs, sort orders, snapshots, and manifest domain models.
IcebergSharp.Avro: stream-based Avro OCF readers for Iceberg manifest lists and manifests, including null and deflate codecs.
IcebergSharp.Catalog: read-only REST catalog client for config discovery, namespace/table listing, table metadata loading, and bearer/OAuth2/SigV4 auth.
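
For readers who want to test the Avro layer against their own files, it helps to know what the lowest level of such a reader looks like. Avro encodes `int` and `long` values, including the block counts and sizes in an object container file, as zigzag variable-length integers. The sketch below decodes one value from a stream following the Avro specification. `AvroVarint` is a hypothetical name and this is not the code in IcebergSharp.Avro.

```csharp
using System;
using System.IO;

// Bytes taken from the encoding table in the Avro specification.
var stream = new MemoryStream(new byte[] { 0x04, 0x03, 0x80, 0x01 });
Console.WriteLine(AvroVarint.ReadLong(stream)); // prints 2
Console.WriteLine(AvroVarint.ReadLong(stream)); // prints -2
Console.WriteLine(AvroVarint.ReadLong(stream)); // prints 64

// Hypothetical helper, for illustration only.
public static class AvroVarint
{
    // Reads one zigzag-encoded variable-length long from the stream.
    public static long ReadLong(Stream stream)
    {
        ulong raw = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            int b = stream.ReadByte();
            if (b < 0) throw new EndOfStreamException("Truncated varint.");

            // The low 7 bits carry payload; the high bit means "more bytes follow".
            raw |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
            {
                // Undo zigzag: even values are non-negative, odd values are negative.
                return (long)(raw >> 1) ^ -(long)(raw & 1);
            }
        }
        throw new InvalidDataException("Varint is longer than 10 bytes.");
    }
}
```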

Target servers for v1:

Apache Polaris (reference implementation)
Project Nessie
AWS Glue Iceberg REST
Snowflake Open Catalog
lakekeeper

Roadmap

Phase	Weeks / status	Deliverable
0. Scaffolding	done	Repo, CI, license, solution layout
1. Core types & metadata	done	`Schema`, `TableMetadata`, JSON parser
2. Avro manifest reader	done	Custom mini Avro OCF reader for manifests
3. REST catalog client	in progress	OAuth2 / Bearer / SigV4, dynamic discovery
4. Scan planning & pruning	7-8	Partition + stats pruning, residual filters
5. Parquet + schema evolution	9-10	Field-id resolution, add/drop/rename column
6. Polish & release	11-12	NuGet 1.0.0, Power BI sample, blog post
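
One part of the phase 3 discovery work can be sketched independently of any server. In the Iceberg REST specification, the config endpoint (`GET /v1/config`) returns `defaults` and `overrides` maps: defaults apply before the client's own properties, and overrides apply after them. The sketch below shows only that merge order. `CatalogConfig.Merge` is a hypothetical helper and the property values are made up for the example.

```csharp
using System;
using System.Collections.Generic;

var serverDefaults   = new Dictionary<string, string> { ["warehouse"] = "fallback", ["clients"] = "4" };
var clientProperties = new Dictionary<string, string> { ["warehouse"] = "analytics", ["prefix"] = "client-guess" };
var serverOverrides  = new Dictionary<string, string> { ["prefix"] = "analytics-prod" };

var effective = CatalogConfig.Merge(serverDefaults, clientProperties, serverOverrides);
foreach (var (key, value) in effective)
    Console.WriteLine($"{key} = {value}");

// Hypothetical helper, for illustration only.
public static class CatalogConfig
{
    // Later layers win: server defaults < client properties < server overrides.
    public static Dictionary<string, string> Merge(
        IReadOnlyDictionary<string, string> serverDefaults,
        IReadOnlyDictionary<string, string> clientProperties,
        IReadOnlyDictionary<string, string> serverOverrides)
    {
        var merged = new Dictionary<string, string>();
        foreach (var layer in new[] { serverDefaults, clientProperties, serverOverrides })
        {
            foreach (var (key, value) in layer)
                merged[key] = value;
        }
        return merged;
    }
}
```

With these inputs the client's `warehouse` beats the server default, while the server's `prefix` override replaces the client's guess.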

Architecture overview

Catalog
  └─ Namespace (e.g. "analytics.sales")
       └─ Table
            └─ TableMetadata (pointer held by the catalog)
                 ├─ Schema (with field-ids)
                 ├─ PartitionSpec[]   ← evolves over time
                 ├─ SortOrder[]
                 └─ Snapshot[]
                      └─ Manifest list file (Avro)
                           └─ ManifestFile[]
                                └─ DataFile[] (Parquet w/ field-id annotations)

Given catalog.LoadTableAsync(...), the reader walks this tree, prunes manifests by partition and data files by column stats, and streams Parquet rows with field-id resolution.
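
The first pruning step in that walk can be sketched as follows. This is a simplified illustration rather than the planned implementation: `ManifestEntry` and `ScanWalk` are hypothetical names, and each manifest carries a single integer partition range, whereas a real manifest list stores one summary per partition field. The point is that a manifest whose partition range cannot overlap the filter is never opened.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var manifests = new[]
{
    new ManifestEntry("m1.avro", PartitionLower: 20230101, PartitionUpper: 20231231,
        DataFiles: new[] { "2023-a.parquet", "2023-b.parquet" }),
    new ManifestEntry("m2.avro", PartitionLower: 20240101, PartitionUpper: 20240630,
        DataFiles: new[] { "2024-a.parquet" }),
};

// Filter: partition value between 20240101 and 20241231. m1 is skipped without being read.
foreach (var path in ScanWalk.PlanFiles(manifests, filterLower: 20240101, filterUpper: 20241231))
    Console.WriteLine(path); // prints 2024-a.parquet

// Hypothetical types, for illustration only.
public record ManifestEntry(
    string Path, int PartitionLower, int PartitionUpper, IReadOnlyList<string> DataFiles);

public static class ScanWalk
{
    // Keeps only manifests whose partition range overlaps the filter range,
    // then flattens the surviving manifests into their data files.
    public static IEnumerable<string> PlanFiles(
        IEnumerable<ManifestEntry> manifests, int filterLower, int filterUpper) =>
        manifests
            .Where(m => m.PartitionUpper >= filterLower && m.PartitionLower <= filterUpper)
            .SelectMany(m => m.DataFiles);
}
```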

Contributing

See CONTRIBUTING.md. The project is in early development; the fastest way to help right now is to try the metadata and manifest readers against real Iceberg tables and report incompatible schemas, codecs, or manifest shapes.

License

Dual licensed under your choice of the Apache License (LICENSE-APACHE) or the MIT License (LICENSE-MIT).

This product is not affiliated with, endorsed by, or sponsored by the Apache Software Foundation. Apache, Apache Iceberg, Iceberg, and the Apache feather logo are trademarks of the Apache Software Foundation.
