Description
RavenDB snapshot backups produced with ZipArchive can be unrecoverable due to ZIP header corruption. Producing a snapshot backup (a ZIP archive) with System.IO.Compression.ZipArchive over a specific data set results in a ZIP that fails to open correctly:
- 7‑Zip shows Extra_ERROR Zip64_ERROR: UTF8 (for entry Documents\Raven.voron), and the Packed Size looks capped at 4GB.
- System.IO.Compression.ZipFile.OpenRead(...).Entries[i].Open() throws System.IO.InvalidDataException: A local file header is corrupt.
Writing the exact same dataset and order using SharpZipLib’s ZipOutputStream produces a valid ZIP that both 7‑Zip and ZipFile.OpenRead can read.
This started affecting us after introducing a feature that creates many per-index journal files that are hard links to the same underlying file content (so multiple distinct file paths share the exact same bytes on disk). Our dataset also includes a large 30GB file (Raven.voron). The combination seems to trigger a bug.
Reproduction Steps
Repro dataset
> $RootPath = (Get-Item .).FullName; Get-ChildItem -Path . -Include *.journal -Recurse -File | Get-FileHash | Select-Object @{Name='Path'; Expression={ $_.Path.Replace($RootPath + "\", "") }}, Hash, Algorithm
Path Hash Algorithm
---- ---- ---------
Configuration\Journals\0000000000000000001.journal 96F77B06EBF13895A297B7182BC162B42A05CC9B444D488A87FA541CD9962516 SHA256
Indexes\@SharedJournals\Journals\0000000000000000107.journal 16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
Indexes\Activity_ByMonth\Journals\0000000000000000008.jou... 16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
Indexes\Questions_Search\Journals\0000000000000000004.jou... 16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
Indexes\Questions_Tags\Journals\0000000000000000007.journal 16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
Indexes\Questions_Tags_ByMonths\Journals\0000000000000000... 16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
Indexes\Users_Registrations_ByMonth\Journals\000000000000... 16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
Indexes\Users_Search\Journals\0000000000000000005.journal 16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
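The hash listing above can also be produced from C#, which may be handy when checking a different dataset for the same shape. This is a minimal sketch, not part of the repro app: the helper name GroupJournalsByHash is hypothetical, and it assumes .NET 7 or later for SHA256.HashData(Stream). Identical hashes only show that the content matches; they do not prove that the paths are hard links rather than copies.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: groups *.journal files under `root` by the SHA-256 of their content.
// A group with more than one path means several paths carry identical bytes
// (hard links or plain copies; the hash alone cannot tell which).
static Dictionary<string, List<string>> GroupJournalsByHash(string root)
{
    var groups = new Dictionary<string, List<string>>(StringComparer.Ordinal);
    foreach (var file in Directory.EnumerateFiles(root, "*.journal", SearchOption.AllDirectories))
    {
        using var fs = File.Open(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        var hash = Convert.ToHexString(SHA256.HashData(fs));
        if (!groups.TryGetValue(hash, out var paths))
            groups[hash] = paths = new List<string>();
        paths.Add(Path.GetRelativePath(root, file));
    }
    return groups;
}

// Usage: pass the database folder as the only argument.
if (args.Length == 1)
{
    foreach (var (hash, paths) in GroupJournalsByHash(args[0]))
    {
        if (paths.Count > 1)
            Console.WriteLine($"{hash[..12]}... shared by {paths.Count} files");
    }
}
```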
Repro app
Single‑file console app (targets net8.0 or net10.0). It copies files from the dataset into a ZIP using ZipArchive, in the exact order RavenDB snapshot backup uses:
- Indexes (excluding any @* folder such as @SharedJournals), then
- Documents (root storage env), then
- Configuration folder
// Add package: SharpZipLib (its namespace is ICSharpCode.SharpZipLib)
//
// Example csproj snippet:
// <ItemGroup>
// <PackageReference Include="SharpZipLib" Version="1.4.2" />
// </ItemGroup>
//
// Usage:
// ZipArchiveIssue <sourceDbFolder> <outputDir> [options]
//
// Options:
// --ziparchive Generate ZIP using System.IO.Compression.ZipArchive
// --sharpzip Generate ZIP using SharpZipLib ZipOutputStream
// --level=<Optimal|Fastest|NoCompression> Compression level (default: Optimal)
// --nonseekable Wrap output stream to simulate non-seekable sink (ZipArchive data-descriptor path)
// --outname=<baseName> Base file name (default: derived from folder name)
// --verify After writing, attempt to open/read entries via ZipFile.OpenRead
//
// Mapping mirrors RavenDB snapshot shape, copying from disk:
// - Order: Indexes -> Documents -> Configuration (matches RavenDB snapshot backup)
// - Root DB env -> Documents/
// - Configuration/ -> Configuration/
// - Indexes/<IndexName>/ -> Indexes/<IndexName>/
// - Include files: Raven.voron, headers.one, headers.two, database.metadata, Journals/*.journal
// - Skip: any Temp/ folders, and all Indexes/@* folders (e.g. @SharedJournals)
#nullable enable
using System;
using System.Buffers;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using ICSharpCode.SharpZipLib.Zip;
internal static class Program
{
private static int Main(string[] args)
{
try
{
if (args.Length < 2)
{
PrintHelp();
return 2;
}
var sourceRoot = Path.GetFullPath(args[0]);
var outDir = Path.GetFullPath(args[1]);
var opts = ParseOptions(args.Skip(2));
if (!Directory.Exists(sourceRoot))
{
Console.Error.WriteLine($"Source folder not found: {sourceRoot}");
return 3;
}
Directory.CreateDirectory(outDir);
var baseName = opts.OutName ?? new DirectoryInfo(sourceRoot).Name;
// Enumerate entries strictly in RavenDB order: Indexes -> Documents -> Configuration
var entries = EnumerateBackupEntriesInRavenOrder(sourceRoot).ToList();
if (entries.Count == 0)
Console.Error.WriteLine("No entries to add based on current mapping (check input path).");
else
Console.WriteLine($"Enumerated {entries.Count} entries to zip.");
var createdAny = false;
if (opts.UseZipArchive)
{
var path = Path.Combine(outDir, baseName + "-ziparchive.zip");
Console.WriteLine($"[ZipArchive] Writing {path} ...");
using (var fs = File.Create(path))
{
// The explicit Stream type lets both branches of the conditional convert to a common type.
using Stream target = opts.NonSeekable
? new NonSeekableWriteStream(fs)
: fs;
WriteWithZipArchive(target, entries, opts.Level);
Console.WriteLine("[ZipArchive] Done");
}
if (opts.Verify) VerifyZip(path);
createdAny = true;
}
if (opts.UseSharpZip)
{
var path = Path.Combine(outDir, baseName + "-sharpzip.zip");
Console.WriteLine($"[SharpZipLib] Writing {path} ...");
using (var fs = File.Create(path))
{
// The explicit Stream type lets both branches of the conditional convert to a common type.
using Stream target = opts.NonSeekable
? new NonSeekableWriteStream(fs)
: fs;
WriteWithSharpZip(target, entries, opts.Level);
Console.WriteLine("[SharpZipLib] Done");
}
if (opts.Verify) VerifyZip(path);
createdAny = true;
}
if (!createdAny)
{
Console.WriteLine("No writer selected; defaulting to both.");
var zipArchivePath = Path.Combine(outDir, baseName + "-ziparchive.zip");
var sharpZipPath = Path.Combine(outDir, baseName + "-sharpzip.zip");
using (var fs = File.Create(zipArchivePath))
{
using (Stream target = opts.NonSeekable ? new NonSeekableWriteStream(fs) : fs)
{
Console.WriteLine($"[ZipArchive] Writing {zipArchivePath} ...");
WriteWithZipArchive(target, entries, opts.Level);
Console.WriteLine("[ZipArchive] Done");
}
}
if (opts.Verify) VerifyZip(zipArchivePath);
using (var fs = File.Create(sharpZipPath))
{
using (Stream target = opts.NonSeekable ? new NonSeekableWriteStream(fs) : fs)
{
Console.WriteLine($"[SharpZipLib] Writing {sharpZipPath} ...");
WriteWithSharpZip(target, entries, opts.Level);
Console.WriteLine("[SharpZipLib] Done");
}
}
if (opts.Verify) VerifyZip(sharpZipPath);
}
Console.WriteLine("All done.");
return 0;
}
catch (Exception ex)
{
Console.Error.WriteLine(ex);
return 1;
}
}
private static void PrintHelp()
{
Console.WriteLine(@"ZipArchiveIssue <sourceDbFolder> <outputDir> [options]
--ziparchive Use System.IO.Compression ZipArchive
--sharpzip Use SharpZipLib ZipOutputStream
--level=<Optimal|Fastest|NoCompression>
--nonseekable Wrap output stream so ZipArchive uses data descriptors
--outname=<baseName> Output base file name (without extension)
--verify After writing, open the ZIP and iterate entries
");
}
private static void VerifyZip(string path)
{
try
{
using var zip = System.IO.Compression.ZipFile.OpenRead(path);
Console.WriteLine($"[Verify] Opened {path}, entries: {zip.Entries.Count}");
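long total = 0;
// Allocate the scratch buffer once: stackalloc inside a loop is only released when the method returns.
Span<byte> buf = stackalloc byte[8192];
foreach (var e in zip.Entries)
{
using var s = e.Open();
// Reading only the first chunk is enough to force the local file header to be parsed.
int read = s.Read(buf);
total += read;
}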
Console.WriteLine($"[Verify] Read a total of {total} bytes across entries");
}
catch (Exception ex)
{
Console.WriteLine($"[Verify] FAILED for {path}: {ex.GetType().Name}: {ex.Message}");
}
}
private static void WriteWithZipArchive(Stream output, List<BackupEntry> entries, CompressionLevel level)
{
using var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true, entryNameEncoding: Encoding.UTF8);
foreach (var e in entries)
{
Console.WriteLine($"[ZipArchive] + {e.ZipPath}");
var entry = archive.CreateEntry(e.ZipPath, level);
using var es = entry.Open();
using var fs = File.Open(e.SourcePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
fs.CopyTo(es);
}
}
private static void WriteWithSharpZip(Stream output, List<BackupEntry> entries, CompressionLevel level)
{
using var zipStream = new ZipOutputStream(output) { IsStreamOwner = false };
zipStream.SetLevel(MapSharpZipLevel(level));
foreach (var e in entries)
{
Console.WriteLine($"[SharpZipLib] + {e.ZipPath}");
var ze = new ZipEntry(e.ZipPath);
zipStream.PutNextEntry(ze);
using (var fs = File.Open(e.SourcePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
fs.CopyTo(zipStream);
}
zipStream.CloseEntry();
}
zipStream.Finish();
}
private static int MapSharpZipLevel(CompressionLevel level) => level switch
{
CompressionLevel.NoCompression => 0,
CompressionLevel.Fastest => 1,
_ => 9,
};
private sealed class Options
{
public bool UseZipArchive { get; set; }
public bool UseSharpZip { get; set; }
public CompressionLevel Level { get; set; } = CompressionLevel.Optimal;
public bool NonSeekable { get; set; }
public string? OutName { get; set; }
public bool Verify { get; set; }
}
private static Options ParseOptions(IEnumerable<string> args)
{
var o = new Options();
foreach (var a in args)
{
if (a.Equals("--ziparchive", StringComparison.OrdinalIgnoreCase)) o.UseZipArchive = true;
else if (a.Equals("--sharpzip", StringComparison.OrdinalIgnoreCase)) o.UseSharpZip = true;
else if (a.StartsWith("--level=", StringComparison.OrdinalIgnoreCase))
{
var v = a.Substring("--level=".Length);
o.Level = v.Equals("NoCompression", StringComparison.OrdinalIgnoreCase) ? CompressionLevel.NoCompression :
v.Equals("Fastest", StringComparison.OrdinalIgnoreCase) ? CompressionLevel.Fastest :
CompressionLevel.Optimal;
}
else if (a.Equals("--nonseekable", StringComparison.OrdinalIgnoreCase)) o.NonSeekable = true;
else if (a.StartsWith("--outname=", StringComparison.OrdinalIgnoreCase)) o.OutName = a.Substring("--outname=".Length);
else if (a.Equals("--verify", StringComparison.OrdinalIgnoreCase)) o.Verify = true;
}
return o;
}
private static IEnumerable<BackupEntry> EnumerateBackupEntriesInRavenOrder(string sourceRoot)
{
// 1) Indexes (skip @*). Use alphabetical order for determinism.
var indexesDir = Path.Combine(sourceRoot, "Indexes");
if (Directory.Exists(indexesDir))
{
foreach (var indexDir in Directory.EnumerateDirectories(indexesDir).OrderBy(Path.GetFileName, StringComparer.OrdinalIgnoreCase))
{
var name = Path.GetFileName(indexDir);
if (name.StartsWith('@')) // ordinal char check; skip @SharedJournals and any @*
continue;
foreach (var e in EnumerateEnv(indexDir, Path.Combine("Indexes", name)))
yield return e;
}
}
// 2) Documents (root env)
foreach (var e in EnumerateEnv(sourceRoot, Path.Combine("Documents")))
yield return e;
// 3) Configuration
var cfgDir = Path.Combine(sourceRoot, "Configuration");
if (Directory.Exists(cfgDir))
{
foreach (var e in EnumerateEnv(cfgDir, Path.Combine("Configuration")))
yield return e;
}
}
private static IEnumerable<BackupEntry> EnumerateEnv(string envDir, string zipBase)
{
// Include env root files (Temp is excluded by not traversing it here)
foreach (var f in Directory.EnumerateFiles(envDir))
{
var name = Path.GetFileName(f);
if (!ShouldIncludeFile(name))
continue;
yield return new BackupEntry(f, Path.Combine(zipBase, name).Replace('\\', '/'));
}
// Include journals
var journalsDir = Path.Combine(envDir, "Journals");
if (Directory.Exists(journalsDir))
{
foreach (var jf in Directory.EnumerateFiles(journalsDir, "*.journal"))
{
var name = Path.GetFileName(jf);
yield return new BackupEntry(jf, Path.Combine(zipBase, name).Replace('\\', '/'));
}
}
// Temp is always skipped per requirements
}
private static bool ShouldIncludeFile(string name)
{
if (name.Equals("Raven.voron", StringComparison.OrdinalIgnoreCase)) return true;
if (name.Equals("headers.one", StringComparison.OrdinalIgnoreCase)) return true;
if (name.Equals("headers.two", StringComparison.OrdinalIgnoreCase)) return true;
if (name.Equals("database.metadata", StringComparison.OrdinalIgnoreCase)) return true;
if (name.EndsWith(".journal", StringComparison.OrdinalIgnoreCase)) return true; // if any at env root
return false;
}
private readonly record struct BackupEntry(string SourcePath, string ZipPath);
private sealed class NonSeekableWriteStream : Stream
{
private readonly Stream _inner;
public NonSeekableWriteStream(Stream inner) => _inner = inner;
public override bool CanRead => false;
public override bool CanSeek => false;
public override bool CanWrite => true;
public override long Length => throw new NotSupportedException();
public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }
public override void Flush() => _inner.Flush();
public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
public override void SetLength(long value) => throw new NotSupportedException();
public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);
#if NETSTANDARD2_1_OR_GREATER || NET5_0_OR_GREATER
public override void Write(ReadOnlySpan<byte> buffer)
{
var arr = ArrayPool<byte>.Shared.Rent(buffer.Length);
try
{
buffer.CopyTo(arr);
_inner.Write(arr, 0, buffer.Length);
}
finally
{
ArrayPool<byte>.Shared.Return(arr);
}
}
#endif
protected override void Dispose(bool disposing)
{
// do not own _inner
base.Dispose(disposing);
}
}
}
Repro steps
Command to reproduce, after unzipping the dataset to D:\raven-so-database:
ZipArchiveIssue.exe "D:\raven-so-database" "D:\temp" --ziparchive --level=NoCompression --verify
This creates D:\temp\raven-so-database-ziparchive.zip, then attempts to open it with ZipFile.OpenRead and read a small portion from each entry. On affected versions, it fails with:
System.IO.InvalidDataException: A local file header is corrupt.
Opening the same ZIP in 7‑Zip shows next to Documents\Raven.voron: Extra_ERROR Zip64_ERROR: UTF8
Packed Size also appears capped to 4GB for that entry, even though the file is ~30GB.
For completeness, writing with SharpZipLib succeeds:
ZipArchiveIssue.exe "D:\raven-so-database" "D:\temp" --sharpzip --level=NoCompression --verify
The resulting ZIP opens fine in both 7‑Zip and ZipFile.OpenRead.
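The --verify option reads only the first 8KB of each entry. A stricter check, sketched below, reads every entry to the end and compares the byte count with the length recorded for that entry. The helper name VerifyAllEntries is hypothetical and not part of the repro app, and a full read of a ~30GB entry will take a while.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: reads every entry completely and compares the number of bytes
// read with ZipArchiveEntry.Length. Returns the number of entries that failed.
static int VerifyAllEntries(string zipPath)
{
    int failures = 0;
    var buffer = new byte[81920];
    using var zip = ZipFile.OpenRead(zipPath);
    foreach (var entry in zip.Entries)
    {
        try
        {
            long total = 0;
            using var s = entry.Open();
            int read;
            while ((read = s.Read(buffer, 0, buffer.Length)) > 0)
                total += read;
            if (total != entry.Length)
            {
                failures++;
                Console.WriteLine($"[Verify] {entry.FullName}: read {total} bytes, expected {entry.Length}");
            }
        }
        catch (InvalidDataException ex)
        {
            // This is the exception type reported in this issue for the corrupt local header.
            failures++;
            Console.WriteLine($"[Verify] {entry.FullName}: {ex.Message}");
        }
    }
    return failures;
}
```

A return value of 0 means every entry could be opened and read to its recorded length; it says nothing about whether other tools accept the archive.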
Expected behavior
ZipArchive produces a valid ZIP64 archive that all standard tools can open.
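To see what a given entry's local file header actually contains, the fixed 30-byte header and its extra fields can be decoded by hand. The sketch below follows the local file header layout from the ZIP specification (signature 0x04034b50, 32-bit sizes at offsets 18 and 22, name and extra lengths at offsets 26 and 28, ZIP64 extra field ID 0x0001). In a ZIP64 entry the two 32-bit size fields are expected to hold 0xFFFFFFFF, with the real sizes stored in the ZIP64 extra field. The helper name ReadLocalHeader is hypothetical, the caller has to supply the header's offset (0 for the first entry), and Stream.ReadExactly assumes .NET 7 or later.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

// Hypothetical helper: decodes the local file header starting at `offset` and reports
// the 32-bit size fields plus whether a ZIP64 extra field (header ID 0x0001) is present.
static (string Name, uint CompressedSize32, uint UncompressedSize32, bool HasZip64Extra)
    ReadLocalHeader(Stream zip, long offset)
{
    zip.Position = offset;
    Span<byte> fixedPart = stackalloc byte[30];
    zip.ReadExactly(fixedPart);
    if (BinaryPrimitives.ReadUInt32LittleEndian(fixedPart) != 0x04034b50)
        throw new InvalidDataException("No local file header signature at this offset.");

    uint compressed = BinaryPrimitives.ReadUInt32LittleEndian(fixedPart.Slice(18));
    uint uncompressed = BinaryPrimitives.ReadUInt32LittleEndian(fixedPart.Slice(22));
    int nameLen = BinaryPrimitives.ReadUInt16LittleEndian(fixedPart.Slice(26));
    int extraLen = BinaryPrimitives.ReadUInt16LittleEndian(fixedPart.Slice(28));

    // The file name and the extra fields follow the fixed part directly.
    var rest = new byte[nameLen + extraLen];
    zip.ReadExactly(rest);
    string name = Encoding.UTF8.GetString(rest, 0, nameLen);

    // Each extra field is: 2-byte header ID, 2-byte data size, then the data.
    bool hasZip64 = false;
    for (int i = nameLen; i + 4 <= rest.Length;)
    {
        ushort id = BinaryPrimitives.ReadUInt16LittleEndian(rest.AsSpan(i));
        ushort size = BinaryPrimitives.ReadUInt16LittleEndian(rest.AsSpan(i + 2));
        if (id == 0x0001)
            hasZip64 = true;
        i += 4 + size;
    }
    return (name, compressed, uncompressed, hasZip64);
}
```

Comparing the output for Documents/Raven.voron in the ZipArchive and SharpZipLib archives would show where the two writers differ, provided the entry's offset is known, for example by walking the preceding entries.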
Actual behavior
- ZipFile.OpenRead(...) throws InvalidDataException: A local file header is corrupt.
- 7‑Zip shows Extra_ERROR Zip64_ERROR: UTF8 on Documents\Raven.voron and reports an incorrect Packed Size (appears limited to 4GB) for that entry.
Regression?
No response
Known Workarounds
Use SharpZipLib's ZipOutputStream instead of ZipArchive to write the archive.
Configuration
- Reproduces on .NET 8 and .NET 10
- Windows 11
Other information
No response
Repro dataset download
raven-so-database.zip (contains the on‑disk database folder): https://drive.google.com/file/d/1iCqKnzhu41umXik938umUee940MMPoWq/view?usp=sharing (10GB file, 42GB after unzipping)
It includes:
- Raven.voron file (~30GB)
- Indexes/<IndexName>/Journals/*.journal files which were hard links pointing to the same physical journal files (identical SHA‑256 hashes across index folders)