[JS SDK] uploadFile buffers entire build context in memory; large contexts (~2.5 GB+) cause memory pressure / OOM #1301

@rzgrw

Bug Report

Summary

Follow-up to #1243. The fix for #1243 (buffer the tar archive so fetch
sets Content-Length and S3 doesn't 501 on chunked encoding) means
uploadFile in packages/js-sdk/src/template/buildApi.ts allocates the
entire build context as a single Buffer in memory before issuing the PUT.

For large contexts (we hit it at ~2.5 GB) this:

  • Allocates a multi-GB Buffer alongside the in-process tar stream (tar.Pack)
  • Risks OOM on memory-constrained environments (CI runners, small cloud
    VMs, containers with low memory limits)
  • Causes long GC pauses on machines that do have the RAM

Current behavior

packages/js-sdk/src/template/buildApi.ts (lines ~131–141):

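const { buffer } = await dynamicImport<
  typeof import('node:stream/consumers')
>('node:stream/consumers')
// Drains the whole tar stream into one in-memory Buffer (the #1243 fix).
const uploadBody = await buffer(
  uploadStream as unknown as AsyncIterable<Buffer>
)

// A Buffer body has a known size, so fetch sends Content-Length.
const res = await fetch(url, {
  method: 'PUT',
  body: uploadBody,
})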

uploadStream (a tar.Pack) is fully drained into a single Buffer.

Reproduction

  1. Build context directory of ~2.5 GB or larger (e.g., a Dockerfile that
    bundles a large model or dataset).
  2. Run e2b template build (or call the SDK directly).
  3. Observe peak RSS during upload — the full tar is held in memory.
  4. On a ~2 GB-RAM runner, expect the process to crash with
    "JavaScript heap out of memory".
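
For step 3, a rough way to capture peak RSS from inside the Node process is to sample process.memoryUsage() while the upload runs. This is only a sketch: withPeakRss is a hypothetical helper, not part of the SDK, and interval sampling can miss short spikes.

```typescript
// Hypothetical helper (not part of the e2b SDK): run an async task while
// sampling the process RSS on a timer, and report the highest value seen.
export async function withPeakRss<T>(
  task: () => Promise<T>,
  intervalMs = 50
): Promise<{ result: T; peakRssBytes: number }> {
  let peak = process.memoryUsage().rss
  const timer = setInterval(() => {
    peak = Math.max(peak, process.memoryUsage().rss)
  }, intervalMs)
  try {
    const result = await task()
    // Take one last sample in case the task finished between ticks.
    peak = Math.max(peak, process.memoryUsage().rss)
    return { result, peakRssBytes: peak }
  } finally {
    clearInterval(timer)
  }
}
```

Wrapping the build call in withPeakRss(() => ...) and logging peakRssBytes should show the peak tracking the archive size on the buffered path.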

The Python SDK has the same shape (tar_buffer.getvalue() → bytes), so
this is a cross-SDK parity concern, but the JS path is where it bites first.

Environment

  • e2b JS SDK: v2.18.0 and current main (commit at filing time)
  • Node.js: v22
  • Storage: any S3 / S3-compatible (independent of provider)

Why a streaming fix isn't trivial

The constraint from #1243 still applies: an S3 presigned PUT requires
Content-Length and rejects Transfer-Encoding: chunked with 501. So
the body must have a known length before the request begins. Possible
directions, none of them an obvious winner:

  • A. Tar to a temp file first, stat for size, then stream-PUT the
    file with explicit Content-Length. Constant memory, at the cost of
    transient disk equal to the archive size (~2.5 GB in our case).
    Closest to what curl / aws-cli do.
  • B. Two-pass tar: first pass counts bytes, second pass streams.
    Avoids tmpfile but reads the source twice and risks non-deterministic
    tar output between passes (mtimes, etc.).
  • C. S3 multipart upload. Bigger change — requires a different
    presigned-URL flow on the server side, not just an SDK change.
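
To make option A concrete, here is a minimal sketch under some assumptions. uploadViaTempFile is a hypothetical name, not an existing SDK function, and it uses node:http / node:https directly rather than fetch, because setting Content-Length on the request there is known to send a fixed-length body instead of chunked encoding. Whether the SDK would want to move off fetch for this path is an open question.

```typescript
import * as fs from 'node:fs'
import * as http from 'node:http'
import * as https from 'node:https'
import * as os from 'node:os'
import * as path from 'node:path'
import { pipeline } from 'node:stream/promises'

// Hypothetical helper (not part of the e2b SDK): spool a readable stream
// to a temp file, then PUT that file with an explicit Content-Length.
export async function uploadViaTempFile(
  source: NodeJS.ReadableStream,
  url: string
): Promise<number> {
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'ctx-upload-'))
  const file = path.join(dir, 'context.tar')
  try {
    // Step 1: drain the archive to disk. Memory use is bounded by the
    // streams' internal buffering, not by the archive size.
    await pipeline(source, fs.createWriteStream(file))
    const { size } = await fs.promises.stat(file)

    // Step 2: stream the file back out. With Content-Length set, Node
    // sends a fixed-length body rather than Transfer-Encoding: chunked.
    return await new Promise<number>((resolve, reject) => {
      const options = { method: 'PUT', headers: { 'Content-Length': size } }
      const onResponse = (res: http.IncomingMessage) => {
        res.resume() // discard the response body
        res.on('end', () => resolve(res.statusCode ?? 0))
        res.on('error', reject)
      }
      const req = url.startsWith('https:')
        ? https.request(url, options, onResponse)
        : http.request(url, options, onResponse)
      req.on('error', reject)
      pipeline(fs.createReadStream(file), req).catch(reject)
    })
  } finally {
    // Always remove the temp directory, even if the upload failed.
    await fs.promises.rm(dir, { recursive: true, force: true })
  }
}
```

In buildApi.ts this would be called roughly as uploadViaTempFile(uploadStream, url), assuming the tar.Pack stream can be piped like any other Node readable. The returned value is the HTTP status code, so the caller would still need to treat non-2xx as a failure.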

Filing as the data point that #1243's fix didn't fully cover the
large-context case. Happy to discuss preferred direction before any PR.

Workaround

Run builds on a machine with roughly RAM ≥ 3 × build_context_size
(buffered Buffer + tar internals + Node heap headroom).
