Modular Rust Arrow libraries in wasm-bindgen #8

@kylebarron

Problem statement

The biggest hurdle with WebAssembly in the browser is that separately compiled Wasm modules don't share a memory space: each wasm-bindgen package gets its own linear memory. This means that having e.g. parquet-wasm and geoarrow-wasm as two separate NPM modules is annoying! You have to use parquet-wasm to load Parquet into Arrow in Wasm, then copy the data out to JS, and then copy it into the next Wasm module to do more processing with it. This is slow, memory intensive, and not user friendly.

Solution

In https://github.com/domoritz/arrow-wasm, Dominik's goal appeared to be to see whether Arrow in Rust/Wasm would be faster than Arrow in JS. But since working with raw buffers is already pretty fast in JS, it's not surprising that the overhead of crossing into Wasm would outweigh any other speedups.

Instead, I think the potential of arrow-wasm lies in being a foundational library for other wasm-bindgen libraries.

So I see various potential libraries:

  • parquet-wasm: depends on arrow-wasm directly; used by consumers who only want to parse Parquet and get it into JS. PR: "Depend on arrow-wasm" (parquet-wasm#292).
  • geoparquet-wasm: depends on arrow-wasm and geoarrow-rs; used by consumers who only want to parse GeoParquet to GeoArrow and get it into JS. PR: "Use arrow-wasm" (geoparquet-wasm#6).
  • geoarrow-wasm-slim: depends on arrow-wasm; doesn't include GeoParquet IO, to keep the bundle size small. Used by consumers who already have their data as GeoArrow in the browser by some other route.
  • geoarrow-wasm-full: includes extras like re-exporting geoparquet-wasm. Used by consumers who want to fetch GeoParquet into GeoArrow and also do some geospatial operations.
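To make the layering above concrete, here is a minimal sketch in plain Rust. It is not code from any of these libraries: each module merely stands in for one crate, every name and signature (`Table`, `read`, `scale`, `read_and_scale`) is a hypothetical placeholder, and the `#[wasm_bindgen]` attributes are left out so it compiles with the standard library alone. The point it illustrates is that when the reader and the operations are compiled into one binary, the table is handed over by move and its buffer stays where it is.

```rust
// Each module stands in for a separate crate. All names and signatures
// are hypothetical; #[wasm_bindgen] attributes are omitted on purpose.

mod arrow_wasm {
    /// Stand-in for the shared table type every downstream crate reuses.
    pub struct Table {
        pub values: Vec<f64>,
    }
}

mod parquet_wasm {
    use super::arrow_wasm::Table;

    /// Pretend "reader": decodes little-endian f64 values from raw bytes.
    /// Real Parquet decoding is far more involved.
    pub fn read(bytes: &[u8]) -> Table {
        let values = bytes
            .chunks_exact(8)
            .map(|chunk| {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(chunk);
                f64::from_le_bytes(buf)
            })
            .collect();
        Table { values }
    }
}

mod geoarrow_wasm_slim {
    use super::arrow_wasm::Table;

    /// Pretend "operation": takes ownership of the table and scales it in place.
    pub fn scale(mut table: Table, factor: f64) -> Table {
        for v in table.values.iter_mut() {
            *v *= factor;
        }
        table
    }
}

mod geoarrow_wasm_full {
    // The "full" build re-exports the reader next to the operations,
    // so both end up in the same binary and the same memory.
    pub use super::geoarrow_wasm_slim::scale;
    pub use super::parquet_wasm::read;
}

/// Reads and then processes a table, reporting whether the underlying
/// buffer stayed at the same address (i.e. it was moved, not copied).
pub fn read_and_scale(bytes: &[u8], factor: f64) -> (arrow_wasm::Table, bool) {
    let table = geoarrow_wasm_full::read(bytes);
    let before = table.values.as_ptr();
    let table = geoarrow_wasm_full::scale(table, factor);
    let same_buffer = before == table.values.as_ptr();
    (table, same_buffer)
}
```

With two separate NPM modules, the hand-off between `read` and `scale` would instead have to go through JS, which is exactly the copy this design tries to avoid.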

Libraries for other formats might make sense to add in the future, e.g. geoarrow-flatgeobuf, which would use Rust to parse FlatGeobuf into GeoArrow.

Drawbacks

  • It would be nice if it were possible to add wasm-bindgen methods onto existing structs from another crate, but that doesn't appear to be possible 🥲. This means that crates other than arrow-wasm can only work with structs defined in arrow-wasm through free functions that take them as arguments, not through new methods on them.
  • There will be a proliferation of feature flags. E.g. geoarrow-wasm might need a feature flag for each compression codec in parquet-wasm?
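The first drawback can be sketched in plain Rust. Again, the modules only stand in for crates, the names (`Table`, `column_sum`) are hypothetical, and `#[wasm_bindgen]` is omitted so the example builds without dependencies. In a genuinely separate crate, writing `impl Table { ... }` for a type from arrow-wasm would be rejected by the compiler (error E0116, an inherent impl outside the defining crate), so the downstream crate is left with a free function:

```rust
// Modules stand in for separate crates; all names are hypothetical.

mod arrow_wasm {
    /// Stand-in for a struct that arrow-wasm would export.
    pub struct Table {
        columns: Vec<Vec<f64>>,
    }

    // Methods can be attached here because this is the defining "crate".
    impl Table {
        pub fn new(columns: Vec<Vec<f64>>) -> Self {
            Table { columns }
        }

        pub fn num_columns(&self) -> usize {
            self.columns.len()
        }

        pub fn column(&self, index: usize) -> Option<&[f64]> {
            self.columns.get(index).map(|c| c.as_slice())
        }
    }
}

mod geoarrow_wasm {
    use super::arrow_wasm::Table;

    // If this were a separate crate, `impl Table { ... }` would not
    // compile here (E0116). The functionality is exposed as a free
    // function that takes the table as an argument instead.
    pub fn column_sum(table: &Table, index: usize) -> Option<f64> {
        table.column(index).map(|c| c.iter().sum::<f64>())
    }
}
```

On the JS side this presumably means calling something like a standalone function with the table as its first argument, rather than a method on the table object.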

cc @H-Plus-Time
