Proposal to re-architect Arrow Swift #154

@willtemperley

Describe the enhancement requested

Arrow Swift's usability and performance can both be significantly improved by introducing some major architectural changes.

The problems

Buffers can only be a single type

Every ArrowArray holds an ArrowData instance, which holds an array of ArrowBuffers, and each ArrowBuffer allocates its own memory. Because there is only one buffer type, a buffer cannot be backed by anything else, such as a view over another binary object like Data. This means that IPC cannot be zero-copy.

This is compounded by the fact that the ArrowBuffer exposes its memory via:

```swift
public let rawPointer: UnsafeMutableRawPointer
```

Pointer arithmetic is then performed inside each array's subscript, and this logic is duplicated across every array type.
Exposing a public raw pointer like this is also dangerous: for example, it can easily lead to use-after-free errors.
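To illustrate the alternative, here is a minimal sketch of an owned buffer that keeps its pointer private and offers only bounds-checked typed access plus closure-scoped raw access, in the spirit of the standard library's `withUnsafeBytes` pattern. `OwnedBufferSketch` and its `load`, `store`, and `withUnsafeBytes` methods are hypothetical names for illustration, not Arrow Swift's or Swift Arrow's API.

```swift
// Hypothetical owned buffer: the raw pointer never leaves the type, so callers
// cannot hold on to it after the buffer is deallocated.
final class OwnedBufferSketch {
    private let pointer: UnsafeMutableRawPointer
    let byteCount: Int

    init(byteCount: Int) {
        // 64-byte alignment, as recommended by the Arrow columnar format.
        pointer = UnsafeMutableRawPointer.allocate(byteCount: byteCount, alignment: 64)
        _ = pointer.initializeMemory(as: UInt8.self, repeating: 0, count: byteCount)
        self.byteCount = byteCount
    }

    deinit { pointer.deallocate() }

    // Pointer arithmetic lives in one place instead of in every array's subscript.
    private func checkedOffset<T: FixedWidthInteger>(_: T.Type, _ index: Int) -> Int {
        let stride = MemoryLayout<T>.stride
        precondition(index >= 0 && (index + 1) * stride <= byteCount, "index out of range")
        return index * stride
    }

    func load<T: FixedWidthInteger>(_ type: T.Type, at index: Int) -> T {
        pointer.load(fromByteOffset: checkedOffset(type, index), as: T.self)
    }

    func store<T: FixedWidthInteger>(_ value: T, at index: Int) {
        pointer.storeBytes(of: value, toByteOffset: checkedOffset(T.self, index), as: T.self)
    }

    // The raw view is only valid inside `body`; callers must not let it escape.
    func withUnsafeBytes<R>(_ body: (UnsafeRawBufferPointer) throws -> R) rethrows -> R {
        try body(UnsafeRawBufferPointer(start: pointer, count: byteCount))
    }
}

// Usage: four Int32 slots, written and read through typed, bounds-checked accessors.
let buffer = OwnedBufferSketch(byteCount: 4 * MemoryLayout<Int32>.stride)
for i in 0..<4 { buffer.store(Int32(i * 10), at: i) }
print(buffer.load(Int32.self, at: 3)) // 30
print(buffer.withUnsafeBytes { $0.count }) // 16
```

Scoping raw access to a closure does not make misuse impossible, but it makes escaping the pointer an explicit, visible mistake rather than the default.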


Allowing multiple buffer types, e.g. separate buffers for fixed-width and variable-length types, can improve both performance and memory safety.
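A minimal sketch of the idea, using hypothetical names (`ValueBuffer`, `FixedWidthBuffer`, `VariableLengthBuffer`, `ArrowArraySketch`) rather than any existing API: each buffer kind owns its own indexing logic, and an array is generic over its buffer, so element access no longer repeats pointer arithmetic per array type. For brevity the buffers are backed by Swift arrays and validity bitmaps are omitted; a real implementation would use Arrow-aligned memory or views over `Data`.

```swift
// Hypothetical protocol: each buffer kind knows how to read its own values.
protocol ValueBuffer {
    associatedtype Value
    var count: Int { get }
    func value(at index: Int) -> Value
}

// Fixed-width values (e.g. Int32) stored contiguously.
struct FixedWidthBuffer<T: FixedWidthInteger>: ValueBuffer {
    let storage: [T]
    var count: Int { storage.count }
    func value(at index: Int) -> T { storage[index] }
}

// Variable-length UTF-8 strings: Arrow-style offsets (count + 1 entries) into a byte array.
struct VariableLengthBuffer: ValueBuffer {
    let offsets: [Int32]
    let bytes: [UInt8]
    var count: Int { offsets.count - 1 }
    func value(at index: Int) -> String {
        let start = Int(offsets[index]), end = Int(offsets[index + 1])
        return String(decoding: bytes[start..<end], as: UTF8.self)
    }
}

// An array is generic over its buffer type and simply delegates element access.
struct ArrowArraySketch<Buffer: ValueBuffer> {
    let buffer: Buffer
    var count: Int { buffer.count }
    subscript(index: Int) -> Buffer.Value {
        precondition(index >= 0 && index < count, "index out of range")
        return buffer.value(at: index)
    }
}

let ints = ArrowArraySketch(buffer: FixedWidthBuffer(storage: [1, 2, 3] as [Int32]))
let strings = ArrowArraySketch(
    buffer: VariableLengthBuffer(offsets: [0, 5, 5, 10], bytes: Array("helloworld".utf8)))
print(ints[2], strings[0], strings[1].isEmpty, strings[2]) // 3 hello true world
```

Because `ArrowArraySketch` is generic rather than holding an existential buffer, the compiler can specialize each subscript call for its concrete buffer type.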

ArrowTypes

Currently two reference types are used to describe an ArrowType. There is no reason I can think of for an Arrow type to be mutable. Arrow types can instead be represented as a single value type, using an enum. This is likely to improve performance and make the library far more usable: enums offer much better ergonomics and safety, for example exhaustiveness checking and simpler equality checks.
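As a minimal sketch of what a value-type Arrow type could look like, here is a hypothetical `ArrowTypeSketch` enum covering only a subset of Arrow types, loosely in the spirit of arrow-rs's `DataType` enum. `Field`, `TimeUnit`, and `fixedByteWidth` are illustrative names, not Swift Arrow's actual API. Equality is synthesized by the compiler, and the `switch` deliberately has no `default`, so adding a new case causes a compile error until it is handled.

```swift
enum TimeUnit: Equatable { case second, millisecond, microsecond, nanosecond }

// A child field for nested types; contains the enum recursively.
struct Field: Equatable {
    let name: String
    let dataType: ArrowTypeSketch
    let nullable: Bool
}

// `indirect` boxes associated values so nested types (lists of fields) are allowed.
indirect enum ArrowTypeSketch: Equatable {
    case null, boolean
    case int8, int16, int32, int64
    case uint8, uint16, uint32, uint64
    case float32, float64
    case utf8, largeUtf8          // 32-bit vs 64-bit offsets
    case binary, largeBinary
    case date32
    case timestamp(unit: TimeUnit, timezone: String?)
    case list(Field)
    case largeList(Field)
}

extension ArrowTypeSketch {
    /// Bytes per value for fixed-width types; nil for bit-packed, variable-length or nested types.
    var fixedByteWidth: Int? {
        switch self {
        case .int8, .uint8: return 1
        case .int16, .uint16: return 2
        case .int32, .uint32, .float32, .date32: return 4
        case .int64, .uint64, .float64, .timestamp: return 8
        case .null, .boolean, .utf8, .largeUtf8, .binary, .largeBinary, .list, .largeList:
            return nil
        }
    }
}

let listOfInt = ArrowTypeSketch.list(Field(name: "item", dataType: .int32, nullable: true))
print(listOfInt == .list(Field(name: "item", dataType: .int32, nullable: true))) // true
print(ArrowTypeSketch.timestamp(unit: .millisecond, timezone: "UTC").fixedByteWidth ?? -1) // 8
```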

Feasibility

This is definitely feasible: I have already implemented most of it in Swift Arrow:
https://github.com/willtemperley/swift-arrow/

This version supports all the types that Arrow Swift covers, plus schema metadata, map types, binary views, binary with 64-bit offsets, and nested types with 64-bit offsets.

It also passes Arrow Gold integration tests for these types.

I have completely rewritten the buffer infrastructure. Zero-copy IPC is available using views over memory-mapped Data objects.
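A minimal sketch of the zero-copy idea, assuming a hypothetical `Int32DataView` type (not Swift Arrow's actual API): Foundation's `Data(contentsOf:options:)` with `.alwaysMapped` asks for the file to be memory-mapped, and the view stores that `Data` plus a byte offset instead of copying bytes into a freshly allocated buffer. Whether the file is truly mapped depends on the platform's Foundation implementation.

```swift
import Foundation

// Hypothetical read-only view over a region of Data; no bytes are copied.
struct Int32DataView {
    let data: Data
    let byteOffset: Int   // relative to data.startIndex
    let count: Int        // number of Int32 values in the view

    init(data: Data, byteOffset: Int, count: Int) {
        precondition(byteOffset >= 0 && byteOffset + count * MemoryLayout<Int32>.size <= data.count,
                     "view exceeds data bounds")
        self.data = data
        self.byteOffset = byteOffset
        self.count = count
    }

    subscript(index: Int) -> Int32 {
        precondition(index >= 0 && index < count, "index out of range")
        return data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) in
            // loadUnaligned (Swift 5.7+) avoids assuming 4-byte alignment;
            // Arrow IPC data is little-endian.
            Int32(littleEndian: raw.loadUnaligned(
                fromByteOffset: byteOffset + index * MemoryLayout<Int32>.size, as: Int32.self))
        }
    }
}

// Usage: write four little-endian Int32 values to a temporary file, then map it.
let values: [Int32] = [10, 20, 30, 40]
var bytes = Data()
for v in values { withUnsafeBytes(of: v.littleEndian) { bytes.append(contentsOf: $0) } }
let url = FileManager.default.temporaryDirectory.appendingPathComponent("arrow-view-sketch.bin")
try bytes.write(to: url)
let mapped = try Data(contentsOf: url, options: .alwaysMapped)
let view = Int32DataView(data: mapped, byteOffset: 4, count: 3) // skips the first value
print(view[0], view[2]) // 20 40
```

Storing the whole `Data` with an explicit offset, rather than a `Data` slice, sidesteps the fact that slices keep their parent's indices.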

I have replaced ArrowType with an enum adapted from arrow-rs.

The challenge

Changing ArrowType to an enum would completely change the public API.
Replacing the buffer infrastructure means an almost complete rewrite of the arrays and array builders.

Caveats

My version does not include C Data Interface import/export, and Arrow Flight support is pending.

Suggested way forward

I think this would simply need to be a clean-break fork, with a major version bump and a migration guide.
I'm happy to donate any code I've written in Swift Arrow back to Arrow Swift.
