Describe the enhancement requested
Arrow Swift could be made significantly more usable and significantly faster by introducing some major changes.
The problems
Buffers can only be a single type
Every ArrowArray holds an ArrowData instance, which holds an array of ArrowBuffers. An ArrowBuffer allocates its own memory. This means that IPC cannot be zero-copy, because the buffer cannot be any other type, such as a view over another binary object like Data.
This is compounded by the fact that the ArrowBuffer exposes its memory via:
public let rawPointer: UnsafeMutableRawPointer
Pointer arithmetic is then done inside the subscript of each array, and that logic is duplicated in every array type.
A public raw pointer like that is also dangerous, e.g. it could easily cause use-after-free errors.
Allowing multiple buffer types, e.g. variable-length buffers and fixed-width buffers, can improve performance and memory safety.
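As a rough illustration of what "multiple buffer types" could look like, here is a minimal sketch. The protocol and type names (`Int32Buffer`, `OwnedInt32Buffer`, `DataBackedInt32Buffer`, `total`) are hypothetical and are not the API of Arrow Swift or Swift Arrow. It only shows that an array could read through a protocol while the storage is either owned or a view over a `Data` value, with no raw pointer exposed publicly.

```swift
import Foundation

// Hypothetical protocol; the names here are illustrative only.
protocol Int32Buffer {
    var count: Int { get }
    subscript(index: Int) -> Int32 { get }
}

// Owns its storage, e.g. the output of an array builder.
struct OwnedInt32Buffer: Int32Buffer {
    private let storage: [Int32]
    init(_ values: [Int32]) { storage = values }
    var count: Int { storage.count }
    subscript(index: Int) -> Int32 { storage[index] }
}

// Reads values directly from a Data value (which could be memory-mapped)
// instead of copying them into a newly allocated buffer.
struct DataBackedInt32Buffer: Int32Buffer {
    private let data: Data
    init(_ data: Data) {
        precondition(data.count % MemoryLayout<Int32>.size == 0, "truncated buffer")
        self.data = data
    }
    var count: Int { data.count / MemoryLayout<Int32>.size }
    subscript(index: Int) -> Int32 {
        precondition(index >= 0 && index < count, "index out of range")
        return data.withUnsafeBytes { raw in
            // loadUnaligned (Swift 5.7+) avoids assuming 4-byte alignment.
            Int32(littleEndian: raw.loadUnaligned(
                fromByteOffset: index * MemoryLayout<Int32>.size, as: Int32.self))
        }
    }
}

// Generic code works with either buffer and never touches a pointer.
func total<B: Int32Buffer>(_ buffer: B) -> Int {
    (0..<buffer.count).reduce(0) { $0 + Int(buffer[$1]) }
}
```

The pointer arithmetic lives in one place (the subscript of the view type) and the pointer never escapes the `withUnsafeBytes` closure, which is the property the public `rawPointer` cannot offer.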
ArrowTypes
Currently two reference types are used to describe an ArrowType. There is no reason I can think of for an Arrow type to be mutable. Arrow types can be represented as a single value type, using an enum. This is likely to improve performance and make the library far more usable: enums offer much better ergonomics and safety, for example through exhaustiveness checking and simpler equality checks.
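To make the ergonomics argument concrete, here is a heavily reduced sketch. `ArrowTypeSketch` and `fixedByteWidth(of:)` are hypothetical names with a handful of illustrative cases; this is not the enum used in Swift Arrow or arrow-rs.

```swift
// Hypothetical, reduced type enum; a real one would have many more cases.
indirect enum ArrowTypeSketch: Hashable {
    case int32
    case float64
    case utf8
    case fixedSizeBinary(byteWidth: Int)
    case list(ArrowTypeSketch)
}

// The switch must be exhaustive: adding a new case to the enum makes this
// function fail to compile until the new case is handled.
func fixedByteWidth(of type: ArrowTypeSketch) -> Int? {
    switch type {
    case .int32: return 4
    case .float64: return 8
    case .fixedSizeBinary(byteWidth: let width): return width
    case .utf8, .list: return nil  // variable-length layouts
    }
}
```

Equality and hashing are synthesized by the compiler, including for nested types, so `ArrowTypeSketch.list(.int32) == .list(.int32)` holds without any hand-written comparison code.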
Feasibility
This is definitely feasible because I have already done most of this in Swift Arrow:
https://github.com/willtemperley/swift-arrow/
This version includes support for all types that Arrow Swift covers, plus schema metadata, map types, binary views, binary with 64-bit offsets and nested types with 64-bit offsets.
It also passes Arrow Gold integration tests for these types.
I have completely rewritten the buffer infrastructure. Zero-copy IPC is available using views over memory-mapped Data objects.
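For readers unfamiliar with the Foundation side of this, a minimal sketch of the underlying mechanism follows. `mappedSlice(of:range:)` is a hypothetical helper, not code from Swift Arrow, and it assumes the byte range of a buffer within the file is already known (in real IPC it would come from the message metadata).

```swift
import Foundation

// Hypothetical helper: maps a file into memory and returns a slice of it.
func mappedSlice(of url: URL, range: Range<Int>) throws -> Data {
    // .alwaysMapped asks Foundation to memory-map the file rather than read it.
    let mapped = try Data(contentsOf: url, options: .alwaysMapped)
    precondition(range.lowerBound >= 0 && range.upperBound <= mapped.count,
                 "range outside file")
    // Slicing shares the backing storage; subdata(in:) would copy instead.
    // Note that the slice keeps the parent's indices (startIndex == range.lowerBound).
    return mapped[range]
}
```

A buffer type that wraps such a slice can then serve array reads without first copying the payload into separately allocated memory.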
I have replaced ArrowType with an enum which was adapted from arrow-rs.
The challenge
Changing ArrowType to an enum would completely change the public API.
Replacing the buffer infrastructure means an almost complete rewrite of the arrays and array builders.
Caveats
My version does not include C import / export, and Arrow Flight support is pending.
Suggested way forward
I think this would simply need to be a clean-break fork, with a major version bump and a migration guide.
I'm happy to donate any code I've written in Swift Arrow back to Arrow Swift.