
Vector Extension Type #6964

Merged
connortsui20 merged 10 commits into develop from ct/ext-vec
Mar 19, 2026

@connortsui20 connortsui20 commented Mar 13, 2026

Summary

Tracking Issue: #6865

Adds a Vector extension type and a new L2Norm expression.

Additionally adds an AnyTensor type that can be matched on for any kind of tensor.

Right now the code assumes that everything is built on top of FixedSizeList, but in the future that might change.

Additionally makes some touch-ups to the vortex-tensor crate in general.

API Changes

The new Vector and L2Norm types.

Testing

Some basic tests.

@connortsui20 connortsui20 added the changelog/feature (A new feature) label Mar 13, 2026
@connortsui20 connortsui20 force-pushed the ct/ext-vec branch 2 times, most recently from 33409b3 to 5a198e8 Compare March 16, 2026 15:01
@connortsui20 connortsui20 marked this pull request as ready for review March 16, 2026 15:01
@connortsui20 connortsui20 requested a review from gatesn March 16, 2026 15:01
Ok(args.to_vec())
}

// TODO(connor): This needs a precondition for the number of args it has, or all implementations
Contributor

There should already be a check using the arity of the function

Contributor Author

I tried backtracing through the code myself but I couldn't find where that is called

Comment on lines +148 to +150
fn is_fallible(&self, _options: &Self::Options) -> bool {
    // Canonicalization of the storage array can fail.
    true
Contributor

@joseph-isaacs joseph-isaacs Mar 17, 2026

This is not the meaning of this

Contributor

It asks whether the operation can fail, like checked_add (on overflow). Can that happen here?

Contributor Author

    /// Returns whether this expression itself is fallible. Conservatively default to *true*.
    ///
    /// An expression is runtime fallible is there is an input set that causes the expression to
    /// panic or return an error, for example checked_add is fallible if there is overflow.
    ///
    /// Note: this is only applicable to expressions that pass type-checking
    /// [`ScalarFnVTable::return_dtype`].

In that case this doc comment needs to be more detailed since that was not clear. What is a better description here? How do we distinguish between an execution failure vs a logical failure when we get a VortexResult back regardless?
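A minimal sketch of the distinction the quoted doc comment seems to draw, assuming "fallible" means that some well-typed input makes the expression itself return an error (as opposed to infrastructure failures such as canonicalization). The function names below are hypothetical and are not part of Vortex:

```rust
/// Fallible in the doc comment's sense: some well-typed inputs produce an error.
fn checked_add_i32(a: i32, b: i32) -> Result<i32, String> {
    a.checked_add(b)
        .ok_or_else(|| format!("overflow adding {a} and {b}"))
}

/// Not fallible in that sense: every f32 slice produces a value.
/// Large inputs overflow to infinity rather than returning an error.
fn l2_norm(row: &[f32]) -> f32 {
    row.iter().map(|x| x * x).sum::<f32>().sqrt()
}
```

Under this reading, `checked_add_i32(i32::MAX, 1)` is an error that depends on the input values, while `l2_norm` always returns a float even when the sum of squares overflows.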

@connortsui20 connortsui20 requested a review from gatesn March 17, 2026 11:39
codspeed-hq bot commented Mar 18, 2026

Merging this PR will not alter performance

✅ 1009 untouched benchmarks
⏩ 1515 skipped benchmarks [1]


Comparing ct/ext-vec (fe41f12) with develop (683ba3a)

Footnotes

  1. 1515 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@connortsui20 connortsui20 force-pushed the ct/ext-vec branch 4 times, most recently from 615b78d to 2da0149 Compare March 18, 2026 19:58
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Comment on lines +19 to +20
// TODO(connor): This is just a placeholder for now.
type NativeValue<'a> = &'a ScalarValue;
Contributor

why?

Contributor Author

It's mainly because we are blocked on #6717 and have to figure some things out on that first

Contributor Author

In reality we don't care about this yet, but once we start using these extension types in the compressor we will, because we would want a more efficient representation of a vector when it is encoded as, for example, a ConstantArray.

)?;

- // Row 0: identical 1.0, row 1: orthogonal 0.0.
+ // Row 0: identical -> 1.0, row 1: orthogonal -> 0.0.
Contributor

😂

@@ -0,0 +1,236 @@
// SPDX-License-Identifier: Apache-2.0
Contributor

Just a personal taste thing, but these hyper-specialized utils just create fragmentation in the codebase: it's more layers, and they only exist in this corner.

Contributor Author

Not really sure where else to put this though? We need it for both of the scalar fns in here (and will need it for more in the future)

Contributor

you can just inline them? they seem to be mostly very short and easy to quickly read through, even if it introduces some code duplication

Contributor Author

Hmmm I'm not sure I agree with this, especially since that would mean duplicating the FlatElements struct I introduced which feels wrong.

- /// For [`FixedShapeTensor`], computes `dot(a, b) / (||a|| * ||b||)` over the flat backing buffer of
- /// each tensor. The shape and permutation do not affect the result because cosine similarity only
- /// depends on the element values, not their logical arrangement.
+ /// Computes `dot(a, b) / (||a|| * ||b||)` over the flat backing buffer of each tensor or vector.
Contributor

Not part of this PR, but I think we also want to do expr(array, scalar) for this sort of thing

Contributor Author

yeah that is also part of why we want #6717
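For readers following this thread, the formula in the doc comment, `dot(a, b) / (||a|| * ||b||)`, and the array-versus-scalar case raised above can be sketched in plain Rust. This is only an illustration over `f32` slices, not the implementation in this PR; `cosine_similarity` and `cosine_similarity_to_query` are hypothetical names, and the flat row-major layout with a fixed `dim` is an assumption based on the FixedSizeList storage mentioned in the summary:

```rust
/// Cosine similarity of two equal-length rows: dot(a, b) / (||a|| * ||b||).
/// Returns None when the lengths differ. A zero-norm input yields NaN (0.0 / 0.0).
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    Some(dot / (norm_a * norm_b))
}

/// Compares every `dim`-element row of a flat, row-major buffer against one
/// constant query vector (the "array, scalar" shape discussed above).
/// `dim` must be non-zero, because `chunks_exact` panics on a zero chunk size.
fn cosine_similarity_to_query(flat: &[f32], dim: usize, query: &[f32]) -> Vec<Option<f32>> {
    flat.chunks_exact(dim)
        .map(|row| cosine_similarity(row, query))
        .collect()
}
```

With rows `[1, 0]` and `[0, 1]` and the query `[1, 0]`, this gives 1.0 for the identical row and 0.0 for the orthogonal row, matching the expectation in the test comment earlier in the review.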

Contributor

@AdamGS AdamGS left a comment

This seems very reasonable to me. Some pieces depend on future work; as with Variant, we probably want a standard way to express these sorts of stability/maturity guarantees.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

fn id(&self) -> ExtId {
-     ExtId::new_ref("vortex.fixed_shape_tensor")
+     ExtId::new_ref("vortex.tensor.fixed_shape_tensor")
Contributor

Are we certain no one has written a vortex.fixed_shape_tensor?

Contributor Author

We're pretty sure, this is a very recent addition

.map(|i| l2_norm_row(flat.row::<T>(i)))
.collect();

Ok(result.into_array())
Contributor

Do we intend to eventually replace this with a call to BLAS? It seems like extract_flat_elements produces a BLAS compatible matrix.

Contributor Author

what is BLAS?

Contributor

https://www.netlib.org/blas/

Basic Linear Algebra Subprograms. It's the way to do fast linear algebra. There's also LAPACK which mostly deals with matrix transformations e.g. QR factorization.

There are many implementations of the BLAS interface. Perhaps the most interesting is GotoBLAS, which Kazushige Goto hand-wrote in assembly. If you're on Intel processors you really want to link against the Intel Math Kernel Library (MKL). There's also OpenBLAS, which is arch independent, and cuBLAS, which is for GPUs. Generally, one expects these libraries to be installed on the machine and you dynamically link against them.

Contributor Author

is there a standard rust crate that does this? If we can pull that in and it operates over flat elements then that is a trivial thing to add. I looked online and found a few things but am not familiar with this space
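To make the "operates over flat elements" point concrete, here is a dependency-free sketch of row-wise L2 norms over a flat, row-major buffer. It is an assumption about the data layout, not the code in this PR, and `l2_norms_flat` is a hypothetical name. In BLAS terms the inner computation corresponds to the Level 1 `nrm2` routine applied once per row; which Rust binding to use is left open, as in the thread:

```rust
/// Row-wise L2 norms over a flat, row-major buffer of `dim`-element rows.
/// `dim` must be non-zero, because `chunks_exact` panics on a zero chunk size.
/// Trailing elements that do not fill a whole row are ignored.
fn l2_norms_flat(flat: &[f64], dim: usize) -> Vec<f64> {
    flat.chunks_exact(dim)
        // A BLAS-backed version would replace this closure with an nrm2 call.
        .map(|row| row.iter().map(|x| x * x).sum::<f64>().sqrt())
        .collect()
}
```

For example, the buffer `[3, 4, 0, 0, 5, 12]` with `dim = 2` yields the norms `[5, 0, 13]`.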

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@connortsui20 connortsui20 merged commit 8efe1dc into develop Mar 19, 2026
57 checks passed
@connortsui20 connortsui20 deleted the ct/ext-vec branch March 19, 2026 14:13
5 participants