Skip to content

Conversation

NiccoloSacchi
Copy link

@NiccoloSacchi NiccoloSacchi commented Sep 15, 2025

I believe this dimension should be the same in V and K: shouldn't the V matrix contain the value embedding for each key?

I believe this dimension should be the same in V and K: the V matrix must contain the value embedding for each key.
@marcelroed
Copy link
Member

Yep, good catch, not sure how that slipped through. Will fix this on our end and update this repo before I close the issue.

K: Float[Tensor, " ... keys d_k"],
V: Float[Tensor, " ... values d_v"],
V: Float[Tensor, " ... keys d_v"],
mask: Bool[Tensor, " ... queries keys"] | None = None,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, shouldn't mask be of size "queries keys" instead of "... queries keys"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The masks can differ for each batch element, so ... is correct.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I considered that but thought not really necessary (thought either all batch samples attend to everything or all batch samples only attend to past tokens). Anyway, thanks for confirming! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants