[ENH] Efficient Attention Backend for TimeXer #1997
Conversation
Some more notes:
If your team's OK with me addressing these, I could address them in this PR or open a new one, whatever works for your team's code review process. P.S. if anyone wants to benchmark the speed and memory consumption of the old and new attention backends, I can give you a script for that. I was not exactly sure whether a script like that made sense to include as part of the package itself.
Yes, the issue is known, although we still don't know the exact source (see #1998, and the discussion from the Discord thread here).
I'd prefer a new PR (stacked on this PR, or maybe after this PR is merged) to keep the "responsibilities" separate for both PRs.
hm, that feels extremely useful! Could you put that into
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@          Coverage Diff          @@
##            main    #1997   +/-  ##
=====================================
  Coverage       ?   86.99%
=====================================
  Files          ?      160
  Lines          ?     9494
  Branches       ?        0
=====================================
  Hits           ?     8259
  Misses         ?     1235
  Partials       ?        0
Thanks a lot for this PR @anasashb ! This is really great
I have added some comments:
- Related to the docstrings: I think this is from the older part of the code, which is why it is still not using numpydoc style. I would really appreciate it if you could use numpydoc-style docstrings here. We still need to update the docstrings across the whole codebase :)
- Can you also add some fixtures in `_timexer_pkg` and `_timexer_pkg_v2`? We are moving from the standalone tests to a unified test framework, and now just adding test fixtures and some configs is enough to test the whole model. You can see some examples in the fixtures already present in the above files, and you can also look at any other model; all models have this `pkg` class now, which is used to test them. All you need to do is update `get_base_test_params` (in the case of `_timexer_pkg`, v1) and `get_test_train_params` (in the case of `_timexer_pkg_v2`, v2) to include fixtures that test the new attention mechanism, as in the rough sketch below.
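Hedged illustration only: assuming the pkg classes expose `get_test_train_params` as a classmethod returning a list of parameter dicts (as described above), the new fixture could be added roughly like this; the surrounding parameter names are placeholders.

```python
class TimeXer_pkg_v2:  # stand-in for the real pkg class; only the idea matters here
    @classmethod
    def get_test_train_params(cls):
        # each dict is one model configuration exercised by the unified test framework
        return [
            dict(hidden_size=16, n_heads=2),  # existing default-attention config
            dict(hidden_size=16, n_heads=2, use_efficient_attention=True),  # new SDPA backend
        ]
```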
    attention_dropout (float): Dropout rate for attention scores.
    output_attention (bool): Whether to output attention weights."""
    output_attention (bool): Whether to output attention weights.
    efficient_attention (bool): Whether to use torch's native efficient
Should it be `use_efficient_attention` here?
Also, please explain what "efficient attention" means here and how it is different from `einsum_attention`.
Also, please use numpydoc-style docstrings. I think this part is from the older part of the code, which is why it's still not updated. Updating the style to numpydoc would be greatly appreciated!
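For reference, a hedged sketch of what the requested numpydoc-style section could look like (parameter names taken from the quoted diff above; descriptions and defaults are illustrative, not the actual code):

```python
def __init__(self, attention_dropout=0.05, output_attention=False, use_efficient_attention=False):
    """Full attention layer (illustrative docstring only).

    Parameters
    ----------
    attention_dropout : float, optional (default=0.05)
        Dropout rate applied to the attention scores.
    output_attention : bool, optional (default=False)
        Whether to also return the attention weights.
    use_efficient_attention : bool, optional (default=False)
        Whether to use torch's native ``scaled_dot_product_attention``
        instead of the explicit ``torch.einsum`` formulation.
    """
```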
    attention_dropout (float): Dropout rate for attention scores.
    output_attention (bool): Whether to output attention weights."""
    output_attention (bool): Whether to output attention weights.
    efficient_attention (bool): Whether to use torch's native efficient
same comment as above!
Reference Issues/PRs
Fixes #1990.
What does this implement/fix? Explain your changes.
This Pull Request adds a `use_efficient_attention` boolean argument to the TimeXer model (both the v1 and v2 versions), which, if set to `True`, switches to a more memory-efficient and faster attention implementation using `torch.nn.functional.scaled_dot_product_attention()` instead of the `torch.einsum()` solution inside the `FullAttention` class (v1 and v2 versions both). The newly introduced argument defaults to `False` to keep the new feature completely backwards compatible.
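For illustration, a hedged usage sketch (the import path, dataset, and the other constructor arguments here are placeholders; only `use_efficient_attention` is the flag this PR introduces):

```python
from pytorch_forecasting.models import TimeXer  # import path may differ by version

# `training_dataset` is assumed to be a TimeSeriesDataSet defined elsewhere;
# hidden_size / n_heads are placeholder hyperparameters.
model = TimeXer.from_dataset(
    training_dataset,
    hidden_size=64,
    n_heads=4,
    use_efficient_attention=True,  # switch FullAttention to the SDPA backend
)
```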
Additionally, there's a very minor bugfix in the `PositionalEmbedding` class (v1 and v2 versions both), where a bug carried over from `tslib` set the tensor attribute `.require_grad`. In torch, the correct attribute for whether a tensor requires grad is called `.requires_grad`; this has also been fixed.
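An illustrative snippet of the relevant part (a sketch, not the exact file; the carried-over buggy line presumably set the non-existent `.require_grad` attribute, as the description above implies):

```python
import math
import torch


class PositionalEmbedding(torch.nn.Module):
    """Sinusoidal positional embedding (sketch of the relevant part only)."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        # pe.require_grad = False   # carried-over tslib bug: creates an unused attribute
        pe.requires_grad = False    # correct torch attribute controlling autograd tracking
        position = torch.arange(0, max_len).float().unsqueeze(1)
        div_term = (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp()
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))

    def forward(self, x):
        return self.pe[:, : x.size(1)]
```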
What should a reviewer concentrate their feedback on?
Reviewers should focus on the implementation of `_einsum_attention()` and `_efficient_attention()`, which are new private methods that `forward()` of the `FullAttention` class calls to handle the attention computation (a rough, illustrative sketch of this dispatch is included after the list below).

I did not make any other changes to the code, but if it works for you I could also:

- address the `tau`, `delta`, and `factor` arguments scattered across the tslib code carried over here

Or, if you'd be OK with those changes too, I can also open a separate PR.
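For reviewers who want a quick mental model, here is a hedged, illustrative sketch of the dispatch between the two backends; names and tensor shapes follow the tslib-style (batch, length, heads, dim) convention, and this is not the literal PR code:

```python
import torch
import torch.nn.functional as F


class FullAttentionSketch(torch.nn.Module):
    """Illustrative only; not the actual FullAttention implementation."""

    def __init__(self, attention_dropout: float = 0.05, use_efficient_attention: bool = False):
        super().__init__()
        self.attention_dropout = attention_dropout
        self.dropout = torch.nn.Dropout(attention_dropout)
        self.use_efficient_attention = use_efficient_attention

    def _einsum_attention(self, queries, keys, values):
        # explicit formulation: materializes the full (B, H, L, S) score matrix
        scale = 1.0 / (queries.shape[-1] ** 0.5)
        scores = torch.einsum("blhe,bshe->bhls", queries, keys)
        attn = self.dropout(torch.softmax(scale * scores, dim=-1))
        return torch.einsum("bhls,bshd->blhd", attn, values)

    def _efficient_attention(self, queries, keys, values):
        # fused kernel: torch selects a flash / memory-efficient / math backend
        out = F.scaled_dot_product_attention(
            queries.transpose(1, 2),
            keys.transpose(1, 2),
            values.transpose(1, 2),
            dropout_p=self.attention_dropout if self.training else 0.0,
        )
        return out.transpose(1, 2)

    def forward(self, queries, keys, values):
        if self.use_efficient_attention:
            return self._efficient_attention(queries, keys, values)
        return self._einsum_attention(queries, keys, values)
```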
Did you add any tests for the change?
Yes, in both `tests/test_models/test_timexer.py` and `tests/test_models/test_timexer_v2.py`. These include new assertions in the initialization tests, as well as parameterization of `use_efficient_attention` for the integration tests; a hedged sketch of that parameterization is shown below.
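A minimal sketch of the kind of parameterization described above (`_make_timexer` is a hypothetical helper that builds a small model; the real tests differ):

```python
import pytest


@pytest.mark.parametrize("use_efficient_attention", [False, True])
def test_timexer_attention_backend(use_efficient_attention):
    # _make_timexer is a hypothetical helper standing in for the actual test setup
    model = _make_timexer(use_efficient_attention=use_efficient_attention)
    assert model.hparams.use_efficient_attention is use_efficient_attention
```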
Any other comments?

PR checklist
`pre-commit install`. To run hooks independent of commit, execute `pre-commit run --all-files`.