YaRN#445

Open
rrutmann wants to merge 2 commits into main from yarn_hf

Conversation

@rrutmann
Collaborator

What does this PR do?

This PR adds YaRN support to rotary position embeddings in the GPT-2 attention path.

General Changes

  • Implemented the YaRN parameterization for rotary position embeddings in gpt2_model.py
  • Added a YaRN example configuration in config_lorem_ipsum_long_fsdp2_yarn.yaml
  • Refactored and strengthened the rotary-embedding tests in test_rotary_qkv_transform.py
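For context, the core of YaRN is "NTK-by-parts" interpolation of the RoPE inverse frequencies: high-frequency dimensions keep their original rotation rate, low-frequency dimensions are rescaled by the context-extension factor, and a linear ramp blends the two regimes in between. The sketch below illustrates that idea in isolation; it is not the code from this PR, and the function name, default values (scale factor 4, original context 2048, beta_fast/beta_slow of 32/1) and exact ramp formulation are illustrative assumptions.

```python
import math

def yarn_rope_inv_freq(
    head_dim: int,
    base: float = 10000.0,
    scale: float = 4.0,            # hypothetical context-extension factor
    original_max_pos: int = 2048,  # hypothetical pretraining context length
    beta_fast: float = 32.0,       # rotations threshold for "high frequency"
    beta_slow: float = 1.0,        # rotations threshold for "low frequency"
) -> list[float]:
    """Sketch of YaRN's NTK-by-parts interpolation of RoPE inverse frequencies."""
    half = head_dim // 2
    # Standard RoPE inverse frequencies: base^(-2i / head_dim).
    inv_freq = [base ** (-(2 * i) / head_dim) for i in range(half)]

    # Dimension index at which a frequency completes `num_rot` full rotations
    # over the original context window.
    def dim_for_rotations(num_rot: float) -> float:
        return (head_dim * math.log(original_max_pos / (num_rot * 2 * math.pi))
                / (2 * math.log(base)))

    low = max(math.floor(dim_for_rotations(beta_fast)), 0)
    high = min(math.ceil(dim_for_rotations(beta_slow)), half - 1)

    out = []
    for i, f in enumerate(inv_freq):
        # ramp = 0: keep the original frequency (pure extrapolation);
        # ramp = 1: fully interpolated frequency (divided by `scale`).
        ramp = min(max((i - low) / max(high - low, 1), 0.0), 1.0)
        out.append(f * (1 - ramp) + (f / scale) * ramp)
    return out
```

YaRN additionally rescales attention logits by a temperature of roughly `0.1 * ln(scale) + 1`; that part is omitted here for brevity.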

Breaking Changes

  • ..

Checklist before submitting final PR

  • My PR is minimal and addresses one issue in isolation
  • I have merged the latest version of the target branch into this feature branch
  • I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
  • I have run a sample config for model training
  • I have checked that all tests run through (python tests/tests.py)
  • I have updated the internal changelog (CHANGELOG_DEV.md)

rrutmann and others added 2 commits May 11, 2026 12:49
Co-authored-by: Copilot <copilot@github.com>