
Conversation

@ysjprojects (Collaborator) commented Apr 13, 2025

Follow-up from #1945.

More alignment with the Hugging Face implementation, allowing custom values for q_lora_rank, v_dim, etc.
Cleaned up some lines added for debugging purposes.

Also, the errors in #1945 occurred simply because Pythia did not have the necessary parameters for MLA.
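
For illustration only (the Config class and field names below are assumptions that loosely mirror the Hugging Face DeepseekV3 config, not this PR's actual diff), the MLA hyperparameters could surface on a model config roughly like this; a Pythia-style config never sets these fields, which is what a validation step along these lines would flag:

```python
# Illustrative sketch, not this PR's diff: MLA-specific knobs on a model config.
# Field names loosely follow the Hugging Face DeepseekV3 config; a Pythia-style
# config never defines them, which is why enabling MLA there failed in #1945.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    n_embd: int = 2048
    n_head: int = 16
    # MLA-only parameters; None means "not an MLA model"
    latent_attention: bool = False
    q_lora_rank: Optional[int] = None       # rank of the low-rank query projection
    kv_lora_rank: Optional[int] = None      # rank of the compressed (latent) KV projection
    qk_rope_head_dim: Optional[int] = None  # per-head dims carrying rotary embeddings
    qk_nope_head_dim: Optional[int] = None  # per-head dims without rotary embeddings
    v_head_dim: Optional[int] = None        # per-head value dimension ("v_dim")

    def __post_init__(self) -> None:
        if self.latent_attention:
            required = ("kv_lora_rank", "qk_rope_head_dim", "qk_nope_head_dim", "v_head_dim")
            missing = [name for name in required if getattr(self, name) is None]
            if missing:
                raise ValueError(f"MLA requires config values for: {missing}")
```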

@Borda (Member) left a comment

let's add a test :)

@ali-alshaar7 (Contributor) left a comment

Looks good overall. +1 on the test though, and would be nice if we could share more with the base attention implementation instead of duplicating things. Maybe for a followup. Thank you.
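
As a rough illustration of that suggestion (the class and attribute names below are hypothetical, not litgpt's actual API, and shape/rotary details are elided), MLA could subclass the base attention and override only how q, k, and v are produced, inheriting the attention core:

```python
# Hypothetical sketch of sharing code with the base attention class; names are
# illustrative rather than litgpt's real API, and forward/shape handling is elided.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BaseCausalSelfAttention(nn.Module):
    """Owns the attention core; subclasses only change how q, k, v are produced."""

    def attend(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # (B, n_head, T, head_dim) in, (B, n_head, T, head_dim) out
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)


class MultiHeadLatentAttention(BaseCausalSelfAttention):
    """MLA: compress KV into a small latent, then expand per head; the core is inherited."""

    def __init__(self, config) -> None:
        super().__init__()
        d, r = config.n_embd, config.kv_lora_rank
        self.q_proj = nn.Linear(d, config.n_head * config.qk_head_dim, bias=False)
        self.kv_a = nn.Linear(d, r, bias=False)  # down-projection to the latent
        self.kv_b = nn.Linear(r, config.n_head * (config.qk_head_dim + config.v_head_dim), bias=False)
```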

@Borda (Member) commented Apr 23, 2025

@t-vi @lantiga mind having a look, pls? :)

@ysjprojects changed the title from "Multi-head Latent Attention fixes" to "DeepseekV3 (and Multi-Head Latent Attention)" on May 16, 2025
@ysjprojects changed the title from "DeepseekV3 (and Multi-Head Latent Attention)" to "(WIP) DeepseekV3 (and Multi-Head Latent Attention)" on May 16, 2025
@ysjprojects (Collaborator, Author) commented

@t-vi @lantiga @Borda

DeepseekV3 architecture WIP.

Should we move DeepseekV3MoE and MultiHeadLatentAttention out of model.py into their own file, since the architecture is quite unique and it's unlikely that a future model would implement it again?
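
Purely as an illustration of that question (the module path and config flag are assumptions, not this PR's structure), the split could mean a dedicated litgpt/deepseek_v3.py holding DeepseekV3MoE and MultiHeadLatentAttention, with model.py keeping only a small dispatch:

```python
# Hypothetical layout sketch, not this PR's actual structure.
# litgpt/deepseek_v3.py would define MultiHeadLatentAttention and DeepseekV3MoE;
# model.py would keep only a small dispatch like the one below.

def build_attention(config, base_cls, mla_cls):
    """Pick the attention class for a transformer block; all names here are illustrative."""
    if getattr(config, "latent_attention", False):  # assumed config flag
        return mla_cls(config)
    return base_cls(config)
```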

@ysjprojects marked this pull request as draft May 21, 2025 12:52
@Borda (Member) commented Aug 14, 2025

@ysjprojects are you going to work on this one now? 🦩

@ysjprojects (Collaborator, Author) commented

> @ysjprojects are you going to work on this one now? 🦩

Definitely, but I will first work on some much lower-hanging fruit that is long overdue (Qwen3 Coder, etc.).

@ysjprojects (Collaborator, Author) commented

Closing this as I plan to break it into multiple components across PRs.

New PR on Multi-Head Latent Attention implementation: #2113
