WIP [DeepSeek R1] Add DeepSeekV3 Base + Weight Conversion #2171
Open
DavidLandup0 wants to merge 20 commits into keras-team:master from DavidLandup0:deepseek-r1
20 commits
e275e05 Initial commit (DavidLandup0)
e26a0c0 full port (DavidLandup0)
f0f5c55 Add more complete weight conversion script (DavidLandup0)
ed0bed9 attempt to convert to functional subclassing model (DavidLandup0)
e8fed54 Small fixes (DavidLandup0)
00554bc Finish functional subclassing model port (DavidLandup0)
14882a7 use full args (DavidLandup0)
5b91b17 Fix top_k usage (DavidLandup0)
040e833 Move model args to config files (DavidLandup0)
1c26539 Remove model args dataclasses (DavidLandup0)
ffc7594 Add note and make test config match embedding dims of full config (DavidLandup0)
ca96974 fix path (DavidLandup0)
5c832f2 Remove unnecessary configs, add tokenizer class (DavidLandup0)
2680a94 Add causallm class and causallmpreprocessor (DavidLandup0)
9f117d3 Add tokenizer to conversion script (DavidLandup0)
806b1b0 Add cpu() before numpy() (DavidLandup0)
776d4a7 Run api_gen.sh (DavidLandup0)
3418a12 Switch to full model parts (DavidLandup0)
0cf6058 Remove unnecessary token copy (DavidLandup0)
550f677 Fix 0 prefix (DavidLandup0)
Can you rebase the code onto the latest changes in Keras Hub master and regenerate the API, so that the PR shows only the DeepSeek changes?