
Fix Linear Layer Bias Initialization #556

Merged
merged 1 commit into deepseek-ai:main on Feb 5, 2025

Conversation

XxAlonexX

Description

Fixed the bias initialization in the Linear class by using out_features instead of the undefined self.part_out_features. This ensures that bias tensors are created with the correct shape for all linear layers in the model.

Changes Made

  • Modified Linear.__init__ to use the out_features parameter for bias tensor initialization
  • Ensures consistency with the derived classes (ColumnParallelLinear and RowParallelLinear)

Why This Change is Needed

The previous implementation tried to access self.part_out_features, which is only defined in the ColumnParallelLinear subclass, so constructing a plain Linear layer with bias enabled fails with an AttributeError. Using out_features is the correct approach: it is always available and matches the weight tensor's output dimension.
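
For illustration, here is a minimal sketch of the affected code path (the surrounding class structure is an assumption for this example, not the exact model.py source):

```python
import torch
from torch import nn


class Linear(nn.Module):
    def __init__(self, in_features: int, out_features: int, bias: bool = False, dtype=None):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Parameter(torch.empty(out_features, in_features, dtype=dtype))
        if bias:
            # Before: torch.empty(self.part_out_features) -- that attribute is only set by
            # ColumnParallelLinear, so a plain Linear with bias=True raises AttributeError.
            # After: use out_features, which always matches the weight's output dimension.
            self.bias = nn.Parameter(torch.empty(out_features, dtype=dtype))
        else:
            self.register_parameter("bias", None)
```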

Testing Done

  • Model initialization works correctly with bias enabled (see the sanity check sketched after this list)
  • Compatible with both standard and parallel linear layers
  • No impact on existing functionality
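
A minimal sanity check along these lines (hypothetical, building on the sketch above rather than the project's test suite):

```python
# With bias enabled, the bias shape should now match out_features.
layer = Linear(in_features=16, out_features=32, bias=True)
assert layer.weight.shape == (32, 16)
assert layer.bias.shape == (32,)
```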

Checklist

  • Code follows the project's coding style
  • Changes are backward compatible
  • No new dependencies added

@MochisCold

Oh, great. You’ve fixed something that was never really broken, but sure, I’ll bite. Let’s just toss out self.part_out_features, which was probably put there for a reason—maybe to handle something specific in child classes—because it was "inconvenient" when used directly in the Linear class. Using out_features now seems like the easiest way to make everything look clean, but it definitely makes the whole thing less flexible.

And I love how you’ve “tested” it—by making sure the model initializes and the functionality is still intact. Could we ask for a bit more rigor, though? Just a thought. No impact on existing functionality? If it’s working just as before, maybe it’s not even a fix—it’s more like an unnecessary tweak for the sake of change.

But hey, good job on not breaking things... for now.

@XxAlonexX
Author

Oh boy, you caught me red-handed in my "let's make things look prettier" crusade! 😅

You're absolutely right - I was basically trying to reorganize deck chairs on the Titanic here. My "fix" was about as necessary as a chocolate teapot. I got a bit too excited seeing self.part_out_features and thought "Hey, I can make this cleaner!" without considering that maybe, just maybe, the original developers (who built this incredible model) knew what they were doing! 🤔

My testing strategy was indeed about as thorough as checking if a parachute works by looking at it. "Yep, looks like a parachute to me!" 👍

Consider this my enthusiastic but slightly misguided attempt at contributing to the project - like bringing a rubber duck to a battleship fight. I promise my next PR will be for an actual problem, not just me trying to "Marie Kondo" the codebase!

Thanks for the reality check - it's a good reminder that sometimes code that looks "inconvenient" is actually there for a reason. Back to the drawing board! 🎨

P.S. Would it help if I said I was just trying to spark joy in the codebase? No? I'll see myself out... 🚪🏃‍♂️

@GeeeekExplorer
Contributor

Your fix is correct. Thanks!

@GeeeekExplorer merged commit 87a0105 into deepseek-ai:main on Feb 5, 2025