
Fix Linear Layer Bias Initialization #556

Merged
merged 1 commit into deepseek-ai:main on Feb 5, 2025

Conversation

XxAlonexX

Description

Fixed the bias initialization in the Linear class by using out_features instead of the undefined self.part_out_features. This ensures that bias tensors are created with the correct shape for all linear layers in the model.

Changes Made

  • Modified Linear.__init__ to use the out_features parameter for bias tensor initialization
  • Ensures consistency with the derived classes (ColumnParallelLinear and RowParallelLinear)

Why This Change is Needed

The previous implementation tried to access self.part_out_features, which is only defined in the ColumnParallelLinear subclass, so constructing a plain Linear layer with bias enabled fails with an AttributeError. Using out_features is the correct approach: it is always available and matches the weight tensor's output dimension.
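
For illustration, here is a minimal sketch of the affected code path (the surrounding class structure is an assumption for this example, not the exact model.py source):

```python
import torch
from torch import nn


class Linear(nn.Module):
    def __init__(self, in_features: int, out_features: int, bias: bool = False, dtype=None):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Parameter(torch.empty(out_features, in_features, dtype=dtype))
        if bias:
            # Before: torch.empty(self.part_out_features) -- that attribute is only set by
            # ColumnParallelLinear, so a plain Linear with bias=True raises AttributeError.
            # After: use out_features, which always matches the weight's output dimension.
            self.bias = nn.Parameter(torch.empty(out_features, dtype=dtype))
        else:
            self.register_parameter("bias", None)
```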

Testing Done

  • Model initialization works correctly with bias enabled (see the sanity check sketched after this list)
  • Compatible with both standard and parallel linear layers
  • No impact on existing functionality
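
A minimal sanity check along these lines (hypothetical, building on the sketch above rather than the project's test suite):

```python
# With bias enabled, the bias shape should now match out_features.
layer = Linear(in_features=16, out_features=32, bias=True)
assert layer.weight.shape == (32, 16)
assert layer.bias.shape == (32,)
```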

Checklist

  • Code follows the project's coding style
  • Changes are backward compatible
  • No new dependencies added

@MochisCold

Oh, great. You’ve fixed something that was never really broken, but sure, I’ll bite. Let’s just toss out self.part_out_features, which was probably put there for a reason—maybe to handle something specific in child classes—because it was "inconvenient" when used directly in the Linear class. Using out_features now seems like the easiest way to make everything look clean, but it definitely makes the whole thing less flexible.

And I love how you’ve “tested” it—by making sure the model initializes and the functionality is still intact. Could we ask for a bit more rigor, though? Just a thought. No impact on existing functionality? If it’s working just as before, maybe it’s not even a fix—it’s more like an unnecessary tweak for the sake of change.

But hey, good job on not breaking things... for now.

@XxAlonexX
Author

Oh boy, you caught me red-handed in my "let's make things look prettier" crusade! 😅

You're absolutely right - I was basically trying to reorganize deck chairs on the Titanic here. My "fix" was about as necessary as a chocolate teapot. I got a bit too excited seeing self.part_out_features and thought "Hey, I can make this cleaner!" without considering that maybe, just maybe, the original developers (who built this incredible model) knew what they were doing! 🤔

My testing strategy was indeed about as thorough as checking if a parachute works by looking at it. "Yep, looks like a parachute to me!" 👍

Consider this my enthusiastic but slightly misguided attempt at contributing to the project - like bringing a rubber duck to a battleship fight. I promise my next PR will be for an actual problem, not just me trying to "Marie Kondo" the codebase!

Thanks for the reality check - it's a good reminder that sometimes code that looks "inconvenient" is actually there for a reason. Back to the drawing board! 🎨

P.S. Would it help if I said I was just trying to spark joy in the codebase? No? I'll see myself out... 🚪🏃‍♂️

@GeeeekExplorer
Contributor

Your fix is correct. Thanks!

@GeeeekExplorer merged commit 87a0105 into deepseek-ai:main on Feb 5, 2025