add Mamba.swift #401
base: main
Conversation
```swift
public func loraLinearLayers() -> MLXLMCommon.LoRALinearLayers {
    // TODO ???
    return []
}
```
I wasn't sure what to do for this.
Normally the q and v projection layers from attention:

```swift
public func loraLinearLayers() -> LoRALinearLayers {
    model.layers.map { ($0.attention, ["q_proj", "v_proj"]) }
}
```

but this model doesn't seem to have an Attention layer. It works with any linear layers, so perhaps the x_proj and dt_proj layers in MambaBlock?

Otherwise maybe just remove the method. Also, FWIW, this type would need to conform to LoRAModel for this to work.
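A hedged sketch of that suggestion, assuming the model mirrors the mlx-lm Python layout (blocks stored on a `backbone.layers` array, each exposing its `MambaBlock` as `mixer`); the actual property names in this PR may differ:

```swift
// Sketch only: `backbone`, `layers`, and `mixer` are assumed names
// taken from the mlx-lm Python port, not necessarily this PR's code.
public func loraLinearLayers() -> LoRALinearLayers {
    backbone.layers.map { ($0.mixer, ["x_proj", "dt_proj"]) }
}
```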
```swift
import Foundation
import MLX
import MLXFast
```
MLXFast is included in MLX now -- you can remove this import.
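So the import list above can presumably be trimmed to:

```swift
import Foundation
import MLX  // MLXFast is included in MLX now, so no separate import is needed
```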
```swift
// port of https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/models/mamba.py

struct StringKey: CodingKey, ExpressibleByStringLiteral {
```
You don't need this, see below.
And if you have to keep it, please make it private.
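For reference, a minimal private version might look like this; it's a sketch of a typical dynamic CodingKey wrapper, not necessarily the PR's exact declaration:

```swift
// Sketch: wraps an arbitrary string so it can serve as a JSON key
// at decode time; integer keys are not supported.
private struct StringKey: CodingKey, ExpressibleByStringLiteral {
    let stringValue: String
    var intValue: Int? { nil }

    init?(stringValue: String) { self.stringValue = stringValue }
    init?(intValue: Int) { return nil }
    init(stringLiteral value: String) { self.stringValue = value }
}
```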
```swift
try container
    .decodeIfPresent(Int.self, forKey: .hiddenSize)
    ?? fallback
    .decode(Int.self, forKey: "d_model")
```
You can do this with:
```swift
enum CodingKeys: String, CodingKey {
    case modelType = "model_type"
    case vocabSize = "vocab_size"
    case hiddenSize = "hidden_size"
    case dModel = "d_model"
}
```

then:

```swift
hiddenSize = try container.decodeIfPresent(Int.self, forKey: .hiddenSize)
    ?? container.decode(Int.self, forKey: .dModel)
```

#316 will also provide a good solution to this, but it isn't merged yet.
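Put together, the decoding init might read roughly like the sketch below; the property names and surrounding type are assumptions based on the keys above, not the PR's actual declarations:

```swift
// Sketch: field names mirror the CodingKeys above; the real
// configuration struct likely declares more properties.
public init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    modelType = try container.decode(String.self, forKey: .modelType)
    vocabSize = try container.decode(Int.self, forKey: .vocabSize)
    // Prefer "hidden_size", falling back to the legacy "d_model" key.
    hiddenSize = try container.decodeIfPresent(Int.self, forKey: .hiddenSize)
        ?? container.decode(Int.self, forKey: .dModel)
}
```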
I'll just wait for #316 and update the PR using the new macro.