Attempt to add the mllama support #11639
Draft · +1,630 −15
Motivation

This PR attempts to bring the mllama support from the Ollama GitHub repository into the examples of this repository. The code changes come mainly from the llama patch, the operator patch, and the mllama implementation in the ollama repo.
Goals

- mllama support in `clip` (in `llava`)
- `unpad` operation support

Current Status
There are still some issues with this implementation:

1. Model converter: the example model and projection are not on Hugging Face. Currently I use the `ollama` application to fetch the converted model for testing.
2. The `n_vocab` (`n_tokens` loaded from the model) does not match the tensor dimension: `n_tokens` is 128257, while the dimension of `LLM_TENSOR_OUTPUT`, for example, is 128256. It seems like something is wrong in the converted model.
3. As mentioned in 2., some assertions fail when executing the mllama models: `ggml_backend_tensor_get_async` and `ggml_backend_tensor_get` fail the tensor-read out-of-bounds check.