Attempt to add the mllama support #11639

Draft · q82419 wants to merge 4 commits into master

@q82419 q82419 commented Feb 4, 2025

Motivation

This PR attempts to port the mllama support from the Ollama GitHub repository into the examples of this repository.

The code changes come mainly from the llama patch, the operator patch, and the mllama implementation in the Ollama repo.

Goals

  • Mllama implementation (similar to clip in llava)
  • Model converter from llama-3.2-vision to mllama
  • Full mllama example and documentation (like the llava example)
  • Support for the unpad operation
  • Building and loading the mllama model in llama.cpp

Current Status

There are still some open issues with this implementation.

  1. Model converter. The example model and projection are not available on Hugging Face.

    For now I use the Ollama application to fetch an already-converted model for testing.

  2. The n_vocab (n_tokens loaded from the model) does not match the tensor dimension.

    n_tokens is 128257, while the dimension of LLM_TENSOR_OUTPUT, for example, is 128256. It looks like something is wrong in the converted model.

  3. As a consequence of item 2, some assertions fail when running the mllama models.

    ggml_backend_tensor_get_async and ggml_backend_tensor_get fail their out-of-bounds checks on tensor reads.
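
The size arithmetic behind items 2 and 3 can be sketched as follows. This is an illustration only, not ggml code: `tensor_nbytes` and `read_in_bounds` are hypothetical helpers, and the sketch assumes f32 logits for a single token and a read whose size is derived from the vocabulary count rather than from the tensor itself.

```python
# Sketch only: models the size arithmetic, not the real ggml implementation.
FLOAT_SIZE = 4  # bytes per f32 logit (assumption)


def tensor_nbytes(n_rows: int, n_cols: int = 1, elem_size: int = FLOAT_SIZE) -> int:
    """Byte size of a dense tensor holding n_rows * n_cols elements."""
    return n_rows * n_cols * elem_size


def read_in_bounds(offset: int, size: int, nbytes: int) -> bool:
    """Hypothetical stand-in for an 'offset + size <= nbytes' bounds check."""
    return offset + size <= nbytes


n_vocab_meta = 128257    # n_tokens loaded from the model (value from this PR)
n_vocab_tensor = 128256  # dimension of the output tensor (value from this PR)

# Bytes the logits tensor actually holds for one token.
logits_nbytes = tensor_nbytes(n_vocab_tensor)
# Bytes a caller would request if it sized the read by the vocabulary count.
requested = n_vocab_meta * FLOAT_SIZE

overrun = requested - logits_nbytes
print(read_in_bounds(0, requested, logits_nbytes), overrun)  # prints: False 4
```

Under these assumptions the read overshoots the tensor by exactly one f32 element, which is consistent with a bounds assertion firing on every logits read until the vocabulary count and the tensor dimension agree.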

@github-actions github-actions bot added the Nvidia GPU, examples, ggml, and Apple Metal labels Feb 4, 2025

@danbev danbev (Collaborator) commented Feb 4, 2025

Thank you for the PR!

There is currently work in progress to introduce a new vision API, and alongside it there has been work on supporting mllama (Llama 3.2 Vision Instruct). Regarding the vocab issue, we've had a discussion about this matter which might be of interest.
