@HaroldBenoit

Hello, thank you very much for this toolkit; it is very useful to the community. This PR makes a few improvements.

  • It adds an explicit requirements.txt and installation instructions to the README.md.

    • The dependencies currently listed in the README.md do not work out of the box with pip or uv.
  • It adds llama.cpp explicitly as a submodule.

    • This provides a clear way to obtain llama.cpp binaries such as llama-quantize, which scripts like quant/gguf/run_quant.sh require.
    • It also fixes import errors (e.g. MistralTokenizerType is not importable) when running quant/gptq/pack_gptq_into_gguf.py: the PyPI version of gguf is not up to date, and llama.cpp implicitly requires pointing at the upstream version bundled in its gguf-py folder.
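Until the gguf package on PyPI catches up, one way to pick up the in-tree version is to put the submodule's gguf-py directory ahead of site-packages on sys.path. A minimal sketch, assuming the llama.cpp submodule is checked out at llama.cpp/ next to the script (the path and shim are illustrative, not part of this PR):

```python
import os
import sys

# Assumed layout: the llama.cpp submodule lives at ./llama.cpp,
# with its bundled gguf-py package inside it.
LOCAL_GGUF = os.path.join(
    os.path.dirname(os.path.abspath(__file__)), "llama.cpp", "gguf-py"
)

# Prepend so the in-tree gguf package (which defines newer symbols
# such as MistralTokenizerType) shadows any older copy from PyPI.
if LOCAL_GGUF not in sys.path:
    sys.path.insert(0, LOCAL_GGUF)
```

With the submodule initialized (git submodule update --init), importing gguf after this shim resolves to the upstream copy rather than the stale PyPI release.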
