# Refact Inference
## llama.cpp (ggml)
We have integrated Refact into llama.cpp for efficient inference on Intel, Apple silicon, and NVIDIA hardware. Please read through [llama.cpp](https://github.com/ggerganov/llama.cpp) first to understand its design.
### Setup
Until the [Refact PR](https://github.com/ggerganov/llama.cpp/pull/3329) is officially merged, please experiment with the fork below; after the merge, switch the repo to `https://github.com/ggerganov/llama.cpp`.
```shell
git clone https://github.com/ds5t5/llama.cpp.git
cd llama.cpp
# check out the branch that adds Refact support
git checkout -b add.refact origin/add.refact
# build llama.cpp (see its README for platform-specific options)
make
```
### Download the Hugging Face Refact model
Run the script below, or manually download the model and tokenizer to a local path.
```shell
pip3 install transformers torch accelerate
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "smallcloudai/Refact-1_6B-fim"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True, low_cpu_mem_usage=True)

# save both to a local directory for the GGUF conversion step below
model.save_pretrained("./Refact-1_6B-fim")
tokenizer.save_pretrained("./Refact-1_6B-fim")
```
### Convert the model to GGUF
Use a Python 3.8+ environment.
```shell
pip3 install transformers torch sentencepiece
cd gguf-py && pip install -e . && cd ..
# use 0 at the end for fp32, 1 for fp16
python3 convert-refact-hf-to-gguf.py ./Refact-1_6B-fim 1
```
### Run inference
See the llama.cpp documentation for more advanced inference options, such as quantization and sampling parameters.
```shell
./main -m ./Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiply two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
```
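
For example, the fp16 model can be quantized with llama.cpp's `quantize` tool before running it. A minimal sketch; the output filename is illustrative:

```shell
# produce a 4-bit (q4_0) model: smaller and faster, at some quality cost
./quantize ./Refact-1_6B-fim/ggml-model-f16.gguf ./Refact-1_6B-fim/ggml-model-q4_0.gguf q4_0
./main -m ./Refact-1_6B-fim/ggml-model-q4_0.gguf -n 300 -p "write a function to multiply two integers in python"
```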
### Known Issues
- Special tokens such as `<fim_middle>` are not tokenized as a single id by the llama.cpp `main` binary, so fill-in-the-middle prompts will not behave as expected. The community is adding a [fix](https://github.com/ggerganov/llama.cpp/issues/2820) to support special tokens.
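
Until that fix lands, you can verify the expected single-id behavior with the Hugging Face tokenizer. A minimal sketch; `<fim_prefix>` and `<fim_suffix>` are assumed here as the standard companions of the `<fim_middle>` token mentioned above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("smallcloudai/Refact-1_6B-fim")

# each special token should map to exactly one id, not a sequence of ids
for token in ["<fim_prefix>", "<fim_suffix>", "<fim_middle>"]:
    print(token, "->", tokenizer.convert_tokens_to_ids(token))
```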
