### 🐛 Describe the bug
I'm trying to run a long prompt (~800 tokens) on a MediaTek device, but the model outputs gibberish or keeps repeating a single word. I attempted to generate a `.pte` file with a larger cache size on a system with 32 GB of RAM, but the export process gets killed due to high memory usage.

Is there a way to reduce the memory requirements so I can successfully generate a `.pte` file with `cache_size: 1024` and test long-prompt performance? I want to do this for the 3B parameter model. The export script I'm using is below:
```bash
model=${1:-'llama3.2-3b'}
chunks=${2:-7}
tok=${3:-256}
cache=${4:-1024}
cal=${5:-llama3.txt}
pres=${6:-A16W4}

if [ "$model" = "llama3.2-3b" ]
then
    config_path=Llama-3.2-3B-Instruct/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama3.json"
elif [ "$model" = "llama3.2-1b" ]
then
    config_path=Llama-3.2-1B-Instruct/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama3.json"
elif [ "$model" = "llama3" ]
then
    config_path=llama3-8B-instruct/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama3.json"
elif [ "$model" = "llama3.2-3b-q" ]
then
    config_path=Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama3.json"
elif [ "$model" = "llama2" ]
then
    config_path=llama2-7B-chat/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama2_short.json"
fi

if [ "$cal" = "None" ]
then
    data=""
else
    data="-d aot_utils/llm_utils/prompts/${cal}"
fi

echo "Model: $model"
echo "Config Path: $config_path"
echo "Num Chunks: $chunks"
echo "Num Tokens: $tok"
echo "Cache Size: $cache"
echo "Precision: $pres"
echo "Calibration Dataset: $cal"
echo "Preformatter: $pref"

python3 model_export_scripts/llama.py \
    models/llm_models/weights/${config_path} \
    -p $pres \
    --num_chunks $chunks \
    ${data} \
    ${pref} \
    -shapes ${tok}t${cache}c 1t${cache}c
```
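For reference, the `-shapes` argument is built from the positional defaults via bash parameter expansion. This standalone sketch (not the export itself) shows what the exporter receives when the script is run with no arguments:

```shell
# Mirrors the script's defaults: $3 -> prompt/prefill token count, $4 -> KV cache size.
tok=${3:-256}
cache=${4:-1024}
# Two shape specs: a 256-token prefill shape and a 1-token decode shape,
# both against the same 1024-entry cache.
echo "${tok}t${cache}c 1t${cache}c"
# prints: 256t1024c 1t1024c
```

So with `cache=1024` the decode graph still sees only one token per step; it is the 1024-entry cache dimension that grows the exported graphs and, presumably, the export-time memory footprint.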
### Versions

Versions of relevant libraries:

```
[pip3] flake8==7.0.0
[pip3] mypy==1.11.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.7.0
[conda] _anaconda_depends 2024.10 py312_mkl_0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py312h5eee18b_1
[conda] mkl_fft 1.3.10 py312h5eee18b_0
[conda] mkl_random 1.2.7 py312h526ad5a_0
[conda] numpy 1.26.4 py312hc5e2394_0
[conda] numpy-base 1.26.4 py312h0da6c21_0
[conda] numpydoc 1.7.0 py312h06a4308_0
```
cc @iseeyuan @mergennachin @cccclai @helunwencser @jackzhxng @neuropilot-captain @cbilgin @byjlw