### 🐛 Describe the bug
I'm trying to run a long prompt (~800 tokens) on a MediaTek device, but the model outputs gibberish or keeps repeating a single word. I attempted to generate a `.pte` file with a larger cache size on a system with 32 GB of RAM, but the export process gets killed due to high memory usage.

Is there a way to reduce the memory requirements so I can successfully generate a `.pte` file with `cache_size: 1024` and test long-prompt performance? I want to do this for the 3B parameter model. The export script I'm using is below:
```bash
model=${1:-'llama3.2-3b'}
chunks=${2:-7}
tok=${3:-256}
cache=${4:-1024}
cal=${5:-llama3.txt}
pres=${6:-A16W4}

if [ "$model" = "llama3.2-3b" ]
then
    config_path=Llama-3.2-3B-Instruct/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama3.json"
elif [ "$model" = "llama3.2-1b" ]
then
    config_path=Llama-3.2-1B-Instruct/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama3.json"
elif [ "$model" = "llama3" ]
then
    config_path=llama3-8B-instruct/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama3.json"
elif [ "$model" = "llama3.2-3b-q" ]
then
    config_path=Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama3.json"
elif [ "$model" = "llama2" ]
then
    config_path=llama2-7B-chat/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama2_short.json"
fi

if [ "$cal" = "None" ]
then
    data=""
else
    data="-d aot_utils/llm_utils/prompts/${cal}"
fi

echo "Model: $model"
echo "Config Path: $config_path"
echo "Num Chunks: $chunks"
echo "Num Tokens: $tok"
echo "Cache Size: $cache"
echo "Precision: $pres"
echo "Calibration Dataset: $cal"
echo "Preformatter: $pref"

python3 model_export_scripts/llama.py \
    models/llm_models/weights/${config_path} \
    -p $pres \
    --num_chunks $chunks \
    ${data} \
    ${pref} \
    -shapes ${tok}t${cache}c 1t${cache}c
```
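For reference, the `-shapes` argument is built from the positional defaults via bash parameter expansion. This standalone sketch (not the export itself) shows what the exporter receives when the script is run with no arguments:

```shell
# Mirrors the script's defaults: $3 -> prompt/prefill token count, $4 -> KV cache size.
tok=${3:-256}
cache=${4:-1024}
# Two shape specs: a 256-token prefill shape and a 1-token decode shape,
# both against the same 1024-entry cache.
echo "${tok}t${cache}c 1t${cache}c"
# prints: 256t1024c 1t1024c
```

So with `cache=1024` the decode graph still sees only one token per step; it is the 1024-entry cache dimension that grows the exported graphs and, presumably, the export-time memory footprint.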
### Versions

Versions of relevant libraries:

```
[pip3] flake8==7.0.0
[pip3] mypy==1.11.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.7.0
[conda] _anaconda_depends 2024.10 py312_mkl_0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py312h5eee18b_1
[conda] mkl_fft 1.3.10 py312h5eee18b_0
[conda] mkl_random 1.2.7 py312h526ad5a_0
[conda] numpy 1.26.4 py312hc5e2394_0
[conda] numpy-base 1.26.4 py312h0da6c21_0
[conda] numpydoc 1.7.0 py312h06a4308_0
```
cc @iseeyuan @mergennachin @cccclai @helunwencser @jackzhxng @neuropilot-captain @cbilgin @byjlw