
Long Prompt || .pte Generation || Mediatek #8725

Description

@shreshth-tru

🐛 Describe the bug

I'm trying to run a long prompt (~800 tokens) on a MediaTek device, but the model outputs gibberish or keeps repeating a single word. I attempted to generate a .pte file with a larger cache size on a system with 32 GB of RAM, but the process gets killed due to high memory usage.

Is there a way to reduce the memory requirements so I can successfully generate a .pte file with cache_size: 1024 to test long-prompt performance? I want to run this for a 3B-parameter model.

```sh
model=${1:-'llama3.2-3b'}
chunks=${2:-7}
tok=${3:-256}
cache=${4:-1024}
cal=${5:-llama3.txt}
pres=${6:-A16W4}

# Pick the model config and the matching chat preformatter template.
if [ "$model" = "llama3.2-3b" ]
then
    config_path=Llama-3.2-3B-Instruct/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama3.json"
elif [ "$model" = "llama3.2-1b" ]
then
    config_path=Llama-3.2-1B-Instruct/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama3.json"
elif [ "$model" = "llama3" ]
then
    config_path=llama3-8B-instruct/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama3.json"
elif [ "$model" = "llama3.2-3b-q" ]
then
    config_path=Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama3.json"
elif [ "$model" = "llama2" ]
then
    config_path=llama2-7B-chat/config.json
    pref="--preformatter aot_utils/llm_utils/preformatter_templates/llama2_short.json"
fi

# Optional calibration dataset.
if [ "$cal" = "None" ]
then
    data=""
else
    data="-d aot_utils/llm_utils/prompts/${cal}"
fi

echo "Model: $model"
echo "Config Path: $config_path"
echo "Num Chunks: $chunks"
echo "Num Tokens: $tok"
echo "Cache Size: $cache"
echo "Precision: $pres"
echo "Calibration Dataset: $cal"
echo "Preformatter: $pref"

# Export with prompt-phase shape ${tok}t${cache}c and decode-phase shape 1t${cache}c.
python3 model_export_scripts/llama.py \
    models/llm_models/weights/${config_path} \
    -p $pres \
    --num_chunks $chunks \
    ${data} \
    ${pref} \
    -shapes ${tok}t${cache}c 1t${cache}c
```
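For reference, the script takes its settings as positional arguments (model, num_chunks, num_tokens, cache_size, calibration file, precision). A minimal sketch of the invocation for this experiment, assuming the script is saved as export_llama.sh (the filename is not given above), and wrapped in GNU time only to confirm how much RAM the export peaks at before it is killed:

```sh
# Hypothetical invocation of the export script above; the filename is a placeholder.
# Argument order: <model> <num_chunks> <num_tokens> <cache_size> <calibration_file> <precision>
bash export_llama.sh llama3.2-3b 7 256 1024 llama3.txt A16W4

# GNU time (-v) prints "Maximum resident set size", which shows the peak RAM
# actually used by the export before the OOM killer steps in.
/usr/bin/time -v bash export_llama.sh llama3.2-3b 7 256 1024 llama3.txt A16W4
```

With cache=1024 on the 3B model, the "Maximum resident set size" line should indicate whether the export genuinely needs more than the 32 GB available here.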

Versions

Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] mypy==1.11.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.7.0
[conda] _anaconda_depends 2024.10 py312_mkl_0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py312h5eee18b_1
[conda] mkl_fft 1.3.10 py312h5eee18b_0
[conda] mkl_random 1.2.7 py312h526ad5a_0
[conda] numpy 1.26.4 py312hc5e2394_0
[conda] numpy-base 1.26.4 py312h0da6c21_0
[conda] numpydoc 1.7.0 py312h06a4308_0

cc @iseeyuan @mergennachin @cccclai @helunwencser @jackzhxng @neuropilot-captain @cbilgin @byjlw

Labels

module: llm/evaluation (Issues related to LLM perplexity, accuracy, etc.)
module: mediatek (Delegate to MediaTek backend)
module: user experience (Issues related to reducing friction for users)
partner: mediatek (Issues related to the Mediatek delegate)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
