Description
🐛 Describe the bug
I followed the steps from https://github.com/pytorch/executorch/blob/main/examples/models/llama2/README.md.
The installation steps are as follows:
```bash
git clone https://github.com/pytorch/executorch.git   # main branch as of 2024/9/2
cd executorch
git submodule sync
git submodule update --init
python3 -m venv .llama && source .llama/bin/activate
pip install torch torchvision torchaudio
pip install setuptools wheel
./install_requirements.sh --pybind xnnpack
```
Then, in Step 2 (Prepare model), I chose Option C: Download and export Llama 3 8B Instruct model from Hugging Face, and ran the following command to export Llama 3:
```bash
python3 -m examples.models.llama2.export_llama --checkpoint Meta-Llama-3-8B-Instruct/original/consolidated.00.pth -p Meta-Llama-3-8B-Instruct/original/params.json -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_id":128001}' --embedding-quantize 4,32 --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"
```
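For reference, since PyTorch 1.6 `torch.save` defaults to a zipfile-based format, and `torch.load(..., mmap=True)` only accepts that format. A quick way to check which format a `.pth` file uses (this diagnostic is my own, not part of the official steps) is the stdlib `zipfile` module, since the newer format is a regular zip archive:

```python
import os
import tempfile
import zipfile

import torch

def is_zipfile_checkpoint(path):
    # PyTorch's zipfile serialization produces a valid zip archive;
    # the legacy pickle format does not, so this distinguishes the two.
    return zipfile.is_zipfile(path)

# Demonstrate with tiny checkpoints saved in both formats.
tmpdir = tempfile.mkdtemp()
new_path = os.path.join(tmpdir, "new.pth")
old_path = os.path.join(tmpdir, "old.pth")
state = {"w": torch.zeros(2)}

torch.save(state, new_path)  # default: zipfile format, mmap-compatible
torch.save(state, old_path, _use_new_zipfile_serialization=False)  # legacy format

print(is_zipfile_checkpoint(new_path), is_zipfile_checkpoint(old_path))
```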
I'm getting the following runtime error:
```
[INFO 2024-09-03 08:42:37,512 export_llama_lib.py:450] Applying quantizers: []
[INFO 2024-09-03 08:42:37,512 export_llama_lib.py:646] Loading model with checkpoint=Meta-Llama-3-8B-Instruct/original/consolidated.00.pth, params=Meta-Llama-3-8B-Instruct/original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
/Users/MyPath/executorch/examples/models/llama2/model.py:100: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(checkpoint_path, map_location=device, mmap=True)
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
    ^^^^^^
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 352, in export_llama
    builder = _export_llama(modelname, args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 474, in _export_llama
    _prepare_for_llama_export(modelname, args)
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 414, in _prepare_for_llama_export
    _load_llama_model(
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 649, in _load_llama_model
    model, example_inputs, _ = EagerModelFactory.create_model(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/examples/models/model_factory.py", line 44, in create_model
    model = model_class(**kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/examples/models/llama2/model.py", line 100, in __init__
    checkpoint = torch.load(checkpoint_path, map_location=device, mmap=True)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/.llama/lib/python3.12/site-packages/torch/serialization.py", line 1284, in load
    raise RuntimeError(
RuntimeError: mmap can only be used with files saved with `torch.save(Meta-Llama-3-8B-Instruct/original/consolidated.00.pth, _use_new_zipfile_serialization=True), please torch.save your checkpoint with this option in order to use mmap.
```
Versions
Environment:
- M1 MacBook Pro
- macOS: 14.5
- Python: 3.12
- torch: 2.5.0.dev20240829 (from https://pypi.org/simple and https://download.pytorch.org/whl/nightly/cpu)
- cmake: 3.30.2
- wheel: 0.44.0
cc @iseeyuan @mergennachin @cccclai @helunwencser @dvorjackz