Description
🐛 Describe the bug
I followed the steps from https://github.com/pytorch/executorch/blob/main/examples/models/llama2/README.md.
The installation steps are as follows:
```bash
git clone https://github.com/pytorch/executorch.git   # main branch as of 2024/9/2
cd executorch
git submodule sync
git submodule update --init
python3 -m venv .llama && source .llama/bin/activate
pip install torch torchvision torchaudio
pip install setuptools wheel
./install_requirements.sh --pybind xnnpack
```
Then, in Step 2 (Prepare model), I chose Option C: Download and export Llama 3 8B Instruct model from Hugging Face, and ran the following command to export Llama 3:
```bash
python3 -m examples.models.llama2.export_llama --checkpoint Meta-Llama-3-8B-Instruct/original/consolidated.00.pth -p Meta-Llama-3-8B-Instruct/original/params.json -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_id":128001}' --embedding-quantize 4,32 --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"
```
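For reference, since PyTorch 1.6 `torch.save` defaults to a zipfile-based format, and `torch.load(..., mmap=True)` only accepts that format. A quick way to check which format a `.pth` file uses (this diagnostic is my own, not part of the official steps) is the stdlib `zipfile` module, since the newer format is a regular zip archive:

```python
import os
import tempfile
import zipfile

import torch

def is_zipfile_checkpoint(path):
    # PyTorch's zipfile serialization produces a valid zip archive;
    # the legacy pickle format does not, so this distinguishes the two.
    return zipfile.is_zipfile(path)

# Demonstrate with tiny checkpoints saved in both formats.
tmpdir = tempfile.mkdtemp()
new_path = os.path.join(tmpdir, "new.pth")
old_path = os.path.join(tmpdir, "old.pth")
state = {"w": torch.zeros(2)}

torch.save(state, new_path)  # default: zipfile format, mmap-compatible
torch.save(state, old_path, _use_new_zipfile_serialization=False)  # legacy format

print(is_zipfile_checkpoint(new_path), is_zipfile_checkpoint(old_path))
```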
I'm getting the following runtime error:
```
[INFO 2024-09-03 08:42:37,512 export_llama_lib.py:450] Applying quantizers: []
[INFO 2024-09-03 08:42:37,512 export_llama_lib.py:646] Loading model with checkpoint=Meta-Llama-3-8B-Instruct/original/consolidated.00.pth, params=Meta-Llama-3-8B-Instruct/original/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
/Users/MyPath/executorch/examples/models/llama2/model.py:100: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(checkpoint_path, map_location=device, mmap=True)
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
    ^^^^^^
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 352, in export_llama
    builder = _export_llama(modelname, args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 474, in _export_llama
    _prepare_for_llama_export(modelname, args)
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 414, in _prepare_for_llama_export
    _load_llama_model(
  File "/Users/MyPath/executorch/examples/models/llama2/export_llama_lib.py", line 649, in _load_llama_model
    model, example_inputs, _ = EagerModelFactory.create_model(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/examples/models/model_factory.py", line 44, in create_model
    model = model_class(**kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/examples/models/llama2/model.py", line 100, in __init__
    checkpoint = torch.load(checkpoint_path, map_location=device, mmap=True)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/MyPath/executorch/.llama/lib/python3.12/site-packages/torch/serialization.py", line 1284, in load
    raise RuntimeError(
RuntimeError: mmap can only be used with files saved with `torch.save(Meta-Llama-3-8B-Instruct/original/consolidated.00.pth, _use_new_zipfile_serialization=True), please torch.save your checkpoint with this option in order to use mmap.
```
Versions
Environment:
- M1 MacBook Pro
- macOS: 14.5
- Python: 3.12
- torch: 2.5.0.dev20240829 (from https://pypi.org/simple and https://download.pytorch.org/whl/nightly/cpu)
- cmake: 3.30.2
- wheel: 0.44.0
cc @iseeyuan @mergennachin @cccclai @helunwencser @dvorjackz