
IPEX v2.1.1+cpu crashes PyTorch for model where v1.13.100+cpu worked #484


Description

@droberts195

Describe the bug

We have a model called elser_model_2_linux-x86_64.pt. It can be downloaded from https://ml-models.elastic.co/elser_model_2_linux-x86_64.pt. As the name suggests, it has been quantized for Intel x86_64 hardware.

When this model was used with PyTorch 1.13.1 and IPEX v1.13.100+cpu, it worked.

When this model is used with PyTorch 2.1.1 and IPEX v2.1.0+cpu, it causes an internal assertion to trip inside PyTorch.

The following Python script can reproduce the problem (assuming elser_model_2_linux-x86_64.pt has been downloaded into the current directory using the link above):

import torch
import intel_extension_for_pytorch

model = torch.jit.load("elser_model_2_linux-x86_64.pt")
model.eval()

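# First inference: this call works and prints a result.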
input_ids = [101, 1996, 2143, 2001, 2307, 999, 999, 102]
attention_mask = [1, 1, 1, 1, 1, 1, 1, 1]
token_type_ids = [0, 0, 0, 0, 0, 0, 0, 0]
position_ids = [0, 1, 2, 3, 4, 5, 6, 7]
results = model(torch.tensor([input_ids]), torch.tensor([attention_mask]), torch.tensor([token_type_ids]), torch.tensor([position_ids]))
print(results)

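# Second inference with different input_ids: with IPEX imported, this call trips the internal assertion.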
input_ids = [101, 1996, 3185, 2001, 12476, 999, 999, 102]
attention_mask = [1, 1, 1, 1, 1, 1, 1, 1]
token_type_ids = [0, 0, 0, 0, 0, 0, 0, 0]
position_ids = [0, 1, 2, 3, 4, 5, 6, 7]
results = model(torch.tensor([input_ids]), torch.tensor([attention_mask]), torch.tensor([token_type_ids]), torch.tensor([position_ids]))
print(results)

Output:

$ python3.10 ./test_elser.py 
tensor([[0., 0., 0.,  ..., 0., 0., 0.]])
Traceback (most recent call last):
  File "/home/ubuntu/./test_elser.py", line 19, in <module>
    results = model(torch.tensor([input_ids]), torch.tensor([attention_mask]), torch.tensor([token_type_ids]), torch.tensor([position_ids]))
  File "/usr/local/gcc103/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/gcc103/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: outputs_.size() == 1 INTERNAL ASSERT FAILED at "/root/anaconda3/envs/pytorch_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/jit/ir/ir.h":510, please report a bug to PyTorch. 

Output if the import intel_extension_for_pytorch line is deleted:

$ python3.10 ./test_elser.py 
tensor([[0., 0., 0.,  ..., 0., 0., 0.]])
tensor([[0., 0., 0.,  ..., 0., 0., 0.]])

So something that IPEX does when it is imported is causing the assertion failure.
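For reference, here is a variation of the repro script that may help narrow this down. It is an untested sketch: it disables TorchScript graph-executor optimization (torch._C._set_graph_executor_optimize is a private PyTorch API) before running both inputs, on the assumption that the failure comes from a JIT graph rewrite registered when IPEX is imported rather than from IPEX's eager-mode kernels.

import torch
import intel_extension_for_pytorch  # importing IPEX is what changes the behaviour

# Assumption to verify: disabling graph-executor optimization keeps the JIT on the
# unoptimized graph, so any rewrite/fusion passes registered by IPEX should not run.
# _set_graph_executor_optimize is a private PyTorch API.
torch._C._set_graph_executor_optimize(False)

model = torch.jit.load("elser_model_2_linux-x86_64.pt")
model.eval()

# Same two tokenized inputs as the repro script above.
for ids in ([101, 1996, 2143, 2001, 2307, 999, 999, 102],
            [101, 1996, 3185, 2001, 12476, 999, 999, 102]):
    n = len(ids)
    results = model(torch.tensor([ids]),               # input_ids
                    torch.tensor([[1] * n]),           # attention_mask
                    torch.tensor([[0] * n]),           # token_type_ids
                    torch.tensor([list(range(n))]))    # position_ids
    print(results)

If the second call succeeds with optimization disabled, that would point at one of the JIT passes IPEX registers on import; if it still fails, the problem presumably lies elsewhere in IPEX.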

Versions

Collecting environment information...
PyTorch version: 2.1.1+cu121
PyTorch CXX11 ABI: No
IPEX version: 2.1.0+cpu
IPEX commit: 94f4320
Build type: Release

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (GCC) 10.3.0
Clang version: N/A
IGC version: N/A
CMake version: version 3.27.9
Libc version: glibc-2.27

Python version: 3.10.9 (main, Dec 11 2023, 11:18:08) [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-1103-aws-x86_64-with-glibc2.27
Is XPU available: False
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration: 

Intel OpenCL ICD version: N/A
Level Zero version: N/A

CPU:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping:            4
CPU MHz:             3402.319
BogoMIPS:            5999.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==2.1.0
[pip3] numpy==1.26.2
[pip3] torch==2.1.1
[conda] N/A
