
IPEX v2.1.1+cpu crashes PyTorch for model where v1.13.100+cpu worked #484


Description

@droberts195

Describe the bug

We have a model called elser_model_2_linux-x86_64.pt. It can be downloaded from https://ml-models.elastic.co/elser_model_2_linux-x86_64.pt. As the name suggests, it has been quantized for Intel x86_64 hardware.

When this model was used with PyTorch 1.13.1 and IPEX v1.13.100+cpu, it worked.

When this model is used with PyTorch 2.1.1 and IPEX v2.1.0+cpu, it causes an internal assertion to trip inside PyTorch.

The following Python script can reproduce the problem (assuming elser_model_2_linux-x86_64.pt has been downloaded into the current directory using the link above):

import torch
import intel_extension_for_pytorch

model = torch.jit.load("elser_model_2_linux-x86_64.pt")
model.eval()

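# First inference: this call works and prints a result.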
input_ids = [101, 1996, 2143, 2001, 2307, 999, 999, 102]
attention_mask = [1, 1, 1, 1, 1, 1, 1, 1]
token_type_ids = [0, 0, 0, 0, 0, 0, 0, 0]
position_ids = [0, 1, 2, 3, 4, 5, 6, 7]
results = model(torch.tensor([input_ids]), torch.tensor([attention_mask]), torch.tensor([token_type_ids]), torch.tensor([position_ids]))
print(results)

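# Second inference with different input_ids: with IPEX imported, this call trips the internal assertion.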
input_ids = [101, 1996, 3185, 2001, 12476, 999, 999, 102]
attention_mask = [1, 1, 1, 1, 1, 1, 1, 1]
token_type_ids = [0, 0, 0, 0, 0, 0, 0, 0]
position_ids = [0, 1, 2, 3, 4, 5, 6, 7]
results = model(torch.tensor([input_ids]), torch.tensor([attention_mask]), torch.tensor([token_type_ids]), torch.tensor([position_ids]))
print(results)

Output:

$ python3.10 ./test_elser.py 
tensor([[0., 0., 0.,  ..., 0., 0., 0.]])
Traceback (most recent call last):
  File "/home/ubuntu/./test_elser.py", line 19, in <module>
    results = model(torch.tensor([input_ids]), torch.tensor([attention_mask]), torch.tensor([token_type_ids]), torch.tensor([position_ids]))
  File "/usr/local/gcc103/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/gcc103/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: outputs_.size() == 1 INTERNAL ASSERT FAILED at "/root/anaconda3/envs/pytorch_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/jit/ir/ir.h":510, please report a bug to PyTorch. 

Output if the import intel_extension_for_pytorch line is deleted:

$ python3.10 ./test_elser.py 
tensor([[0., 0., 0.,  ..., 0., 0., 0.]])
tensor([[0., 0., 0.,  ..., 0., 0., 0.]])

So something that IPEX does when it is imported is causing the assertion failure.
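For reference, here is a variation of the repro script that may help narrow this down. It is an untested sketch: it disables TorchScript graph-executor optimization (torch._C._set_graph_executor_optimize is a private PyTorch API) before running both inputs, on the assumption that the failure comes from a JIT graph rewrite registered when IPEX is imported rather than from IPEX's eager-mode kernels.

import torch
import intel_extension_for_pytorch  # importing IPEX is what changes the behaviour

# Assumption to verify: disabling graph-executor optimization keeps the JIT on the
# unoptimized graph, so any rewrite/fusion passes registered by IPEX should not run.
# _set_graph_executor_optimize is a private PyTorch API.
torch._C._set_graph_executor_optimize(False)

model = torch.jit.load("elser_model_2_linux-x86_64.pt")
model.eval()

# Same two tokenized inputs as the repro script above.
for ids in ([101, 1996, 2143, 2001, 2307, 999, 999, 102],
            [101, 1996, 3185, 2001, 12476, 999, 999, 102]):
    n = len(ids)
    results = model(torch.tensor([ids]),               # input_ids
                    torch.tensor([[1] * n]),           # attention_mask
                    torch.tensor([[0] * n]),           # token_type_ids
                    torch.tensor([list(range(n))]))    # position_ids
    print(results)

If the second call succeeds with optimization disabled, that would point at one of the JIT passes IPEX registers on import; if it still fails, the problem presumably lies elsewhere in IPEX.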

Versions

Collecting environment information...
PyTorch version: 2.1.1+cu121
PyTorch CXX11 ABI: No
IPEX version: 2.1.0+cpu
IPEX commit: 94f4320
Build type: Release

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (GCC) 10.3.0
Clang version: N/A
IGC version: N/A
CMake version: version 3.27.9
Libc version: glibc-2.27

Python version: 3.10.9 (main, Dec 11 2023, 11:18:08) [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-1103-aws-x86_64-with-glibc2.27
Is XPU available: False
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration: 

Intel OpenCL ICD version: N/A
Level Zero version: N/A

CPU:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping:            4
CPU MHz:             3402.319
BogoMIPS:            5999.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==2.1.0
[pip3] numpy==1.26.2
[pip3] torch==2.1.1
[conda] N/A
