MatMul op results mismatch (NPU/Numpy) #145

Open
kballeda opened this issue Dec 8, 2024 · 2 comments

Comments


kballeda commented Dec 8, 2024

I attempted to compare the performance of NPU and NumPy-based dot-product computations using float16, but the results from the NPU and the CPU (NumPy) do not match. I am using the NPU on ARL.

To Reproduce
Steps to reproduce the behavior:

  1. Copy the code below and run it with python <filename.py>
from intel_npu_acceleration_library.backend import MatMul
import numpy as np
import time

inC = 8
outC = 8
batch = 1

X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

mm = MatMul(inC, outC, batch, profile=False)

start_time = time.perf_counter()
result = mm.run(X1,X2)
end_time = time.perf_counter()
print(f"Intel NPU Acceleration Library Time: {end_time - start_time} * 1000:.6f millisecs")

start_time = time.perf_counter()
np_res = np.dot(X1, X2)
end_time = time.perf_counter()
print(f"Numpy Library Time: {end_time - start_time} * 1000:.6f millisecs")

print("NPU Result: ", result)
print("Numpy Result:", np_res)

Expected behavior
The NPU and NumPy results should match; instead, the outputs differ on every run:

>python matmul.py
Intel NPU Acceleration Library Time: 0.177700 ms
Numpy Library Time: 0.015700 ms
NPU Result:  [[ 1.178  -0.572   1.957  -0.4443 -0.549   0.7744 -0.2756 -0.997 ]]
Numpy Result: [[-0.1885    1.905    -1.779    -0.8945    0.866    -0.9365    1.248
  -0.006233]]

>python matmul.py
Intel NPU Acceleration Library Time: 0.189100 ms
Numpy Library Time: 0.016700 ms
NPU Result:  [[-0.3157  2.506   0.3499  0.631  -0.1031  0.5913 -1.599   1.001 ]]
Numpy Result: [[-1.137    0.09534  0.12366 -1.748    0.981    0.2004  -0.2607  -2.086  ]]

>python matmul.py
Intel NPU Acceleration Library Time: 0.177000 ms
Numpy Library Time: 0.013300 ms
NPU Result:  [[ 0.7153 -1.251   0.2106 -0.409  -0.4336  0.2329  1.653  -1.58  ]]
Numpy Result: [[-1.764   1.005  -0.837  -1.518   0.8794  0.427   0.1887 -1.153 ]]

Desktop (please complete the following information):

  • OS: Win11Enterprise
alessandropalla (Contributor) commented

Hi, mm.run(X1, X2) is equivalent to np.dot(X1, X2.T); if you use that, the math checks out.
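For a quick sanity check, here is a minimal sketch reusing the reporter's 8x8 setup; the rtol/atol tolerances are illustrative assumptions chosen loosely for float16 rounding:

from intel_npu_acceleration_library.backend import MatMul
import numpy as np

inC, outC, batch = 8, 8, 1
X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

mm = MatMul(inC, outC, batch, profile=False)
npu_result = mm.run(X1, X2)

# mm.run(X1, X2) computes X1 @ X2.T, so compare against the transposed product.
reference = np.dot(X1, X2.T)
# Tolerances are assumptions, loose enough to absorb float16 rounding differences.
print("Match:", np.allclose(npu_result, reference, rtol=1e-2, atol=1e-3))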

Also, external AI accelerators (like GPUs and NPUs) are more effective when offloading large operations; you won't see a speedup from multiplying an 8x8 matrix by a vector (as explained very nicely here).

Here is an example with a medium-sized matrix-matrix operation that shows a significant speedup:

from intel_npu_acceleration_library.backend import MatMul
import numpy as np
import time

inC = 1024
outC = 1024
batch = 256

X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

mm = MatMul(inC, outC, batch, profile=False)

start_time = time.perf_counter()
result = mm.run(X1,X2)
end_time = time.perf_counter()
print(f"Intel NPU Acceleration Library Time: {(end_time - start_time) * 1000:.6f} ms")

start_time = time.perf_counter()
np_res = np.dot(X1, X2.T)
end_time = time.perf_counter()
print(f"Numpy Library Time: {(end_time - start_time) * 1000:.6f} ms")

print("NPU Result: ", result)
print("Numpy Result:", np_res)

Attached is some code you can use for reference; it returns the following on an ARL machine:

Intel NPU Acceleration Library Time: 0.831700 ms
Numpy Library Time: 2027.385200 ms
NPU Result:  [[ 18.16   -12.62     6.703  ...   8.24     4.71    -5.74  ]
 [  5.82   -15.77   -11.47   ...  -6.914   11.86    -0.8833]
 [  3.604   -7.562    6.15   ...  14.19     8.88    13.63  ]
 ...
 [ -7.438   -8.72     0.1948 ...   5.684   -1.962   -0.7773]
 [  4.727   19.52   -13.34   ...   4.973   -4.89    18.12  ]
 [ -8.625    4.54     7.22   ... -11.734   14.914  -19.64  ]]
Numpy Result: [[ 18.16   -12.62     6.703  ...   8.24     4.71    -5.74  ]
 [  5.82   -15.77   -11.47   ...  -6.914   11.86    -0.8833]
 [  3.604   -7.562    6.15   ...  14.19     8.88    13.63  ]
 ...
 [ -7.438   -8.72     0.1948 ...   5.684   -1.962   -0.7773]
 [  4.727   19.52   -13.34   ...   4.973   -4.89    18.12  ]
 [ -8.625    4.54     7.22   ... -11.734   14.914  -19.64  ]]

Also, consider that the first time you compile, the timing results are skewed by first-inference latency.
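A timing sketch continuing the snippet above (the single warm-up call and the repeat count of 10 are illustrative assumptions) that excludes that first-inference cost:

import time

mm.run(X1, X2)  # warm-up: triggers compilation and the first inference

runs = 10
start_time = time.perf_counter()
for _ in range(runs):
    mm.run(X1, X2)
end_time = time.perf_counter()
print(f"Avg NPU time over {runs} runs: {(end_time - start_time) / runs * 1000:.6f} ms")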


kballeda commented Dec 9, 2024

Thank you, I will check this and confirm on my end.
