Axis reordering
I understand the meaning and motivation of axis reordering, but I want to know the actual procedure of this method.
According to the paper, it is natural to loop over the spatial axes N and M first and then over the temporal axis K. The pseudocode I wrote for this is:
```python
for k in range(K):              # temporal axis K outermost
    for n in range(N):
        for m in range(M):
            C[n][m] += A[n][k] * W[m][k]
```
This looks odd to me. Shouldn't the natural procedure be the following?
```python
for n in range(N):
    for m in range(M):
        for k in range(K):      # temporal axis K innermost
            C[n][m] += A[n][k] * W[m][k]
```
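To check that I am reading the two orders correctly, here is a runnable version of both (NumPy; the function names and the small random test matrices are my own, not from the paper). Both orders compute the same C[n, m] = Σ_k A[n, k] · W[m, k]; they differ only in the order of accumulation and in which data the inner loops need to keep at hand.

```python
import numpy as np

def gemm_k_outer(A, W):
    """Loop order K -> N -> M: the temporal axis K is the outermost loop."""
    N, K = A.shape
    M = W.shape[0]
    C = np.zeros((N, M), dtype=A.dtype)
    for k in range(K):
        for n in range(N):
            for m in range(M):
                C[n, m] += A[n, k] * W[m, k]
    return C

def gemm_k_inner(A, W):
    """Loop order N -> M -> K: spatial axes N and M outside, K innermost."""
    N, K = A.shape
    M = W.shape[0]
    C = np.zeros((N, M), dtype=A.dtype)
    for n in range(N):
        for m in range(M):
            acc = 0
            for k in range(K):
                acc += A[n, k] * W[m, k]
            C[n, m] = acc
    return C

rng = np.random.default_rng(0)
A = rng.integers(-4, 5, size=(3, 8))   # activations, shape [N, K]
W = rng.integers(-1, 2, size=(5, 8))   # weights, shape [M, K]
reference = A @ W.T                    # C[N, M] = A[N, K] x W[M, K] (W used transposed)
assert np.array_equal(gemm_k_outer(A, W), reference)
assert np.array_equal(gemm_k_inner(A, W), reference)
```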
In this version the temporal axis K is iterated first (innermost). So why does the paper claim:
"For GEMM C[N, M] = A[N, K] × W[M, K], it is natural to loop among the spatial axes N and M, and then the temporal axis K."
And how should I understand this claim:
"But if we swap the axis order from spatial first to temporal first, it will only maintain a small lookup table [1, K]."
Since the LUT is built along the axis K, for a given k in K the weights share the same LUT entries. So for a specific n in N and m in M, shouldn't the lookup table have size [K//g, 2^g]? Why does the paper say [1, K]?
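Here is a minimal sketch of how I picture the table, assuming 1-bit weights in {0, 1}, a group size g, and the g weight bits of each group used directly as the table index (these assumptions and the function names are mine, not taken from the repository). In this sketch the table depends only on the activation row (one n), is reused for every weight row m, and has shape [K//g, 2^g], which is exactly why [1, K] confuses me.

```python
import numpy as np

def build_lut(a_row, g):
    """For each group of g activations, precompute the partial sum for all 2^g bit patterns.

    Returns shape [K // g, 2**g] with lut[j, p] = sum_i a_row[j*g + i] * bit_i(p).
    """
    K = a_row.shape[0]
    assert K % g == 0, "K must be divisible by the group size g"
    groups = a_row.reshape(K // g, g)                              # [K//g, g]
    patterns = (np.arange(2 ** g)[:, None] >> np.arange(g)) & 1    # [2^g, g], bit i of p
    return groups @ patterns.T                                     # [K//g, 2^g]

def pack_weight_row(w_row, g):
    """Turn one row of 1-bit weights into one table index per group of g bits."""
    K = w_row.shape[0]
    bits = w_row.reshape(K // g, g)
    return bits @ (1 << np.arange(g))                              # [K//g], values in [0, 2^g)

def lut_gemv(a_row, W_bits, g):
    """Compute c[m] = sum_k a_row[k] * W_bits[m, k] using only table lookups and adds."""
    lut = build_lut(a_row, g)              # built once for this activation row n
    K = a_row.shape[0]
    out = np.zeros(W_bits.shape[0], dtype=lut.dtype)
    for m in range(W_bits.shape[0]):       # the same lut is shared by every weight row m
        idx = pack_weight_row(W_bits[m], g)
        out[m] = lut[np.arange(K // g), idx].sum()
    return out

rng = np.random.default_rng(0)
K, M, g = 8, 4, 4
a = rng.integers(-8, 8, size=K)            # one activation row, shape [K]
W_bits = rng.integers(0, 2, size=(M, K))   # 1-bit weights, shape [M, K]
assert np.array_equal(lut_gemv(a, W_bits, g), W_bits @ a)
assert build_lut(a, g).shape == (K // g, 2 ** g)   # (2, 16)
```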
Weight permutation for sequential memory access
I can't find the code corresponding to this design. Could someone explain the process in detail, in words and code, and describe its final effect?
This may involve a lot of hardware knowledge, so I would be very grateful if anyone could explain it in detail.
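My current guess, written as a generic sketch rather than the paper's actual layout: the weights are reordered once, offline, into exactly the order in which a tiled kernel reads them, so that at run time the kernel walks the buffer front to back instead of jumping with a stride. The tile sizes TM and TK, the in-tile loop order, and the function names are hypothetical.

```python
import numpy as np

def kernel_access_order(M, K, TM, TK):
    """Flat offsets into a row-major [M, K] weight array, in the order a tiled kernel reads them.

    Hypothetical tiling: loop over tiles of TM rows x TK columns; inside a tile,
    loop over k and then over m, so consecutive reads hit different rows.
    """
    order = []
    for m0 in range(0, M, TM):
        for k0 in range(0, K, TK):
            for k in range(k0, k0 + TK):
                for m in range(m0, m0 + TM):
                    order.append(m * K + k)   # row-major offset of W[m, k]
    return np.array(order)

def permute_weights(W, TM, TK):
    """Offline step: rewrite W so that the kernel above reads it sequentially.

    Same permutation as kernel_access_order, expressed as reshape + transpose:
    axes (m_tile, m_in, k_tile, k_in) -> (m_tile, k_tile, k_in, m_in).
    """
    M, K = W.shape
    assert M % TM == 0 and K % TK == 0, "shape must be a multiple of the tile size"
    tiled = W.reshape(M // TM, TM, K // TK, TK)
    return tiled.transpose(0, 2, 3, 1).reshape(-1)

M, K, TM, TK = 4, 8, 2, 4
W = np.arange(M * K).reshape(M, K)     # each value equals its original flat offset
order = kernel_access_order(M, K, TM, TK)
W_perm = permute_weights(W, TM, TK)

# Original layout: the kernel's reads jump between rows (strided access).
assert order[:6].tolist() == [0, 8, 1, 9, 2, 10]
# Permuted layout: the i-th read is simply W_perm[i], i.e. purely sequential.
assert np.array_equal(W_perm, W.reshape(-1)[order])
```

If this guess is right, the reordering cost is paid once when the model is converted, and the run-time kernel only needs a pointer that advances by one element per read.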
Thanks in advance.