is paged attention an exact attention #13902
Replies: 3 comments 1 reply
-
Yes, it is exact attention. Take the prefill stage: say you have a prompt of length 10. Then you need to compute the self-attention scores a11, a21, a22, a31, a32, a33, ..., a10,10, which is (1+10)*10/2 = 55 combinations. For i = 6, you need to compute a61, a62, a63, a64, a65, a66.

Without paged attention, you can simply do this in a single block. In the equation above, B is the block size, which is 3 in this example, and j is the block index over all the blocks needed so far; block j = 1 stores tokens 1, 2, 3. So without paged attention you compute (a61, a62, a63, a64, a65, a66) directly. With paged attention you compute the same scores one block at a time (a61, a62, a63 from block 1 and a64, a65, a66 from block 2), so nothing is approximated.
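Here is a minimal NumPy sketch of that claim (not vLLM's actual kernel; the block size B, head dim d, and the random q/K/V are my own illustrative choices). It compares a plain softmax over the first i keys with the same computation done one KV block at a time, using a denominator accumulated over all blocks:

```python
# Minimal sketch: blockwise (paged-style) attention vs. ordinary attention.
# All names and sizes here are illustrative assumptions, not vLLM code.
import numpy as np

np.random.seed(0)
d, i, B = 8, 6, 3                      # head dim, query position, block size
q = np.random.randn(d)                 # query for token i
K = np.random.randn(i, d)              # keys for tokens 1..i
V = np.random.randn(i, d)              # values for tokens 1..i

# 1) Ordinary attention over tokens 1..i in one shot.
scores = K @ q / np.sqrt(d)            # (a_i1, ..., a_ii) before softmax
attn = np.exp(scores) / np.exp(scores).sum()
out_full = attn @ V

# 2) Blockwise: same scores, computed one KV block at a time,
#    with the softmax denominator summed over *all* blocks up to i.
num_blocks = (i + B - 1) // B
exp_blocks, denom = [], 0.0
for j in range(num_blocks):
    Kj = K[j * B:(j + 1) * B]          # block j holds B consecutive tokens
    ej = np.exp(Kj @ q / np.sqrt(d))   # elementwise exp of this block's scores
    exp_blocks.append(ej)
    denom += ej.sum()                  # accumulate the global denominator

out_paged = sum(
    (exp_blocks[j] / denom) @ V[j * B:(j + 1) * B] for j in range(num_blocks)
)

print(np.allclose(out_full, out_paged))  # True: the blockwise result is exact
```

(Real kernels also subtract the running max inside exp for numerical stability; that detail is omitted here for brevity.)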
-
OK, I got it. The denominator summation index runs from the beginning, over all the blocks up to index i. Previously I somehow read it as running over only the 'current' block.
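Written out for the i = 6, B = 3 example from above (the symbols here are my own restatement, not a quote from the paper), that reading of the denominator is:

```latex
% Denominator for query i = 6 with block size B = 3: it sums the
% exponentiated scores of ALL blocks up to ceil(i/B), not only block j.
\sum_{t=1}^{\lceil i/B \rceil} \exp\!\left(q_i^{\top} K_t / \sqrt{d}\right)\mathbf{1}
  \;=\;
  \underbrace{\sum_{m=1}^{3} \exp\!\left(q_6^{\top} k_m / \sqrt{d}\right)}_{\text{block } t=1}
  \;+\;
  \underbrace{\sum_{m=4}^{6} \exp\!\left(q_6^{\top} k_m / \sqrt{d}\right)}_{\text{block } t=2}
```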
-
exp(q^T K 1) = exp(q^T K_1 + q^T K_2 + ...), whereas exp(q^T K) · 1 = exp(q^T K_1) + exp(q^T K_2) + ...; it appears that the latter is the correct calculation method.
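A quick numeric check of the two readings (purely illustrative values; the columns of K are treated here as the individual keys k_1, k_2, k_3):

```python
# Compare "sum the scores, then exp" with "exp each score, then sum".
# Values are made up just to show the two readings give different numbers.
import numpy as np

q = np.array([0.5, -1.0, 2.0])
K = np.array([[1.0, 0.0, 0.5],          # k_1
              [0.2, -0.3, 1.0],         # k_2
              [-1.0, 0.7, 0.1]]).T      # k_3; columns of K are the keys

sum_then_exp = np.exp((q @ K).sum())    # exp(q^T K 1) = exp(q^T k_1 + q^T k_2 + ...)
exp_then_sum = np.exp(q @ K).sum()      # exp(q^T K) . 1 = exp(q^T k_1) + exp(q^T k_2) + ...

print(sum_then_exp, exp_then_sum)       # different numbers; only the second
                                        # matches the softmax denominator
```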
-
For the blockwise computation of paged attention as in the above equation, it does not look like exact attention. In exact attention, all the attention scores sum up to 1, but here the attention scores in each block sum up to 1. So is it correct to say that paged attention is not exact attention?