Hi, I tried to reproduce the intuition shown in Figure 1. I followed your code and your paper to understand what Figure 1 visualizes and how to reproduce it. The caption mentions *activation entropy*, but that term is not explicitly defined in the paper. Section 3.4 mentions "visualization of activation" and "in-context activation", so I assume the *activation score* defined in equation 2 is what the caption calls activation entropy or in-context activation; that is, the tensor `activation_all_layers_score` in the code below. Then, for each layer and each token, I select the value at the position corresponding to the given token. For instance, if the token for "Rome" is at position `idx`, the value for layer `i` and token `j` is `activation_all_layers_score[i, idx, j]`. However, the results I got are close to yours but not exactly the same.
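For concreteness, here is a minimal sketch of the extraction I described (the shape `[num_layers, seq_len, seq_len]` and the random tensor are placeholder assumptions for illustration; in practice the tensor comes from your code):

```python
import torch

# Sketch of my extraction. Placeholder assumptions: activation_all_layers_score
# has shape [num_layers, seq_len, seq_len] (one score matrix per layer, Eq. 2);
# here it is filled with random values instead of real model activations.
num_layers, seq_len = 32, 16
activation_all_layers_score = torch.rand(num_layers, seq_len, seq_len)

idx = 5  # hypothetical position of the "Rome" token in the input sequence

# For every layer i and context token j, pick activation_all_layers_score[i, idx, j]:
rome_scores = activation_all_layers_score[:, idx, :]  # shape: [num_layers, seq_len]
print(rome_scores.shape)  # torch.Size([32, 16])
```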
One more thing I noticed: since equation 2 applies a softmax, its values should be at most 1. However, some values shown in Figure 1 appear to be greater than 1, so it is not clear to me how Figure 1 was produced. Could you explain the exact steps needed to reproduce it?
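For reference, this is the property I am relying on: softmax outputs each lie in (0, 1) and sum to 1 along the normalized dimension, so no single entry of equation 2 should exceed 1.

```python
import torch

# Softmax outputs are each in (0, 1) and sum to 1 along the chosen dim,
# so any individual score produced by a softmax cannot exceed 1.
logits = torch.randn(8)
probs = torch.softmax(logits, dim=-1)
assert probs.max() <= 1.0
assert torch.isclose(probs.sum(), torch.tensor(1.0))
```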
Results: [attached plots omitted]