Detailed explanation of the masked-attention algorithm - Zhang #202
Replies: 1 comment
-
I left a lot of comments, but now they're all gone.
-
Detailed explanation of the masked-attention algorithm - Zhang
Works on LLM inference deployment, vision algorithm development, model compression and deployment, and algorithm SDK development; a lifelong learner. The essence of the Transformer's Causal Mask mechanism is to build a lower-triangular attention-score matrix, so that a causal model only attends to the relationship between the current token and the tokens before it, ignoring all later tokens, i.e. it only …
https://www.armcvai.cn/2024-11-10/masked-attention.html
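For anyone skimming the thread, here is a minimal self-contained sketch of the lower-triangular causal mask the linked post describes. This is not code from the article; it assumes PyTorch and a single attention head, and the function name `causal_self_attention` is just illustrative.

```python
# Minimal sketch of causal (masked) attention: each token may only attend
# to itself and earlier tokens, which makes the post-softmax attention
# weight matrix lower-triangular.
import torch
import torch.nn.functional as F

def causal_self_attention(q, k, v):
    """q, k, v: [batch, seq_len, head_dim] tensors for a single head."""
    seq_len, head_dim = q.shape[-2], q.shape[-1]
    # Raw attention scores, scaled by sqrt(head_dim): [batch, seq, seq].
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    # Lower-triangular boolean mask: True where attention is allowed.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device))
    # Future positions get -inf so softmax assigns them zero weight.
    scores = scores.masked_fill(~mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # lower-triangular after softmax
    return weights @ v

# Example: with seq_len = 4, position 2 only attends to positions 0..2.
q = k = v = torch.randn(1, 4, 8)
out = causal_self_attention(q, k, v)
print(out.shape)  # torch.Size([1, 4, 8])
```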