Prune inductor cache during Triton extraction #721
Open
leon062112 wants to merge 2 commits into
PR Category
Feature Enhancement
Description
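The current Triton kernel extraction flow generates a TorchInductor compilation cache for every computation graph. The cache includes output_code.py, intermediate Triton compilation artifacts such as PTX, .best_config files, and other files. The full cache is large, so running the extraction flow over the complete sample list takes up too much disk space. This PR therefore prunes the cache right after each sample finishes compiling, keeping only the minimal set of files needed for later extraction and analysis.
It also fixes a PTX extraction issue. In the current environment, some .best_config files have triton_cache_hash set to null, and the previous logic could not pick out, among multiple PTX candidates, the PTX that autotuning finally selected.
Changes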
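- Add cache_pruner.py, which prunes the TorchInductor cache of a single sample.
  - Kept: test_compiler_log.log, original_graph/model.py, original_graph/graph_hash.txt, output_code.py, *.best_config, *.ptx, *.json, *.pruned_meta.json
  - Removed: *.source, *.cubin, *.ttir, *.ttgir, *.llir
  - Before *.source is deleted, the block size information in it is extracted and written to a lightweight *.pruned_meta.json.
- Fix PTX candidate selection:
  - Use the triton_cache_hash in .best_config when it is available.
  - When triton_cache_hash is null, locate the corresponding .best_config through the "# kernel path:" comment in output_code.py. Then match the correct PTX candidate using metadata such as XBLOCK, num_warps and num_stages.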
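The pruning step can be sketched as follows. This is a minimal illustration, not the code in cache_pruner.py, and it assumes that one sample's cache lives under a single directory. The names prune_sample_cache, extract_block_sizes and BLOCK_RE are hypothetical. The regex assumes block sizes appear in the *.source text as "XBLOCK = 128"-style assignments, and the real file format may differ.

```python
import json
import re
import tempfile
from pathlib import Path

# Intermediate Triton artifacts that the PR lists as removed.
REMOVE_SUFFIXES = (".source", ".cubin", ".ttir", ".ttgir", ".llir")

# Hypothetical pattern: assumes block sizes appear as "XBLOCK = 128"-style
# assignments in the *.source text; the real file format may differ.
BLOCK_RE = re.compile(r"\b([A-Z][A-Z0-9_]*BLOCK)\s*=\s*(\d+)")


def extract_block_sizes(text: str) -> dict[str, int]:
    """Collect NAME -> value pairs for every *BLOCK assignment in the text."""
    return {name: int(value) for name, value in BLOCK_RE.findall(text)}


def prune_sample_cache(cache_dir: Path) -> list[Path]:
    """Delete bulky intermediates under cache_dir and return the removed paths.

    Before a *.source file is deleted, its block sizes are written to a
    sibling <stem>.pruned_meta.json so later analysis can still read them.
    """
    removed = []
    # sorted() materializes the walk before any file is created or deleted.
    for path in sorted(cache_dir.rglob("*")):
        if not path.is_file() or path.suffix not in REMOVE_SUFFIXES:
            continue
        if path.suffix == ".source":
            text = path.read_text(errors="replace")
            meta = {"block_sizes": extract_block_sizes(text)}
            meta_path = path.with_name(path.stem + ".pruned_meta.json")
            meta_path.write_text(json.dumps(meta, indent=2))
        path.unlink()
        removed.append(path)
    return removed


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for one sample's cache.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "k.source").write_text("XBLOCK = 128\n")
        (root / "k.ptx").write_text("// ptx")
        (root / "k.cubin").write_bytes(b"\x00")
        print("removed:", sorted(p.name for p in prune_sample_cache(root)))
        print("kept:", sorted(p.name for p in root.iterdir()))
```

The sketch deletes only the listed intermediate suffixes and leaves everything else in place, which is the conservative reading of the keep list. A stricter variant would delete every file that matches no keep pattern. The metadata is written before the unlink, so the block sizes stay recoverable after *.source is gone.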
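The null-hash fallback can be sketched in the same spirit. This is an illustration under stated assumptions, not the PR's implementation. It assumes the following:

- The .best_config file sits next to the kernel file named in the "# kernel path:" comment and shares its stem.
- Each PTX candidate has a <stem>.json metadata file providing num_warps and num_stages, plus a <stem>.pruned_meta.json holding block sizes.
- For the non-null case, a candidate's cache directory is named after its triton_cache_hash.

The helpers kernel_paths, best_config_path, candidate_meta and select_ptx are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Callable, Optional

# Matches Inductor's "# kernel path: <file>" comments in output_code.py.
KERNEL_PATH_RE = re.compile(r"^[ \t]*# kernel path: (\S+)[ \t]*$", re.MULTILINE)

# Launch parameters compared alongside every *BLOCK key in .best_config.
LAUNCH_KEYS = ("num_warps", "num_stages")


def kernel_paths(output_code: str) -> list[Path]:
    """Return the kernel file paths referenced in output_code.py text."""
    return [Path(p) for p in KERNEL_PATH_RE.findall(output_code)]


def best_config_path(kernel_path: Path) -> Path:
    """Assumption: <kernel>.best_config sits next to <kernel>.py."""
    return kernel_path.with_suffix(".best_config")


def candidate_meta(ptx_path: Path) -> dict:
    """Merge <stem>.json with block sizes from <stem>.pruned_meta.json (hypothetical layout)."""
    meta: dict = {}
    json_path = ptx_path.with_suffix(".json")
    if json_path.exists():
        meta.update(json.loads(json_path.read_text()))
    pruned_path = ptx_path.with_name(ptx_path.stem + ".pruned_meta.json")
    if pruned_path.exists():
        meta.update(json.loads(pruned_path.read_text()).get("block_sizes", {}))
    return meta


def select_ptx(
    best_config: dict,
    candidates: list[Path],
    load_meta: Callable[[Path], dict] = candidate_meta,
) -> Optional[Path]:
    """Return the PTX that autotuning picked, or None if it cannot be told apart."""
    cache_hash = best_config.get("triton_cache_hash")
    if cache_hash:
        # Assumption: a candidate's parent directory is named after its cache hash.
        for ptx in candidates:
            if ptx.parent.name == cache_hash:
                return ptx
    # Fallback for a null (or unmatched) hash: compare tuning metadata.
    wanted = {
        key: value
        for key, value in best_config.items()
        if key.endswith("BLOCK") or key in LAUNCH_KEYS
    }
    matches = []
    for ptx in candidates:
        meta = load_meta(ptx)
        if all(meta.get(key) == value for key, value in wanted.items()):
            matches.append(ptx)
    return matches[0] if len(matches) == 1 else None
```

The sketch returns None when zero or several candidates match, rather than guessing, so an ambiguous match stays visible to the caller. The load_meta parameter lets the matching logic be exercised without a real cache on disk.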
Verification
Tested with 2 typical subgraph samples: