[TLE] Feature/extrac tile strides new #649
…nd add test examples optimized using these two primitives
f53f2ba to eb9caf6
sunnycase
left a comment
Requesting changes before merge:
- Please replace the Chinese comments with English. I found Chinese comments in:
  - python/triton/experimental/tle/language/gpu/semantic.py:118
  - python/test/tle/unit/test_insert_tile_static_index.py:93
  - third_party/tle/dialect/lib/Transforms/ExtractTileToLLVM.cpp:40
  Comments in source and tests should be readable by all maintainers.
- Please expand the PR summary. The current summary only states that strides was added to tle.extract_tile and tle.insert_tile. It should explicitly document the changed APIs, including the Python API surface and any corresponding TLE/MLIR op attribute changes. It should also state which operators show performance improvement and include the measured before/after numbers or speedups, for example GLU, 2D Depthwise Conv, Causal Conv1D, and RoPE if those new tutorials are the performance evidence.
- Please add the newly introduced tutorials to CI. This PR adds:
  - python/tutorials/tle/05-glu.py
  - python/tutorials/tle/06-2D_Depthwise_Conv.py
  - python/tutorials/tle/07-causal-conv1d.py
  - python/tutorials/tle/08-rope.py
  However, the Hopper TLE tutorial workflow still only runs the existing tutorials through 04-cluster-gemm.py plus the DeepSeek examples (.github/workflows/hopper-build-and-test.yml:122 and .github/workflows/hopper-build-and-test.yml:180). Please add lightweight correctness invocations for the new tutorials, such as the existing --only_unit_test paths for 05/06 and non-benchmark correctness runs for 07/08, or add equivalent CI test targets so these examples do not land untested.
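For the first item, a quick scan can help locate any remaining non-English comments before the next push. The following is a minimal sketch, not part of this repository: the helper names `find_cjk_lines` and `scan_files` are hypothetical, and the character ranges are an assumption that covers common CJK ideographs, CJK punctuation, and full-width forms.

```python
import re
from pathlib import Path

# Assumed ranges: CJK punctuation, CJK Unified Ideographs, full-width forms.
CJK_PATTERN = re.compile(r"[\u3000-\u303f\u4e00-\u9fff\uff00-\uffef]")


def find_cjk_lines(text):
    """Return (line_number, line) pairs for lines that contain CJK characters."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if CJK_PATTERN.search(line)
    ]


def scan_files(paths):
    """Map each file path to its CJK-containing lines; clean files are omitted."""
    hits = {}
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        found = find_cjk_lines(text)
        if found:
            hits[str(path)] = found
    return hits
```

Calling `scan_files` on the three files listed above would report the file and line number of each comment that still needs translating.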
@sunnycase
Summary:
API Changes:
Performance:
New tutorials optimized with tle.extract_tile / tle.insert_tile:
CI:
Added lightweight correctness tests for tutorials 05-08 in hopper-build-and-test.yml.
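As a rough illustration of what lightweight correctness invocations could look like, and not the actual contents of hopper-build-and-test.yml, here is a small Python sketch. The tutorial paths and the --only_unit_test flag for 05/06 come from the review above; the empty argument lists for 07/08 are placeholders, since the thread does not name their correctness-only options. `build_commands` and `run_all` are hypothetical helpers.

```python
import subprocess
import sys

# Paths are from the review; the extra arguments are assumptions.
TUTORIALS = {
    "python/tutorials/tle/05-glu.py": ["--only_unit_test"],
    "python/tutorials/tle/06-2D_Depthwise_Conv.py": ["--only_unit_test"],
    # The correctness-only options for 07/08 are not stated in this thread.
    "python/tutorials/tle/07-causal-conv1d.py": [],
    "python/tutorials/tle/08-rope.py": [],
}


def build_commands(tutorials, python=sys.executable):
    """Build one argv list per tutorial without executing anything."""
    return [[python, path, *args] for path, args in tutorials.items()]


def run_all(commands):
    """Run each command in order; a non-zero exit raises CalledProcessError."""
    for command in commands:
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution means the list can be printed or checked in a dry run before anything is launched on a GPU runner.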