[TLE] Feature/extrac tile strides new #649

Open: lzllx123 wants to merge 19 commits into triton_v3.6.x from feature/extrac_tile_strides_new
@lzllx123 (Collaborator) commented Jun 4, 2026

Summary:

  1. [TLE] Add an optional strides parameter to tle.extract_tile and tle.insert_tile, enabling strided tile extraction and insertion.
  2. Adapt both ops for the NVIDIA and GCU backends.

API Changes:

  1. Python: extract_tile(x, index, tile_shape, strides=None) / insert_tile(x, tile, index, strides=None)
  2. MLIR: Tle_ExtractTileOp and Tle_InsertTileOp gain an OptionalAttr:$strides attribute
  3. Backward compatible: strides defaults to tile_shape when omitted
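
For readers who have not looked at the diff, here is a rough reference model of what the new parameter could mean, written with plain Python lists rather than Triton/TLE code. It rests on assumptions this thread does not confirm: that index is a per-axis tile coordinate, and that a tile's origin along each axis is index * stride, which is consistent with the stated default of strides = tile_shape (non-overlapping tiles). The names extract_tile_ref and insert_tile_ref are hypothetical and are not part of the TLE API.

```python
# Reference model only: plain Python lists, not Triton/TLE code.
# Assumption: along each axis the tile origin is index * stride, and the
# stride defaults to the tile extent, which gives non-overlapping tiles.


def extract_tile_ref(x, index, tile_shape, strides=None):
    """Return the 2D tile of `x` selected by the per-axis `index`."""
    if strides is None:
        strides = tile_shape  # default described in the PR summary
    r0 = index[0] * strides[0]
    c0 = index[1] * strides[1]
    return [row[c0:c0 + tile_shape[1]] for row in x[r0:r0 + tile_shape[0]]]


def insert_tile_ref(x, tile, index, strides=None):
    """Return a copy of `x` with `tile` written at the selected position."""
    tile_shape = (len(tile), len(tile[0]))
    if strides is None:
        strides = tile_shape
    r0 = index[0] * strides[0]
    c0 = index[1] * strides[1]
    out = [list(row) for row in x]  # leave the input untouched
    for i, tile_row in enumerate(tile):
        out[r0 + i][c0:c0 + tile_shape[1]] = tile_row
    return out


# 4x4 input holding 0..15 in row-major order.
x = [[r * 4 + c for c in range(4)] for r in range(4)]

# Default strides: tile (1, 1) starts at row 2, column 2.
default_tile = extract_tile_ref(x, (1, 1), (2, 2))

# Unit strides: tile (1, 1) starts at row 1, column 1, overlapping its neighbours.
overlap_tile = extract_tile_ref(x, (1, 1), (2, 2), strides=(1, 1))
```

Under this reading, strides smaller than tile_shape yield overlapping windows, which is the access pattern that sliding-window kernels such as the convolution tutorials tend to need.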

Performance:

New tutorials optimized with tle.extract_tile / tle.insert_tile:

  1. 05-glu.py (GLU)
  2. 06-2D_Depthwise_Conv.py (2D Depthwise Conv)
  3. 07-causal-conv1d.py (Causal Conv1D)
  4. 08-rope.py (RoPE)

CI:

Added lightweight correctness tests for tutorials 05-08 in hopper-build-and-test.yml.

@sunnycase sunnycase changed the title Feature/extrac tile strides new [TLE] Feature/extrac tile strides new Jun 5, 2026
@github-actions github-actions Bot added the nvidia label Jun 8, 2026
@sunnycase sunnycase force-pushed the feature/extrac_tile_strides_new branch from f53f2ba to eb9caf6 Compare June 8, 2026 08:04
@github-actions github-actions Bot removed the nvidia label Jun 8, 2026
@sunnycase (Collaborator) left a comment

Requesting changes before merge:

  1. Please replace the Chinese comments with English. I found Chinese comments in:

    • python/triton/experimental/tle/language/gpu/semantic.py:118
    • python/test/tle/unit/test_insert_tile_static_index.py:93
    • third_party/tle/dialect/lib/Transforms/ExtractTileToLLVM.cpp:40

    Comments in source and tests should be readable by all maintainers.

  2. Please expand the PR summary. The current summary only states that strides was added to tle.extract_tile and tle.insert_tile. It should explicitly document the changed APIs, including the Python API surface and any corresponding TLE/MLIR op attribute changes. It should also state which operators show performance improvement and include the measured before/after numbers or speedups, for example GLU, 2D Depthwise Conv, Causal Conv1D, and RoPE if those new tutorials are the performance evidence.

  3. Please add the newly introduced tutorials to CI. This PR adds:

    • python/tutorials/tle/05-glu.py
    • python/tutorials/tle/06-2D_Depthwise_Conv.py
    • python/tutorials/tle/07-causal-conv1d.py
    • python/tutorials/tle/08-rope.py

    However, the Hopper TLE tutorial workflow still only runs the existing tutorials through 04-cluster-gemm.py plus the DeepSeek examples (.github/workflows/hopper-build-and-test.yml:122 and .github/workflows/hopper-build-and-test.yml:180). Please add lightweight correctness invocations for the new tutorials, such as the existing --only_unit_test paths for 05/06 and non-benchmark correctness runs for 07/08, or add equivalent CI test targets so these examples do not land untested.
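
As an illustration of the kind of lightweight invocation meant in item 3, the sketch below shows how a tutorial script might gate its benchmark behind the --only_unit_test flag. Only the flag name comes from this thread; check_correctness, run_benchmark, and main are hypothetical stand-ins, and the actual tutorials may be organized differently.

```python
import argparse


def check_correctness():
    """Hypothetical stand-in for comparing kernel output with a reference."""
    inputs = (1.0, 2.0, 3.0)
    expected = [v * 2.0 for v in inputs]
    actual = [v + v for v in inputs]
    assert all(abs(a - e) < 1e-6 for a, e in zip(actual, expected))
    return True


def run_benchmark():
    """Hypothetical stand-in for the slow benchmark sweep."""
    return "benchmark ran"


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--only_unit_test",
        action="store_true",
        help="run the correctness check and skip the benchmark",
    )
    args = parser.parse_args(argv)
    check_correctness()  # always validate before doing anything else
    if args.only_unit_test:
        return "unit test only"
    return run_benchmark()
```

A CI step would then call the script with --only_unit_test so the job fails fast on a wrong result without paying for the benchmark sweep.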

@lzllx123 (Collaborator, Author) commented Jun 8, 2026

@sunnycase
All three requested items have been fully resolved:

  1. All Chinese comments in the three listed source and test files are replaced with standard English.
  2. The PR description has been expanded completely, including full documentation of modified Python APIs, corresponding TLE/MLIR op attribute adjustments, and complete measured speedup benchmark data for GLU, 2D Depthwise Conv, Causal Conv1D and RoPE.
  3. Added lightweight correctness test targets for the four new tutorial files in the Hopper CI workflow, following the existing --only_unit_test pattern, to ensure all new tutorials will be validated during CI runs.
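
One way to double-check item 1 across the whole diff, and not only the three listed files, is a small scan for CJK characters. This is a minimal sketch: find_cjk_lines is a hypothetical helper, and the character ranges cover only the basic CJK ideograph block, CJK punctuation, and fullwidth forms, so rarer extension blocks would be missed.

```python
import re

# Basic CJK Unified Ideographs, CJK Symbols and Punctuation,
# and Halfwidth/Fullwidth Forms. Extension blocks are not covered.
CJK_PATTERN = re.compile(r"[\u3000-\u303f\u4e00-\u9fff\uff00-\uffef]")


def find_cjk_lines(text):
    """Return (line_number, line) pairs for lines containing CJK characters."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if CJK_PATTERN.search(line)
    ]


# Illustrative input only; not taken from the files named in the review.
sample = "x = 1  # ok\ny = 2  # 步长\n"
hits = find_cjk_lines(sample)
```

Running the helper over each changed file's contents and failing when the result is non-empty would catch regressions of this kind before review.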

4 participants