


@caiomcbr commented Nov 7, 2025

This PR introduces three new operations to improve the flexibility and performance of the executor.

One operation is invoked directly through the DSL API; the other two are created by fusing existing operations, which reduces overhead and improves efficiency.

  1. Port Channel Put Packet (Direct DSL API Call): Sends packet-format data to the remote side, also in packet format, over the port channel. Both the source and destination buffers must be scratch buffers.

  2. Reduce Copy Packet (Fusion):
    Reduce Packet + Copy Packet = Reduce Copy Packet
    Triggered when the destination buffer of Reduce Packet matches the source buffer of Copy Packet.
    Purpose: Combine reduction and copy into a single step for better performance.

  3. Reduce Copy Send Packet (Fusion):
    Reduce Copy Packet + Put Packet = Reduce Copy Send Packet (when the dst buffer of Reduce Copy Packet matches the src buffer of Put Packet)
    Reduce Copy Packet + Read Put Packet = Reduce Copy Send Packet (when the dst pkt buffer of Reduce Copy Packet matches the src buffer of Read Put Packet)
    Purpose: Combine reduction, copy, and send operations into one optimized pipeline.
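To make the fused semantics concrete, here is a minimal host-side model in Python. It is not the executor's CUDA code: the `(value, flag)` packet representation and every function name below are hypothetical. The model only checks that the fused operations produce the same results as running Reduce Packet and Copy Packet (and then a send) one after another.

```python
from typing import List, Tuple

# Hypothetical model, not mscclpp's kernel code: a packet is a (value, flag)
# pair, and a flag equal to the current step marks the value as valid.
Packet = Tuple[float, int]


def reduce_packet(incoming: List[List[Packet]], local: List[float], flag: int) -> List[float]:
    """Reduce Packet: add the values of every incoming packet buffer into local data."""
    out = list(local)
    for buf in incoming:
        for i, (value, f) in enumerate(buf):
            assert f == flag, "packet not ready"  # a GPU kernel would poll the flag instead
            out[i] += value
    return out


def copy_packet(src: List[float], flag: int) -> List[Packet]:
    """Copy Packet: wrap plain values into packet format."""
    return [(v, flag) for v in src]


def reduce_copy_packet(incoming: List[List[Packet]], local: List[float], flag: int):
    """Fused Reduce Copy Packet: one pass over the elements produces both the
    reduced plain buffer and its packet-format copy."""
    out: List[float] = []
    pkts: List[Packet] = []
    for i, v in enumerate(local):
        acc = v
        for buf in incoming:
            value, f = buf[i]
            assert f == flag, "packet not ready"
            acc += value
        out.append(acc)
        pkts.append((acc, flag))
    return out, pkts


def reduce_copy_send_packet(incoming, local, flag, remote: List[Packet]):
    """Fused Reduce Copy Send Packet: additionally writes the packets into a
    list standing in for the peer's scratch buffer."""
    out, pkts = reduce_copy_packet(incoming, local, flag)
    remote[:] = pkts  # models the send over the channel
    return out, pkts
```

Because both paths accumulate in the same order, the fused and unfused results match exactly in this model; the gain on a GPU comes from reading each element once instead of launching separate reduce, copy, and send steps.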

Fusion Diagram
Reduce Packet + Copy Packet → Reduce Copy Packet
Reduce Copy Packet + Put Packet → Reduce Copy Send Packet
Reduce Copy Packet + Read Put Packet → Reduce Copy Send Packet
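The fusion rules above amount to a peephole pass over the operation list. The sketch below assumes, for illustration, that only adjacent operations are considered and that buffers are identified by name; the `Op` record and the operation kind strings are hypothetical and are not the DSL's `ReduceOperation` API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    kind: str          # hypothetical kinds: "reduce_pkt", "copy_pkt", "put_pkt", "read_put_pkt", ...
    src: str = ""      # source buffer name
    dst: str = ""      # destination buffer (plain data)
    dst_pkt: str = ""  # destination packet buffer (fused ops only)
    remote: str = ""   # remote buffer written by a send (fused send only)


def try_fuse(a: Op, b: Op) -> Optional[Op]:
    """Return the fused op for the pair (a, b), or None if no rule applies."""
    # Reduce Packet + Copy Packet: reduce's dst feeds copy's src.
    if a.kind == "reduce_pkt" and b.kind == "copy_pkt" and a.dst == b.src:
        return Op("reduce_copy_pkt", src=a.src, dst=a.dst, dst_pkt=b.dst)
    # Reduce Copy Packet + Put Packet: the plain dst feeds the put.
    if a.kind == "reduce_copy_pkt" and b.kind == "put_pkt" and a.dst == b.src:
        return Op("reduce_copy_send_pkt", src=a.src, dst=a.dst, dst_pkt=a.dst_pkt, remote=b.dst)
    # Reduce Copy Packet + Read Put Packet: the packet dst feeds the read-put.
    if a.kind == "reduce_copy_pkt" and b.kind == "read_put_pkt" and a.dst_pkt == b.src:
        return Op("reduce_copy_send_pkt", src=a.src, dst=a.dst, dst_pkt=a.dst_pkt, remote=b.dst)
    return None


def fuse(ops: List[Op]) -> List[Op]:
    out: List[Op] = []
    for op in ops:
        # Try to merge each new op into the last emitted one; a fused result
        # stays at the tail, so chains collapse in one left-to-right pass.
        if out:
            fused = try_fuse(out[-1], op)
            if fused is not None:
                out[-1] = fused
                continue
        out.append(op)
    return out
```

With this design a Reduce Packet, Copy Packet, Put Packet sequence collapses into a single Reduce Copy Send Packet, while a pair whose buffers do not line up is left untouched.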

In addition, this PR adjusts the 2-node AllReduce algorithm. Latency by message size:

Message Size | Latency (µs)
--- | ---
1K | 15.34
2K | 15.88
4K | 15.71
8K | 16.01
16K | 15.88
32K | 16.21
64K | 16.90
128K | 18.24
256K | 20.39
512K | 25.26
1M | 32.74
2M | 53.64
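The table gives latency only; converting it to algorithm bandwidth (message size divided by time, the `algbw` convention used by nccl-tests) makes the trend easier to read. Up to roughly 16K the latency stays near 15 to 16 µs, so small messages are latency-bound, while at 2M the effective rate works out to about 39 GB/s. The sketch assumes K and M denote 1024 and 1024² bytes; the PR does not state the unit.

```python
# Latencies reported in the PR for the adjusted 2-node AllReduce (microseconds).
LATENCY_US = {
    "1K": 15.34, "2K": 15.88, "4K": 15.71, "8K": 16.01,
    "16K": 15.88, "32K": 16.21, "64K": 16.90, "128K": 18.24,
    "256K": 20.39, "512K": 25.26, "1M": 32.74, "2M": 53.64,
}

# Assumption: K and M are binary multiples of a byte (1K = 1024 bytes).
UNITS = {"K": 1024, "M": 1024 * 1024}


def size_bytes(label: str) -> int:
    """Convert a size label such as "512K" to a byte count."""
    return int(label[:-1]) * UNITS[label[-1]]


def algbw_gbps(label: str) -> float:
    """Algorithm bandwidth: message size / latency, in GB/s (1 GB = 1e9 bytes)."""
    return size_bytes(label) / (LATENCY_US[label] * 1e-6) / 1e9


for label in LATENCY_US:
    print(f"{label:>5}: {LATENCY_US[label]:6.2f} us  {algbw_gbps(label):7.3f} GB/s")
```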


Copilot AI left a comment


Pull Request Overview

This PR adds support for two new packet-based reduce operations: REDUCE_COPY_PACKETS and REDUCE_COPY_SEND_PACKETS. These operations combine reduction, copying, and optionally sending data in packet format for distributed GPU communication.

Key Changes:

  • Introduces two new operation types for fused reduce-copy and reduce-copy-send operations with packet format
  • Implements the kernel handlers for these operations in the C++ execution layer
  • Adds Python DSL support with automatic operation fusion logic
  • Includes unit tests demonstrating the new operations

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.

File | Description
--- | ---
src/include/execution_common.hpp | Adds enum values for REDUCE_COPY_PACKETS and REDUCE_COPY_SEND_PACKETS
src/include/execution_kernel.hpp | Implements handleReduceCopySendPackets template function and integrates it into executeDeviceFunction
src/executor/execution_plan.cc | Maps string opcodes "recpkt" and "recspkt" to the new operation types
python/mscclpp/language/internal/types.py | Adds Instruction enum values for reduce_copy_packet and reduce_copy_send_packet
python/mscclpp/language/internal/operations.py | Extends ReduceOperation to support operation fusion with copy and put operations
python/mscclpp/language/channel.py | Adds put_packets method to MemoryChannel class
python/mscclpp/language/tests/unit_tests/reduce_copy_packet_test.py | Unit test demonstrating REDUCE_COPY_PACKETS operation
python/mscclpp/language/tests/unit_tests/reduce_copy_send_packet_test.py | Unit test demonstrating REDUCE_COPY_SEND_PACKETS operation
tools/npkit/npkit_trace_generator.py | Updates event names list to include new operation types
include/mscclpp/npkit/npkit_event.hpp | Updates NPKIT_EVENT_EXECUTOR_OP_BASE_EXIT offset to account for new operations
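Adding an operation touches several places at once: an enum value, a string opcode in the plan parser, and the NPKit exit offset. The sketch below illustrates why the exit offset has to move, assuming that entry and exit events occupy back-to-back ID ranges indexed by operation type. Only "recpkt", "recspkt", REDUCE_COPY_PACKETS, and REDUCE_COPY_SEND_PACKETS come from the PR; the other enum members, all numeric values, and the event layout are hypothetical placeholders.

```python
from enum import IntEnum


# Hypothetical, trimmed operation enum. Only the last two names come from this
# PR; the real enum in src/include/execution_common.hpp has more members and
# its own numeric values.
class OpType(IntEnum):
    REDUCE_PACKETS = 0
    COPY_PACKETS = 1
    PUT_PACKETS = 2
    REDUCE_COPY_PACKETS = 3
    REDUCE_COPY_SEND_PACKETS = 4


# The two plan opcodes added by this PR and the types they map to.
OPCODES = {
    "recpkt": OpType.REDUCE_COPY_PACKETS,
    "recspkt": OpType.REDUCE_COPY_SEND_PACKETS,
}

# Hypothetical event layout: one entry ID and one exit ID per op type.
OP_BASE_ENTRY = 0x10  # placeholder value


def event_ids(base_exit: int):
    """Return the sets of entry and exit event IDs for a given exit base."""
    entry = {OP_BASE_ENTRY + op for op in OpType}
    exit_ = {base_exit + op for op in OpType}
    return entry, exit_


# The exit range must start right after the last entry ID, so it shifts
# whenever operation types are added.
OP_BASE_EXIT = OP_BASE_ENTRY + len(OpType)
```

With a stale exit base computed for the three older types, the IDs of the two new entry events would coincide with the first exit events, and a trace generator could no longer tell them apart.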


@caiomcbr caiomcbr requested a review from Binyang2014 November 13, 2025 02:15
@caiomcbr caiomcbr merged commit 7eb3ff7 into main Nov 13, 2025
14 checks passed
@caiomcbr caiomcbr deleted the caiorocha/executor_kernel_func branch November 13, 2025 22:08

3 participants