141 changes: 141 additions & 0 deletions IMPROVED_FIX.md
@@ -0,0 +1,141 @@
# Improved Fix for Issue #4603

## Problem Analysis

The original fix using `broadcastTensors()` is **insufficient** because:

1. `broadcastTensors()` only handles **rank mismatch** (e.g., `[3]` vs `[2, 3]`)
2. It does NOT handle **dimension size mismatch** (e.g., `[1]` vs `[2]`)
3. The DINOv3 error shows shapes `[2]` and `[1]`: same rank, different sizes

## Root Cause

TensorRT's `IIfConditionalOutputLayer` requires **exact shape match**, but:
- ONNX allows broadcasting where dimension size 1 can expand to any size N
- Example: `[1]` can broadcast to `[2]` by repeating the element

## Correct Solution

We need to implement full ONNX-style broadcasting using the pattern from the `Expand` operator:

### Algorithm
1. Align ranks by prepending dimensions of size 1
2. For each dimension, compute the broadcast size: `max(dim1, dim2)`
3. For each tensor, if `dim[i] == 1` and `broadcast_dim[i] > 1`, use stride=0 to expand
4. Use `ISliceLayer` with computed strides to perform the expansion
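To make steps 1-3 concrete, here is a small self-contained sketch of the shape arithmetic using plain `std::vector<int64_t>` values instead of the importer's `ShapeTensor` helpers. The function names `broadcastShape` and `broadcastStrides` are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Steps 1 + 2: align ranks by prepending 1s, then take the per-dimension maximum.
std::vector<int64_t> broadcastShape(std::vector<int64_t> a, std::vector<int64_t> b)
{
    while (a.size() < b.size()) a.insert(a.begin(), 1);
    while (b.size() < a.size()) b.insert(b.begin(), 1);
    std::vector<int64_t> out(a.size());
    for (size_t i = 0; i < a.size(); ++i) out[i] = std::max(a[i], b[i]);
    return out;
}

// Step 3: a dimension of size 1 is read with stride 0 (repeating its single element),
// any other dimension is read with stride 1.
std::vector<int64_t> broadcastStrides(std::vector<int64_t> const& alignedShape)
{
    std::vector<int64_t> strides(alignedShape.size());
    for (size_t i = 0; i < alignedShape.size(); ++i) strides[i] = (alignedShape[i] > 1) ? 1 : 0;
    return strides;
}

int main()
{
    // DINOv3 case: then-branch [2] vs else-branch [1].
    auto shape = broadcastShape({2}, {1}); // -> [2]
    auto strides = broadcastStrides({1});  // -> [0], so the single element is repeated
    std::cout << shape[0] << " " << strides[0] << "\n"; // prints "2 0"
}
```

Step 4 then uses the broadcast shape as the `ISliceLayer` size and the per-tensor strides as the slice stride, so a size-1 dimension read with stride 0 simply repeats its single element.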

### Implementation Options

#### Option 1: Create a new `broadcastTensorsForConditional()` function
Add to `importerUtils.cpp`:

```cpp
void broadcastTensorsForConditional(ImporterContext* ctx, nvinfer1::ITensor*& t1, nvinfer1::ITensor*& t2)
{
    // First, align ranks
    broadcastTensors(ctx, t1, t2);

    // Now both tensors have the same rank
    auto shape1 = shapeOf(*t1);
    auto shape2 = shapeOf(*t2);

    // Compute broadcast shape: max(shape1, shape2) element-wise
    ShapeTensor broadcastShape = broadcast(ctx, shape1, shape2);

    // Expand t1 if needed
    if (!shape1.allValuesKnown() || !broadcastShape.allValuesKnown()
        || !std::equal(shape1.begin(), shape1.end(), broadcastShape.begin()))
    {
        // Compute strides: stride[i] = (shape1[i] > 1) ? 1 : 0
        ShapeTensor const zero = similar(ctx, shape1, 0);
        ShapeTensor const one = similar(ctx, shape1, 1);
        ShapeTensor const strides1 = min(ctx, one, max(ctx, sub(ctx, shape1, one), zero));

        nvinfer1::ISliceLayer* slice1 = addSlice(ctx, *t1, zero, broadcastShape, strides1);
        ctx->registerLayer(slice1, "ONNXTRT_BroadcastForConditional", nullptr);
        t1 = N_CHECK(slice1->getOutput(0));
    }

    // Expand t2 if needed
    if (!shape2.allValuesKnown() || !broadcastShape.allValuesKnown()
        || !std::equal(shape2.begin(), shape2.end(), broadcastShape.begin()))
    {
        // Compute strides: stride[i] = (shape2[i] > 1) ? 1 : 0
        ShapeTensor const zero = similar(ctx, shape2, 0);
        ShapeTensor const one = similar(ctx, shape2, 1);
        ShapeTensor const strides2 = min(ctx, one, max(ctx, sub(ctx, shape2, one), zero));

        nvinfer1::ISliceLayer* slice2 = addSlice(ctx, *t2, zero, broadcastShape, strides2);
        ctx->registerLayer(slice2, "ONNXTRT_BroadcastForConditional", nullptr);
        t2 = N_CHECK(slice2->getOutput(0));
    }
}
```

#### Option 2: Simpler approach using element-wise trick
Since TensorRT's `IElementWiseLayer` handles broadcasting automatically, we can:
1. Add a dummy element-wise operation (e.g., multiply by 1)
2. Let TensorRT handle the broadcasting
3. Extract the broadcast result

However, this is a hack and may not work reliably.
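For completeness, here is a minimal sketch of what this trick could look like, assuming the ranks have already been aligned (e.g., via `broadcastTensors()`) and assuming direct access to the `INetworkDefinition` being built; the helper name is hypothetical. The `x - x` trick used to obtain zeros also breaks down for inf/NaN inputs, which is part of why this option is not recommended.

```cpp
#include <NvInfer.h>

// Hypothetical sketch of Option 2: broadcast `value` to the joint shape of `value`
// and `other` by letting IElementWiseLayer's implicit broadcasting do the work.
// Both tensors must already have the same rank (TensorRT only broadcasts size-1 dims).
nvinfer1::ITensor* broadcastViaElementWise(
    nvinfer1::INetworkDefinition& network, nvinfer1::ITensor& value, nvinfer1::ITensor& other)
{
    // other - other == zeros shaped like `other` (unreliable for inf/NaN values).
    nvinfer1::ITensor* zeros
        = network.addElementWise(other, other, nvinfer1::ElementWiseOperation::kSUB)->getOutput(0);
    // value + zeros broadcasts both operands to the common shape and keeps `value`'s data.
    return network.addElementWise(value, *zeros, nvinfer1::ElementWiseOperation::kSUM)->getOutput(0);
}
```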

#### Option 3: Use existing Expand logic inline
Directly apply the Expand operator's logic in the If operator:

```cpp
for (size_t i = 0; i < thenSubgraphTensors.size(); i++)
{
    auto* thenOut = &convertToTensor(thenSubgraphTensors[i], ctx);
    auto* elseOut = &convertToTensor(elseSubgraphTensors[i], ctx);

    // Align ranks first
    broadcastTensors(ctx, thenOut, elseOut);

    // Now handle dimension size broadcasting
    auto thenShape = shapeOf(*thenOut);
    auto elseShape = shapeOf(*elseOut);
    ShapeTensor broadcastShape = broadcast(ctx, thenShape, elseShape);

    // Expand thenOut if needed
    {
        ShapeTensor const zero = similar(ctx, thenShape, 0);
        ShapeTensor const one = similar(ctx, thenShape, 1);
        ShapeTensor const strides = min(ctx, one, max(ctx, sub(ctx, thenShape, one), zero));
        nvinfer1::ISliceLayer* slice = addSlice(ctx, *thenOut, zero, broadcastShape, strides);
        ctx->registerLayer(slice, "ONNXTRT_BroadcastThen", nullptr);
        thenOut = N_CHECK(slice->getOutput(0));
    }

    // Expand elseOut if needed
    {
        ShapeTensor const zero = similar(ctx, elseShape, 0);
        ShapeTensor const one = similar(ctx, elseShape, 1);
        ShapeTensor const strides = min(ctx, one, max(ctx, sub(ctx, elseShape, one), zero));
        nvinfer1::ISliceLayer* slice = addSlice(ctx, *elseOut, zero, broadcastShape, strides);
        ctx->registerLayer(slice, "ONNXTRT_BroadcastElse", nullptr);
        elseOut = N_CHECK(slice->getOutput(0));
    }

    auto* outputLayer = N_CHECK(conditional->addOutput(*thenOut, *elseOut));
    ctx->registerLayer(outputLayer, std::string(conditional->getName()) + "_OutputLayer", &node);
    graphOutputs.emplace_back(N_CHECK(outputLayer->getOutput(0)));
}
```

## Recommended Approach

**Option 3** is recommended because:
1. It is self-contained within the `If` operator importer
2. It reuses proven logic from the `Expand` operator
3. It handles both rank and dimension-size broadcasting
4. It requires no changes to `importerUtils.cpp`

## Testing

Test cases needed:
1. `[1]` vs `[2]` - dimension size broadcast
2. `[1, 3]` vs `[2, 3]` - partial dimension broadcast
3. `[3]` vs `[2, 3]` - rank broadcast
4. `[1]` vs `[2, 3]` - combined rank and dimension broadcast
5. `[2, 3]` vs `[2, 3]` - no broadcast needed (existing case)
121 changes: 121 additions & 0 deletions TECHNICAL_DETAILS.md
@@ -0,0 +1,121 @@
# Technical Details: If Operator Shape Broadcasting Fix

## Background

### ONNX If Operator
The ONNX `If` operator implements conditional execution with two subgraphs:
- **then_branch**: Executed when condition is true
- **else_branch**: Executed when condition is false

Both branches must produce the same number of outputs, and corresponding outputs should have compatible types and shapes.

### TensorRT IIfConditional
TensorRT's `IIfConditional` layer requires:
- A boolean scalar condition tensor
- Input layers for external tensors used in subgraphs
- Output layers that merge results from both branches

**Critical Requirement:** `IIfConditionalOutputLayer::addOutput()` requires both input tensors to have **exactly the same shape** (same rank and same dimensions).
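For orientation, here is a minimal sketch of how these pieces are wired together when using the API directly (outside the ONNX importer); the branch bodies and names are placeholders, and error handling is omitted.

```cpp
#include <NvInfer.h>

// Sketch of the IIfConditional wiring: `cond` is a 0-D boolean tensor, `x` is an
// external tensor consumed by both branches. Branch bodies are trivial placeholders.
nvinfer1::ITensor* addSimpleIf(
    nvinfer1::INetworkDefinition& network, nvinfer1::ITensor& cond, nvinfer1::ITensor& x)
{
    nvinfer1::IIfConditional* conditional = network.addIfConditional();
    conditional->setCondition(cond); // boolean scalar condition

    // External tensors enter the conditional through input layers.
    nvinfer1::ITensor* xInside = conditional->addInput(x)->getOutput(0);

    // then-branch: x * x; else-branch: identity(x). (Placeholder subgraphs.)
    nvinfer1::ITensor* thenOut
        = network.addElementWise(*xInside, *xInside, nvinfer1::ElementWiseOperation::kPROD)->getOutput(0);
    nvinfer1::ITensor* elseOut = network.addIdentity(*xInside)->getOutput(0);

    // The output layer merges the branch results. This is the call that demands
    // exactly matching shapes and that fails for the DINOv3 model.
    nvinfer1::IIfConditionalOutputLayer* outputLayer = conditional->addOutput(*thenOut, *elseOut);
    return outputLayer->getOutput(0);
}
```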

## The Problem

### Error Message Analysis
```
Invalid Node - /bb/rope_embeddings/If
/bb/rope_embeddings/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape.
Shapes are [2] and [1].
```

This error indicates:
1. Node name: `/bb/rope_embeddings/If`
2. Then-branch output shape: `[2]` (1D tensor with 2 elements)
3. Else-branch output shape: `[1]` (1D tensor with 1 element)
4. TensorRT rejects this because shapes don't match exactly

### Why This Happens in DINOv3

DINOv3 (Vision Transformer with rotary position embeddings) uses conditional logic for rope embeddings:
```python
# Simplified pseudocode
if condition:
    return [freq1, freq2]  # Shape: [2]
else:
    return [default_freq]  # Shape: [1]
```

Under ONNX broadcasting rules, `[1]` can be broadcast to `[2]`, making these shapes compatible. However, TensorRT doesn't automatically apply this broadcasting.

## The Solution

### Broadcasting Function
The codebase already includes `broadcastTensors()` in `importerUtils.cpp`:

```cpp
void broadcastTensors(ImporterContext* ctx, nvinfer1::ITensor*& t1, nvinfer1::ITensor*& t2)
{
    int const t1Dims = t1->getDimensions().nbDims;
    int const t2Dims = t2->getDimensions().nbDims;

    if (t1Dims == t2Dims)
    {
        return; // Already same rank
    }

    if (t1Dims > t2Dims)
    {
        return broadcastTensor(ctx, t2, t1Dims); // Broadcast t2 to match t1
    }
    return broadcastTensor(ctx, t1, t2Dims); // Broadcast t1 to match t2
}
```

This function:
1. Checks if tensors already have the same number of dimensions
2. If not, calls `broadcastTensor()` to add leading dimensions of size 1
3. Uses `IShuffleLayer` to reshape the lower-rank tensor
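For intuition, here is a simplified, static-shape-only sketch of what that rank expansion amounts to; the real `broadcastTensor()` also handles dynamic shapes via `ShapeTensor` arithmetic, so treat this as illustrative rather than the actual implementation.

```cpp
#include <NvInfer.h>

// Illustrative only: reshape `t` to rank `nbDims` by prepending dimensions of size 1.
// Assumes `t` has a fully static shape; the real helper also supports dynamic shapes.
void expandRankStatic(nvinfer1::INetworkDefinition& network, nvinfer1::ITensor*& t, int nbDims)
{
    nvinfer1::Dims const oldDims = t->getDimensions();
    if (oldDims.nbDims >= nbDims)
    {
        return;
    }

    nvinfer1::Dims newDims{};
    newDims.nbDims = nbDims;
    int const pad = nbDims - oldDims.nbDims;
    for (int i = 0; i < nbDims; ++i)
    {
        // Leading axes get size 1; trailing axes keep the original sizes.
        newDims.d[i] = (i < pad) ? 1 : oldDims.d[i - pad];
    }

    nvinfer1::IShuffleLayer* shuffle = network.addShuffle(*t);
    shuffle->setReshapeDimensions(newDims);
    t = shuffle->getOutput(0);
}
```

For example, a `[3]` tensor becomes `[1, 3]`, which can then sit next to a `[2, 3]` tensor of the same rank.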

### Implementation
The fix adds one line before creating the output layer:

```cpp
// Before fix:
auto* thenOut = &convertToTensor(thenSubgraphTensors[i], ctx);
auto* elseOut = &convertToTensor(elseSubgraphTensors[i], ctx);
auto* outputLayer = N_CHECK(conditional->addOutput(*thenOut, *elseOut));

// After fix:
auto* thenOut = &convertToTensor(thenSubgraphTensors[i], ctx);
auto* elseOut = &convertToTensor(elseSubgraphTensors[i], ctx);
broadcastTensors(ctx, thenOut, elseOut); // ← NEW LINE
auto* outputLayer = N_CHECK(conditional->addOutput(*thenOut, *elseOut));
```

## How It Works

### Example: DINOv3 rope_embeddings

**Before Broadcasting:**
- `thenOut`: Shape `[2]`, nbDims=1
- `elseOut`: Shape `[1]`, nbDims=1

Both have the same rank (1D), so `broadcastTensors()` returns immediately without changes.

**Wait, what?** If they have the same rank, why does the error occur?

The issue is that having the same **number of dimensions** (rank) doesn't mean having the same **shape**. The error message shows shapes `[2]` and `[1]`, which are both 1D tensors but with different sizes.

### Deeper Analysis

Looking more carefully at the error and the broadcasting function, I realize the current `broadcastTensors()` only handles **rank mismatch**, not **dimension size mismatch**.

For example:
- `[1]` vs `[2]`: same rank (1), different sizes → NOT handled by the current `broadcastTensors()`
- `[1]` vs `[2, 3]`: different ranks → rank alignment IS handled by `broadcastTensors()` (yielding `[1, 1]` vs `[2, 3]`), but the remaining size mismatch is not

### The Real Fix Needed

The actual issue requires **element-wise broadcasting**, not just rank alignment. We need to handle cases where:
- Tensors have the same rank but different dimension sizes
- One tensor has size 1 in a dimension that needs to be broadcast to match the other

Let me check if there's a more appropriate function or if we need to enhance the fix.