141 changes: 141 additions & 0 deletions IMPROVED_FIX.md
@@ -0,0 +1,141 @@
# Improved Fix for Issue #4603

## Problem Analysis

The original fix using `broadcastTensors()` is **insufficient** because:

1. `broadcastTensors()` only handles **rank mismatch** (e.g., `[3]` vs `[2, 3]`)
2. It does NOT handle **dimension size mismatch** (e.g., `[1]` vs `[2]`)
3. The DINOv3 error shows shapes `[2]` and `[1]`: same rank, different sizes

## Root Cause

TensorRT's `IIfConditionalOutputLayer` requires **exact shape match**, but:
- ONNX allows broadcasting where dimension size 1 can expand to any size N
- Example: `[1]` can broadcast to `[2]` by repeating the element

## Correct Solution

We need to implement full ONNX-style broadcasting using the pattern from the `Expand` operator:

### Algorithm
1. Align ranks by prepending dimensions of size 1
2. For each dimension, compute the broadcast size: `max(dim1, dim2)`
3. For each tensor, if `dim[i] == 1` and `broadcast_dim[i] > 1`, use stride=0 to expand
4. Use `ISliceLayer` with computed strides to perform the expansion
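To make steps 1-3 concrete, here is a small self-contained sketch of the shape arithmetic using plain `std::vector<int64_t>` values instead of the importer's `ShapeTensor` helpers. The function names `broadcastShape` and `broadcastStrides` are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Steps 1 + 2: align ranks by prepending 1s, then take the per-dimension maximum.
std::vector<int64_t> broadcastShape(std::vector<int64_t> a, std::vector<int64_t> b)
{
    while (a.size() < b.size()) a.insert(a.begin(), 1);
    while (b.size() < a.size()) b.insert(b.begin(), 1);
    std::vector<int64_t> out(a.size());
    for (size_t i = 0; i < a.size(); ++i) out[i] = std::max(a[i], b[i]);
    return out;
}

// Step 3: a dimension of size 1 is read with stride 0 (repeating its single element),
// any other dimension is read with stride 1.
std::vector<int64_t> broadcastStrides(std::vector<int64_t> const& alignedShape)
{
    std::vector<int64_t> strides(alignedShape.size());
    for (size_t i = 0; i < alignedShape.size(); ++i) strides[i] = (alignedShape[i] > 1) ? 1 : 0;
    return strides;
}

int main()
{
    // DINOv3 case: then-branch [2] vs else-branch [1].
    auto shape = broadcastShape({2}, {1}); // -> [2]
    auto strides = broadcastStrides({1});  // -> [0], so the single element is repeated
    std::cout << shape[0] << " " << strides[0] << "\n"; // prints "2 0"
}
```

Step 4 then uses the broadcast shape as the `ISliceLayer` size and the per-tensor strides as the slice stride, so a size-1 dimension read with stride 0 simply repeats its single element.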

### Implementation Options

#### Option 1: Create a new `broadcastTensorsForConditional()` function
Add to `importerUtils.cpp`:

```cpp
void broadcastTensorsForConditional(ImporterContext* ctx, nvinfer1::ITensor*& t1, nvinfer1::ITensor*& t2)
{
    // First, align ranks
    broadcastTensors(ctx, t1, t2);

    // Now both tensors have the same rank
    auto shape1 = shapeOf(*t1);
    auto shape2 = shapeOf(*t2);

    // Compute broadcast shape: max(shape1, shape2) element-wise
    ShapeTensor broadcastShape = broadcast(ctx, shape1, shape2);

    // Expand t1 if needed
    if (!shape1.allValuesKnown() || !broadcastShape.allValuesKnown()
        || !std::equal(shape1.begin(), shape1.end(), broadcastShape.begin()))
    {
        // Compute strides: stride[i] = (shape1[i] > 1) ? 1 : 0
        ShapeTensor const zero = similar(ctx, shape1, 0);
        ShapeTensor const one = similar(ctx, shape1, 1);
        ShapeTensor const strides1 = min(ctx, one, max(ctx, sub(ctx, shape1, one), zero));

        nvinfer1::ISliceLayer* slice1 = addSlice(ctx, *t1, zero, broadcastShape, strides1);
        ctx->registerLayer(slice1, "ONNXTRT_BroadcastForConditional", nullptr);
        t1 = N_CHECK(slice1->getOutput(0));
    }

    // Expand t2 if needed
    if (!shape2.allValuesKnown() || !broadcastShape.allValuesKnown()
        || !std::equal(shape2.begin(), shape2.end(), broadcastShape.begin()))
    {
        // Compute strides: stride[i] = (shape2[i] > 1) ? 1 : 0
        ShapeTensor const zero = similar(ctx, shape2, 0);
        ShapeTensor const one = similar(ctx, shape2, 1);
        ShapeTensor const strides2 = min(ctx, one, max(ctx, sub(ctx, shape2, one), zero));

        nvinfer1::ISliceLayer* slice2 = addSlice(ctx, *t2, zero, broadcastShape, strides2);
        ctx->registerLayer(slice2, "ONNXTRT_BroadcastForConditional", nullptr);
        t2 = N_CHECK(slice2->getOutput(0));
    }
}
```

#### Option 2: Simpler approach using element-wise trick
Since TensorRT's `IElementWiseLayer` handles broadcasting automatically, we can:
1. Add a dummy element-wise operation (e.g., multiply by 1)
2. Let TensorRT handle the broadcasting
3. Extract the broadcast result

However, this is a hack and may not work reliably.
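For completeness, here is a minimal sketch of what this trick could look like, assuming the ranks have already been aligned (e.g., via `broadcastTensors()`) and assuming direct access to the `INetworkDefinition` being built; the helper name is hypothetical. The `x - x` trick used to obtain zeros also breaks down for inf/NaN inputs, which is part of why this option is not recommended.

```cpp
#include <NvInfer.h>

// Hypothetical sketch of Option 2: broadcast `value` to the joint shape of `value`
// and `other` by letting IElementWiseLayer's implicit broadcasting do the work.
// Both tensors must already have the same rank (TensorRT only broadcasts size-1 dims).
nvinfer1::ITensor* broadcastViaElementWise(
    nvinfer1::INetworkDefinition& network, nvinfer1::ITensor& value, nvinfer1::ITensor& other)
{
    // other - other == zeros shaped like `other` (unreliable for inf/NaN values).
    nvinfer1::ITensor* zeros
        = network.addElementWise(other, other, nvinfer1::ElementWiseOperation::kSUB)->getOutput(0);
    // value + zeros broadcasts both operands to the common shape and keeps `value`'s data.
    return network.addElementWise(value, *zeros, nvinfer1::ElementWiseOperation::kSUM)->getOutput(0);
}
```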

#### Option 3: Use existing Expand logic inline
Directly apply the Expand operator's logic in the If operator:

```cpp
for (size_t i = 0; i < thenSubgraphTensors.size(); i++)
{
    auto* thenOut = &convertToTensor(thenSubgraphTensors[i], ctx);
    auto* elseOut = &convertToTensor(elseSubgraphTensors[i], ctx);

    // Align ranks first
    broadcastTensors(ctx, thenOut, elseOut);

    // Now handle dimension size broadcasting
    auto thenShape = shapeOf(*thenOut);
    auto elseShape = shapeOf(*elseOut);
    ShapeTensor broadcastShape = broadcast(ctx, thenShape, elseShape);

    // Expand thenOut if needed
    {
        ShapeTensor const zero = similar(ctx, thenShape, 0);
        ShapeTensor const one = similar(ctx, thenShape, 1);
        ShapeTensor const strides = min(ctx, one, max(ctx, sub(ctx, thenShape, one), zero));
        nvinfer1::ISliceLayer* slice = addSlice(ctx, *thenOut, zero, broadcastShape, strides);
        ctx->registerLayer(slice, "ONNXTRT_BroadcastThen", nullptr);
        thenOut = N_CHECK(slice->getOutput(0));
    }

    // Expand elseOut if needed
    {
        ShapeTensor const zero = similar(ctx, elseShape, 0);
        ShapeTensor const one = similar(ctx, elseShape, 1);
        ShapeTensor const strides = min(ctx, one, max(ctx, sub(ctx, elseShape, one), zero));
        nvinfer1::ISliceLayer* slice = addSlice(ctx, *elseOut, zero, broadcastShape, strides);
        ctx->registerLayer(slice, "ONNXTRT_BroadcastElse", nullptr);
        elseOut = N_CHECK(slice->getOutput(0));
    }

    auto* outputLayer = N_CHECK(conditional->addOutput(*thenOut, *elseOut));
    ctx->registerLayer(outputLayer, std::string(conditional->getName()) + "_OutputLayer", &node);
    graphOutputs.emplace_back(N_CHECK(outputLayer->getOutput(0)));
}
```

## Recommended Approach

**Option 3** is recommended because:
1. It is self-contained within the `If` operator importer
2. It reuses proven logic from the `Expand` operator
3. It handles both rank and dimension-size broadcasting
4. It requires no changes to `importerUtils.cpp`

## Testing

Test cases needed:
1. `[1]` vs `[2]` - dimension size broadcast
2. `[1, 3]` vs `[2, 3]` - partial dimension broadcast
3. `[3]` vs `[2, 3]` - rank broadcast
4. `[1]` vs `[2, 3]` - combined rank and dimension broadcast
5. `[2, 3]` vs `[2, 3]` - no broadcast needed (existing case)
121 changes: 121 additions & 0 deletions TECHNICAL_DETAILS.md
@@ -0,0 +1,121 @@
# Technical Details: If Operator Shape Broadcasting Fix

## Background

### ONNX If Operator
The ONNX `If` operator implements conditional execution with two subgraphs:
- **then_branch**: Executed when condition is true
- **else_branch**: Executed when condition is false

Both branches must produce the same number of outputs, and corresponding outputs should have compatible types and shapes.

### TensorRT IIfConditional
TensorRT's `IIfConditional` layer requires:
- A boolean scalar condition tensor
- Input layers for external tensors used in subgraphs
- Output layers that merge results from both branches

**Critical Requirement:** `IIfConditionalOutputLayer::addOutput()` requires both input tensors to have **exactly the same shape** (same rank and same dimensions).
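For orientation, here is a minimal sketch of how these pieces are wired together when using the API directly (outside the ONNX importer); the branch bodies and names are placeholders, and error handling is omitted.

```cpp
#include <NvInfer.h>

// Sketch of the IIfConditional wiring: `cond` is a 0-D boolean tensor, `x` is an
// external tensor consumed by both branches. Branch bodies are trivial placeholders.
nvinfer1::ITensor* addSimpleIf(
    nvinfer1::INetworkDefinition& network, nvinfer1::ITensor& cond, nvinfer1::ITensor& x)
{
    nvinfer1::IIfConditional* conditional = network.addIfConditional();
    conditional->setCondition(cond); // boolean scalar condition

    // External tensors enter the conditional through input layers.
    nvinfer1::ITensor* xInside = conditional->addInput(x)->getOutput(0);

    // then-branch: x * x; else-branch: identity(x). (Placeholder subgraphs.)
    nvinfer1::ITensor* thenOut
        = network.addElementWise(*xInside, *xInside, nvinfer1::ElementWiseOperation::kPROD)->getOutput(0);
    nvinfer1::ITensor* elseOut = network.addIdentity(*xInside)->getOutput(0);

    // The output layer merges the branch results. This is the call that demands
    // exactly matching shapes and that fails for the DINOv3 model.
    nvinfer1::IIfConditionalOutputLayer* outputLayer = conditional->addOutput(*thenOut, *elseOut);
    return outputLayer->getOutput(0);
}
```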

## The Problem

### Error Message Analysis
```
Invalid Node - /bb/rope_embeddings/If
/bb/rope_embeddings/If_OutputLayer: IIfConditionalOutputLayer inputs must have the same shape.
Shapes are [2] and [1].
```

This error indicates:
1. Node name: `/bb/rope_embeddings/If`
2. Then-branch output shape: `[2]` (1D tensor with 2 elements)
3. Else-branch output shape: `[1]` (1D tensor with 1 element)
4. TensorRT rejects this because shapes don't match exactly

### Why This Happens in DINOv3

DINOv3 (Vision Transformer with rotary position embeddings) uses conditional logic for rope embeddings:
```python
# Simplified pseudocode
if condition:
    return [freq1, freq2]  # Shape: [2]
else:
    return [default_freq]  # Shape: [1]
```

Under ONNX broadcasting rules, `[1]` can be broadcast to `[2]`, making these shapes compatible. However, TensorRT doesn't automatically apply this broadcasting.

## The Solution

### Broadcasting Function
The codebase already includes `broadcastTensors()` in `importerUtils.cpp`:

```cpp
void broadcastTensors(ImporterContext* ctx, nvinfer1::ITensor*& t1, nvinfer1::ITensor*& t2)
{
    int const t1Dims = t1->getDimensions().nbDims;
    int const t2Dims = t2->getDimensions().nbDims;

    if (t1Dims == t2Dims)
    {
        return; // Already same rank
    }

    if (t1Dims > t2Dims)
    {
        return broadcastTensor(ctx, t2, t1Dims); // Broadcast t2 to match t1
    }
    return broadcastTensor(ctx, t1, t2Dims); // Broadcast t1 to match t2
}
```

This function:
1. Checks if tensors already have the same number of dimensions
2. If not, calls `broadcastTensor()` to add leading dimensions of size 1
3. Uses `IShuffleLayer` to reshape the lower-rank tensor
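For intuition, here is a simplified, static-shape-only sketch of what that rank expansion amounts to; the real `broadcastTensor()` also handles dynamic shapes via `ShapeTensor` arithmetic, so treat this as illustrative rather than the actual implementation.

```cpp
#include <NvInfer.h>

// Illustrative only: reshape `t` to rank `nbDims` by prepending dimensions of size 1.
// Assumes `t` has a fully static shape; the real helper also supports dynamic shapes.
void expandRankStatic(nvinfer1::INetworkDefinition& network, nvinfer1::ITensor*& t, int nbDims)
{
    nvinfer1::Dims const oldDims = t->getDimensions();
    if (oldDims.nbDims >= nbDims)
    {
        return;
    }

    nvinfer1::Dims newDims{};
    newDims.nbDims = nbDims;
    int const pad = nbDims - oldDims.nbDims;
    for (int i = 0; i < nbDims; ++i)
    {
        // Leading axes get size 1; trailing axes keep the original sizes.
        newDims.d[i] = (i < pad) ? 1 : oldDims.d[i - pad];
    }

    nvinfer1::IShuffleLayer* shuffle = network.addShuffle(*t);
    shuffle->setReshapeDimensions(newDims);
    t = shuffle->getOutput(0);
}
```

For example, a `[3]` tensor becomes `[1, 3]`, which can then sit next to a `[2, 3]` tensor of the same rank.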

### Implementation
The fix adds one line before creating the output layer:

```cpp
// Before fix:
auto* thenOut = &convertToTensor(thenSubgraphTensors[i], ctx);
auto* elseOut = &convertToTensor(elseSubgraphTensors[i], ctx);
auto* outputLayer = N_CHECK(conditional->addOutput(*thenOut, *elseOut));

// After fix:
auto* thenOut = &convertToTensor(thenSubgraphTensors[i], ctx);
auto* elseOut = &convertToTensor(elseSubgraphTensors[i], ctx);
broadcastTensors(ctx, thenOut, elseOut); // ← NEW LINE
auto* outputLayer = N_CHECK(conditional->addOutput(*thenOut, *elseOut));
```

## How It Works

### Example: DINOv3 rope_embeddings

**Before Broadcasting:**
- `thenOut`: Shape `[2]`, nbDims=1
- `elseOut`: Shape `[1]`, nbDims=1

Both have the same rank (1D), so `broadcastTensors()` returns immediately without changes.

**Wait, what?** If they have the same rank, why does the error occur?

The issue is that having the same **number of dimensions** (rank) doesn't mean having the same **shape**. The error message shows shapes `[2]` and `[1]`, which are both 1D tensors but with different sizes.

### Deeper Analysis

Looking more carefully at the error and the broadcasting function, I realize the current `broadcastTensors()` only handles **rank mismatch**, not **dimension size mismatch**.

For example:
- `[1]` vs `[2]`: same rank (1), different sizes → NOT handled by the current `broadcastTensors()`
- `[1]` vs `[2, 3]`: different ranks → rank alignment IS handled by `broadcastTensors()` (yielding `[1, 1]` vs `[2, 3]`), but the remaining size mismatch is not

### The Real Fix Needed

The actual issue requires **element-wise broadcasting**, not just rank alignment. We need to handle cases where:
- Tensors have the same rank but different dimension sizes
- One tensor has size 1 in a dimension that needs to be broadcast to match the other

Let me check if there's a more appropriate function or if we need to enhance the fix.