hir::Allocate node#6000

Open
Priya2698 wants to merge 4 commits into main from pm/hir_allocate

Priya2698 (Collaborator) commented Feb 21, 2026

This PR adds a new hir::Allocate node that always allocates a new tensor. This is needed to create a separate buffer per stream; reusing one buffer across streams would require synchronization.

I am not modifying the kir::Allocate handling, because doing so caused errors in the MultiDeviceExecutor tests.
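
To illustrate the motivation, here is a minimal sketch of the difference between reusing one buffer across requests and allocating a fresh buffer per request. It is a toy model, not nvFuser code: `Buffer`, `ReusingAllocator`, and `FreshAllocator` are hypothetical names, and host vectors stand in for device tensors.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for a device tensor (hypothetical, not an nvFuser type).
using Buffer = std::vector<float>;

// Reuse policy: repeated requests of the same size return the same buffer,
// so work issued on different streams would alias it and need synchronization.
class ReusingAllocator {
 public:
  std::shared_ptr<Buffer> allocate(std::size_t numel) {
    assert(numel > 0);
    if (!cached_ || cached_->size() != numel) {
      cached_ = std::make_shared<Buffer>(numel);
    }
    return cached_;
  }

 private:
  std::shared_ptr<Buffer> cached_;
};

// Fresh policy: every request returns a new buffer, so each stream can own
// its buffer without coordinating with the others.
class FreshAllocator {
 public:
  std::shared_ptr<Buffer> allocate(std::size_t numel) {
    assert(numel > 0);
    return std::make_shared<Buffer>(numel);
  }
};
```

In this model, two calls to `ReusingAllocator::allocate(4)` hand back the same storage, while two calls to `FreshAllocator::allocate(4)` hand back distinct storage. The second behavior is what the description above asks of the new node.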

Priya2698 changed the title from "host IR allocate node" to "hir::Allocate node" on Feb 21, 2026
github-actions bot commented Feb 21, 2026

Review updated until commit e2db707

Description

  • Introduce new hir::Allocate node for per-stream tensor allocation

  • Replace kir::Allocate with hir::Allocate across host IR components

  • Add complete implementation with constructor, cloning, and string methods

  • Update JIT compilation and evaluation to handle new allocation node

  • Modify test files to use new allocation interface

Changes walkthrough

Relevant files

Enhancement (8 files)

  • allocate_and_deallocate.cpp (+2/-1): Replace kir::Allocate with hir::Allocate in allocation insertion
  • evaluator.cpp (+22/-0): Add handle method for hir::Allocate in host IR evaluator
  • ir.cpp (+35/-0): Implement complete hir::Allocate class with methods
  • jit.cpp (+7/-12): Update JIT compilation to handle hir::Allocate instead of kir::Allocate
  • lowering.cpp (+3/-3): Replace kir::Allocate with hir::Allocate in lowering pass
  • dispatch.h (+1/-0): Add Allocate to dispatch system for hir namespace
  • evaluator.h (+1/-0): Add handle method declaration for hir::Allocate
  • ir.h (+36/-0): Declare hir::Allocate class with complete interface

Tests (2 files)

  • test_host_ir_evaluator.cpp (+3/-3): Update test to use hir::Allocate instead of kir::Allocate
  • test_multidevice_host_ir.cpp (+2/-2): Update multi-device test to use hir::Allocate

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review
API Consistency

The new hir::Allocate constructor signature differs from kir::Allocate by omitting the dimensions parameter. This is correct at the HIR level, but ensure that all call sites are updated consistently and that the API is documented for future developers.

explicit Allocate(
    IrBuilderPasskey passkey,
    Val* in,
    MemoryType memory_type,
    bool zero_init = false);
Type Handling Logic

The type handling logic for Index dtype has been simplified. Verify this change is correct and doesn't break any edge cases where Index type tensors might need special handling beyond the current logic.

allocate->in()->dtype() == DataType::Index ? PrimDataType::Int
                                           : allocate->in()->dtype());
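
The ternary above can be read as a small mapping function. The following sketch restates it in isolation; `DType` and `resolveForAllocation` are hypothetical names introduced for illustration, and the enum is far simpler than nvFuser's DataType.

```cpp
#include <cassert>

// Hypothetical, simplified dtype enum; nvFuser's DataType is richer.
enum class DType { Index, Int, Int32, Float, Half };

// Restates the ternary in the excerpt: the symbolic Index type is replaced
// by a concrete integer type, and every other dtype passes through unchanged.
constexpr DType resolveForAllocation(DType dtype) {
  return dtype == DType::Index ? DType::Int : dtype;
}

// Compile-time check of the one special case.
static_assert(resolveForAllocation(DType::Index) == DType::Int);
```

Written this way, the edge case the reviewer guide asks about is easy to spot: any situation where an Index-typed tensor should resolve to something other than `Int` is not covered by this mapping.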
Memory Management

The hir::Allocate handler creates tensors using at::native::empty_strided_cuda. Ensure proper error handling for allocation failures and verify that memory cleanup is handled correctly in all code paths, especially error scenarios.

  at::Tensor tensor = at::native::empty_strided_cuda(
      info.shape_info.logical_sizes,
      info.shape_info.logical_strides,
      info.type,
      c10::nullopt,
      device,
      c10::nullopt);

  if (allocate->zeroInit()) {
    tensor.zero_();
  }
  expr_evaluator_.bind(tv, tensor);
}

Priya2698 (Collaborator, Author) commented

!test

greptile-apps bot (Contributor) commented Feb 21, 2026

Greptile Summary

Introduces a new hir::Allocate node to always allocate fresh tensor buffers instead of reusing across streams, avoiding synchronization overhead in multi-stream contexts.

Key changes:

  • New hir::Allocate class with simplified API (in() instead of buffer(), no shape vector parameter)
  • Migrated host IR allocation sites from kir::Allocate to hir::Allocate
  • Evaluator handler properly implements zeroInit() flag
  • JIT compilation handler is missing zeroInit() support, so tensors are left uninitialized when the flag is true

Confidence Score: 3/5

  • Safe to merge with one critical bug fix needed in JIT path
  • The implementation is well-structured and comprehensive, but the JIT handler is missing zeroInit() support which could cause correctness issues when the flag is used. The evaluator path correctly handles this flag. Tests were updated but don't exercise the JIT compilation path with zeroInit=true.
  • csrc/host_ir/jit.cpp requires fixing the zeroInit() handling

Important Files Changed

  • csrc/host_ir/ir.h: Defines new hir::Allocate class with in(), memoryType(), and zeroInit() methods
  • csrc/host_ir/ir.cpp: Implements hir::Allocate constructor, toString(), and toInlineString() methods
  • csrc/host_ir/evaluator.cpp: Implements evaluator handler that allocates tensors and supports zero initialization
  • csrc/host_ir/jit.cpp: JIT handler migrated from kir::Allocate to hir::Allocate but missing zeroInit() support
  • csrc/host_ir/lowering.cpp: Three instances updated from kir::Allocate to hir::Allocate for segment lowering

Last reviewed commit: e2db707

greptile-apps bot (Contributor) left a comment

7 files reviewed, 3 comments

greptile-apps bot (Contributor) left a comment

7 files reviewed, no comments

Priya2698 (Collaborator, Author) commented

!test

greptile-apps bot (Contributor) left a comment

9 files reviewed, no comments

greptile-apps bot (Contributor) left a comment

10 files reviewed, 1 comment

  device_index_constant,
  out_tensor});
- val_to_value_[allocate->buffer()] = out_tensor;
+ val_to_value_[allocate->in()] = out_tensor;
Contributor

The zeroInit() flag is not handled in the JIT path, so the tensor may contain uninitialized data when allocate->zeroInit() is true.
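
As a rough illustration of the behavior this comment asks for, the sketch below shows an allocation handler that honors a zero-initialization flag. It is a host-side toy under stated assumptions: `AllocateRequest`, `allocateEmpty`, and `handleAllocate` are hypothetical names, plain `float` arrays stand in for tensors, and nothing here reflects how the JIT handler in jit.cpp is actually written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical description of an allocation request (not an nvFuser type).
struct AllocateRequest {
  std::size_t numel;
  bool zero_init;
};

// Returns uninitialized storage, similar in spirit to an "empty" tensor
// factory: the element values are indeterminate until written.
std::unique_ptr<float[]> allocateEmpty(std::size_t numel) {
  return std::unique_ptr<float[]>(new float[numel]);
}

// Handler that honors the flag: allocate first, then clear the storage
// only when the request asks for zero initialization.
std::unique_ptr<float[]> handleAllocate(const AllocateRequest& request) {
  assert(request.numel > 0);
  std::unique_ptr<float[]> data = allocateEmpty(request.numel);
  if (request.zero_init) {
    std::fill_n(data.get(), request.numel, 0.0f);
  }
  return data;
}
```

A test for the real JIT path could follow the same shape: request an allocation with the flag set and check that every element reads back as zero.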

Comment on lines +509 to +510
Val* in,
MemoryType memory_type,
Collaborator

Suggested change
- Val* in,
- MemoryType memory_type,
+ TensorView* in,

- const std::vector<IterDomain*>& logical_domain = TensorDomain::noReductions(
-     allocate->buffer()->as<TensorView>()->getLogicalDomain());
+ const std::vector<IterDomain*>& logical_domain =
+     TensorDomain::noReductions(allocate->in()->getLogicalDomain());
Collaborator

Consider | TensorDomain::kNoReductions
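
The suggestion appears to refer to a pipe-style filter in place of the noReductions(...) call. The sketch below shows one way such a pipe can be built with a tag type and an overloaded `operator|`. It is only an assumption about the general shape: the `IterDomain` struct, `NoReductionsTag`, and this `kNoReductions` are simplified, hypothetical stand-ins, not nvFuser's definitions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an iteration domain.
struct IterDomain {
  std::string name;
  bool is_reduction;
};

// Tag type that enables the `domain | kNoReductions` spelling.
struct NoReductionsTag {};
inline constexpr NoReductionsTag kNoReductions{};

// Piping a domain into the tag returns a copy without reduction axes,
// preserving the order of the remaining axes.
std::vector<IterDomain> operator|(
    const std::vector<IterDomain>& domain,
    NoReductionsTag) {
  std::vector<IterDomain> kept;
  for (const IterDomain& id : domain) {
    if (!id.is_reduction) {
      kept.push_back(id);
    }
  }
  return kept;
}
```

With these definitions, `domain | kNoReductions` reads left to right like the reviewer's spelling and drops every axis whose `is_reduction` is true.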

Priya2698 (Collaborator, Author) commented

Thanks for the early feedback @wujingyue.
This PR is blocked on #6007. I will add a workaround in this PR if that takes too long.
