Skip to content

Conversation

govardhnn
Copy link
Contributor

No description provided.

COMPRESS,
WHOLE_REG_MOVE,
UNKNOWN,
NONE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NONE should be before UNKNOWN.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified.

@kathlenemagnus-mips
Copy link
Contributor

@govardhnn your design doc is missing the critical information that I'm looking for which is "what uops are generated for each type of permutation instruction?". It would be great to include a simple example of uop generation for each type of uop generator that you plan to add.

@govardhnn
Copy link
Contributor Author

govardhnn commented Jun 2, 2025 via email

@klingaard
Copy link
Collaborator

Hey @govardhnn where are you with this PR? Specifically, did you address @kathlenemagnus requests? Also, can you get regression to pass again?

@govardhnn
Copy link
Contributor Author

govardhnn commented Jul 19, 2025

Hi @klingaard and @kathlenemagnus-mips I have been on a personal break for a while - I should have informed the team earlier, apologies..
Can I get back to this by the end of the month? Or please feel free to take this over if the feature is on the critical path.

Thanks,

@klingaard
Copy link
Collaborator

Hey @govardhnn, yes, please feel free to take the time you need. We do appreciate the contributions that folks make to the model.

Copy link
Collaborator

@klingaard klingaard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preemptive acceptance.

@govardhnn
Copy link
Contributor Author

Hi @klingaard, I will have to discontinue this submission since I have joined a startup in stealth recently. I will be unable to contribute to open source for sometime - and intend to come back in full force once I have the permissions to do so.

I will be happy if any other volunteers would like to build on this vector permutation proposal.

Thanks,
Govardhan

@klingaard
Copy link
Collaborator

Understood and we do appreciate the contributions you've made! We can take it from here. Best of luck in your endeavors.

@govardhnn
Copy link
Contributor Author

Thanks @klingaard!

&VectorUopGenerator::generateSlideUops_<InstArchInfo::UopGenType::SLIDE1DOWN>);

// Vector permute uop generator
// Vector general slide uop generators
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great if could add some examples for the new uop generators.

}

InstPtr VectorUopGenerator::generatePermuteUops_()
template <InstArchInfo::UopGenType Type>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary to have a template parameter if this method is only valid for InstArchInfo::UopGenType::SCALAR_MOVE (according to the static assert on line 512).


InstPtr VectorUopGenerator::generatePermuteUops_()
template <InstArchInfo::UopGenType Type>
InstPtr VectorUopGenerator::generateScalarMoveUops_()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what this new generator type provides that isn't already supported. Right now, they are not sequenced at all so no uops are generated and we just send the parent inst down the pipe. If there is a need to generate a uop, we could use the ELEMENTWISE generator and set num_uops_to_generate_ to 1.

return makeInst_(srcs, dests);
}

template <InstArchInfo::UopGenType Type> InstPtr VectorUopGenerator::generateSlideGeneralUops_()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this generator looks like it's doing the same thing as ELEMENTWISE. Not sure why it is needed.

@kathlenemagnus
Copy link
Collaborator

@govardhnn This work looks incomplete to me. I would expect this PR to contain modifications to Execute since the uops generated cannot be executed independently of each other.

For example, with your vrgather example in your doc:

UOP 1: vrgather.vv v20, v8, v4   # Process first register group
UOP 2: vrgather.vv v21, v9, v5   # Process second register group
UOP 3: vrgather.vv v22, v10, v6  # Process third register group
UOP 4: vrgather.vv v23, v11, v7  # Process fourth register group

It's possible that the indexes specified in v4 will need to gather elements from multiple vs2 registers (v8-v11) to write the result for v20. UOP 1 on its own will not read the right source registers to be able to write the correct value to v20. The result is that v20 will be marked ready earlier than is functionally possible. I see a similar issue with some of the vslide instructions.

Did you have more work planned for this project that you did not get to? If so, please document it so someone else can continue this work.

|General slides |`SLIDEUP`/`SLIDEDOWN` |VPERMUTE |4 UOPs
|Slide1 operations |`SLIDE1UP`/`SLIDE1DOWN` |VINT/VFLOAT |4 UOPs
|Register gather |`RGATHER` |VPERMUTE |4 UOPs
|Vector compress |`COMPRESS` |VPERMUTE |1 (always)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector compress always has 1 dest but could have up to 8 vector sources. Is that how you planned to support this instruction type?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants