Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RPP Normalize - 1D kernel Optimization #351

Open
wants to merge 14 commits into
base: develop
Choose a base branch
from

Conversation

Dineshbabu-Ravichandran
  • Optimized Normalize 1D for f32 and u8 bit depth using SIMD instructions

@@ -88,6 +88,13 @@ void fill_roi_values(Rpp32u nDim, Rpp32u batchSize, Rpp32u *roiTensor, bool qaMo
{
switch(nDim)
{
case 1:
{
std::array<Rpp32u, 2> roi = {0, 10000};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does std::array needs any includes to be added?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think It need #include to be included . But I am not sure where it is included .

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your environment you might not get the issue when it is not included, in the CI the build will get failed if it is not included in some environments

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am following the same approach how they previous used for case 2 and case 3. Is that is Okay ?

rppt_normalize_host(inputF32, srcDescriptorPtrND, outputF32, dstDescriptorPtrND, axisMask, meanTensor, stdDevTensor, computeMeanStddev, scale, shift, roiTensor, handle);

if(bitDepth == 0)
rppt_normalize_host(inputU8, srcDescriptorPtrND, outputU8, dstDescriptorPtrND, axisMask, meanTensor, stdDevTensor, computeMeanStddev, scale, shift, roiTensor, handle);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the u8 calls are addded here . you have added the same calls in both audio and image test suites. Please remove the changes for u8 if it is not needed

@@ -372,6 +385,48 @@ int main(int argc, char **argv)

break;
}
case 8:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the support in HIP test suite as well


Rpp32u totalElements = 1;
Rpp32u lastNormAxis = 0;
Rpp32u axis[tensorDims], newAxis[tensorDims], newDims[tensorDims];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are doing static allocations here. This might cause some issue . please use the scratch buffer available in the host handle

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants