Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guard some stream syncs #4351

Open
wants to merge 2 commits into
base: development
Choose a base branch
from

Conversation

AlexanderSinn
Copy link
Member

Summary

Additional background

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

@@ -202,7 +204,9 @@ MFCellConsLinInterp::interp (MultiFab const& crsemf, int ccomp, MultiFab& finemf
});
}

Gpu::streamSynchronize();
Copy link
Member

@WeiqunZhang WeiqunZhang Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to keep this one because there is a local MultiFab in this block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I didn't pay attention to that. Should I use the async arena for the temporary memory or keep the stream sync?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to keep the stream sync so that the async arena does not use too much memory.

@@ -373,7 +377,9 @@ MFCellConsLinMinmaxLimitInterp::interp (MultiFab const& crsemf, int ccomp, Multi
}
});

Gpu::streamSynchronize();
Copy link
Member

@WeiqunZhang WeiqunZhang Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to keep this one because there is a local MultiFab in this block.

@@ -1031,7 +1033,9 @@ MLABecLaplacianT<MF>::Fsmooth (int amrlev, int mglev, MF& sol, const MF& rhs, in
});
}
}
Gpu::streamSynchronize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at the end of Fsmooth we should add the following in case this is in nosync region.

if (Ax.local_size() > 0 && Gpu::inNoSyncRegion()) { Gpu::streamSynchronize(); }

@@ -193,7 +193,9 @@ void MLCurlCurl::restriction (int amrlev, int cmglev, MF& crse, MF& fine) const
{
mlcurlcurl_restriction(idim,i,j,k,crsema[bno],finema[bno],dinfo);
});
Gpu::streamSynchronize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a local temp MulitFab. So it is not safe to remove it.

@@ -228,7 +230,9 @@ void MLCurlCurl::interpolation (int amrlev, int fmglev, MF& fine,
}
});
}
Gpu::streamSynchronize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should either move the sync into the for-loop or make MultiFab cfine an array.

} else {
ParallelFor( nmf, [=] AMREX_GPU_DEVICE(int bno, int i, int j, int k)
{
mlcurlcurl_1D(i,j,k,ex[bno],ey[bno],ez[bno],
rhsx[bno],rhsy[bno],rhsz[bno],
b,adxinv,color,dinfo);
});
Gpu::streamSynchronize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one and the one above are needed because there is a temp MulitFab.

@@ -450,7 +458,9 @@ void MLCurlCurl::smooth4 (int amrlev, int mglev, MF& sol, MF const& rhs,
});
}
}
Gpu::streamSynchronize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not safe to remove this one.

@@ -193,7 +197,9 @@ MLNodeABecLaplacian::restriction (int amrlev, int cmglev, MultiFab& crse, MultiF
{
mlndlap_restriction(i,j,k,pcrse_ma[box_no],fine_ma[box_no],msk_ma[box_no]);
});
Gpu::streamSynchronize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May not be safe.

@@ -225,7 +231,9 @@ MLNodeABecLaplacian::interpolation (int amrlev, int fmglev, MultiFab& fine, cons
mlndlap_interpadd_aa(i, j, k, fine_ma[box_no], crse_ma[box_no],
sig_ma[box_no], msk_ma[box_no]);
});
Gpu::streamSynchronize();
if (!Gpu::inNoSyncRegion()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!Gpu::inNoSyncRegion()) {
if (cfine.local_size() > 0 || !Gpu::inNoSyncRegion()) {

@@ -665,7 +669,9 @@ MLNodeLaplacian::interpolation (int amrlev, int fmglev, MultiFab& fine, const Mu
mlndlap_semi_interpadd_aa(i, j, k, fine_ma[box_no], crse_ma[box_no], sig_ma[box_no], msk_ma[box_no], idir);
});
}
Gpu::streamSynchronize();
if (!Gpu::inNoSyncRegion()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!Gpu::inNoSyncRegion()) {
if (cfine.local_size() > 0 || !Gpu::inNoSyncRegion()) {

@@ -701,7 +705,9 @@ MLNodeLaplacian::Fsmooth (int amrlev, int mglev, MultiFab& sol, const MultiFab&
}
}

Gpu::streamSynchronize();
if (!Gpu::inNoSyncRegion()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!Gpu::inNoSyncRegion()) {
if (Ax.local_size() > 0 || !Gpu::inNoSyncRegion()) {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants