Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for GPUs in Redecomp and the Tribol MFEM interface #56

Open
wants to merge 20 commits into
base: develop
Choose a base branch
from

Conversation

ebchin
Copy link
Member

@ebchin ebchin commented Jan 30, 2024

This PR adds initial support for GPU data in the Redecomp library using MFEM's built in GPU support. This PR adds initial plumbing and tests on GPU to ensure GPU data transfers in Redecomp are functioning correctly. Follow-on PRs will add additional GPU support in Tribol and Redecomp.

Transfers from parallel data (grid functions and quadrature functions) on device to redecomp data are done as follows:

  1. Device parallel data is copied to host using mfem::Vector::HostRead().
  2. MPI communication moves host data from parallel rank to its redecomp rank.
  3. Host redecomp data is written to a host pointer obtained using mfem::Vector::HostWrite().

The final result is redecomp data on the host, which can be copied to the GPU as needed in MFEM. Transfers from redecomp data to parallel data are done in reverse order.

The support for GPU data is extended to the Tribol MFEM interface, which leverages the Redecomp library to prepare data for processing by Tribol. Since Tribol does not support GPUs, redecomp data is left on host where Tribol can compute contact contributions.

GPU support has been added to the following examples (and corresponding tests):

  1. domain_redecomp.cpp: use the Redecomp library to transfer parallel data on device to redecomp data on device and back
  2. mfem_common_plane.cpp: multirank common plane contact using MFEM (with GPU) and Tribol
  3. mfem_mortar_lm_patch.cpp: multirank contact patch test using MFEM (with GPU) and Tribol mortar

@ebchin ebchin self-assigned this Jan 30, 2024
Copy link
Collaborator

@srwopschall srwopschall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebchin - This looks good so far. I mostly have clarity type comments in this first pass. This mostly covers the examples so far.

@@ -96,6 +98,8 @@ int main( int argc, char** argv )
// very coarse meshes, though the decomposition can still fail with the simple
// checks implemented.
double max_out_of_balance = 0.05;
// device configuration string (see mfem::Device::Configure())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a short note on why you would see mfem::Device::Configure()? E.g. see mfem::Device::Configure() for device configuration options? Or is it something else?

@@ -110,13 +114,24 @@ int main( int argc, char** argv )
app.add_option("-t,--tol", max_out_of_balance,
"Max proportion of out-of-balance elements in RCB decomposition.")
->capture_default_str();
app.add_option("-d,--device", device_config,
"Device configuration string, see mfem::Device::Configure().")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See note above for minor clarity comment.

SLIC_INFO_ROOT(axom::fmt::format("device: {0}\n", device_config));

// enable devices such as GPUs
mfem::Device device(device_config);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this constructor do error checking on 'device_config' or should you do that here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention what device_config option(s) you have tested, run. Maybe one is just fine for the purposes of this PR.

@@ -186,6 +201,9 @@ void RedecompExample(mfem::ParMesh& pmesh, int order, double max_out_of_balance)
{
pmesh.GetNodes(par_x_ref_elem);
}
// move data to device (if device is available) and set host flag to stale
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider clarifying what 'data' is being moved, and what the intent is here. We spoke about how while a parmesh is always on host, various data underneath that parmesh can be sent to device. In terms of this example, what is the intent here? Consider explaining briefly.

@@ -269,7 +287,11 @@ void RedecompExample(mfem::ParMesh& pmesh, int order, double max_out_of_balance)
/////////////////////////////////////////////////////////////////////////////

timer.start();
// redecomp does transfers on host, so par_x_ref_elem data will be copied to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking: This routine TransferToSerial() will take care of copying the par_x_ref_elem data back to host if it is on device, and then to the transfer to serial?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indirectly. The GetSubVector() method called here will get the correct pointer and make sure the DOF data is up-to-date.

// are present in this problem.
mfem::BlockVector B_blk { A_blk->RowOffsets() };
// Create a RHS vector storing forces and gaps. Note no external forces are present in this problem.
mfem::Vector B_blk(A_blk->Height());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

block RHS vector for forces and gaps makes sense. Why change it? Don't necessarily need to 'block' designation/functionality? You stil call it B_blk though. If I just saw the new code I wouldn't question it, just wondering in the context of these diffs.

mfem::BlockVector B_blk { A_blk->RowOffsets() };
// Create a RHS vector storing forces and gaps. Note no external forces are present in this problem.
mfem::Vector B_blk(A_blk->Height());
B_blk.UseDevice(true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a comment here about what UseDevice() will do when set to true. This sounds obvious, but this is specifically a new line of code that enables use of the gpu. Should we always set this to true, or should we set it based on device_config (if that is appropriate)? Seems like we would want it to be false if we want to run this example on cpu only?

// This API call returns the mortar nodal gap vector to an (uninitialized) vector. The function sizes and initializes
// the vector.
mfem::Vector gap;
gap.UseDevice(true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about whether we always want 'true' to be passed in here.

P_submesh.MultTranspose(gap, gap_true);

// Create a solution vector storing displacement and pressures.
mfem::Vector X_blk(A_blk->Width());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, clarify the move from a block vector to a vector?

src/examples/mfem_mortar_lm_patch.cpp Show resolved Hide resolved
@ebchin ebchin marked this pull request as ready for review February 17, 2024 00:44
@ebchin ebchin requested a review from srwopschall February 17, 2024 00:44
@ebchin
Copy link
Member Author

ebchin commented Feb 17, 2024

@srwopschall, I think your comments are addressed in the latest commit. Let's discuss what kind of comments are appropriate in the MFEM examples, as the things that will be computed on device vs. host there are a bit in flux.

SLIC_INFO_ROOT(axom::fmt::format("device: {0}\n", device_config));

// enable devices such as GPUs
mfem::Device device(device_config);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention what device_config option(s) you have tested, run. Maybe one is just fine for the purposes of this PR.

@@ -186,6 +202,12 @@ void RedecompExample(mfem::ParMesh& pmesh, int order, double max_out_of_balance)
{
pmesh.GetNodes(par_x_ref_elem);
}
// Copy DOF data to device (if device is available), set host data as invalid,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider taking this out of example and ensuring it is in appropriate test.

@@ -269,7 +291,15 @@ void RedecompExample(mfem::ParMesh& pmesh, int order, double max_out_of_balance)
/////////////////////////////////////////////////////////////////////////////

timer.start();
// Redecomp does transfers on host, so par_x_ref_elem DOF data will be copied
// to host and the device DOF data of redecomp_x_ref_elem will be marked as
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"[...] to host (if the data is on device)..."

// invalid, and set the device data as valid in the memory bit flags (see
// mfem::Memory::FlagMask for bit flag definitions). Now, DOF data must be
// copied from device before the host can access it.
redecomp_x_ref_elem.ReadWrite();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take this out. A user/host-code won't need to make this call.

// invalid, and set the device data as valid in the memory bit flags (see
// mfem::Memory::FlagMask for bit flag definitions). Now, DOF data must be
// copied from device before the host can manipulate it.
redecomp_x_ref_node.ReadWrite();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take this ReadWrite() out.

// mass update operator. Lumping is performed using row summation.
// This block of code builds a small-deformation elasticity explicit, lumped mass update operator. Lumping is
// performed using row summation. Elasticity and mass matrix contributions are computed on host (as implemented in
// MFEM), but mass inversion is done on device. These are only computed once, then stored on device.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Minor (if you decide to keep this comment in spirit of what's done on cpu vs gpu): but mass inversion is done on device if device_config enables GPU kernals.

  2. OR: take this comment out. If a user of this example enables gpus using device_config, we shouldn't explicitly state what is on CPU vs GPU because that may change.

Leaning toward (2).

SLIC_INFO_ROOT(axom::fmt::format("mu: {0}", mu));
SLIC_INFO_ROOT(axom::fmt::format("device: {0}\n", device_config));

// enable devices such as GPUs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this comment per other examples.

tribol::initialize(mesh.SpaceDimension(), MPI_COMM_WORLD);
// Copy the coords grid function DOF data to device (if available), set the host pointer flag as invalid, and set the
// device pointer to valid.
coords.ReadWrite();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this, but verify this is in the test? Remove comment here too.


// Copy the coords grid function DOF data to device (if available), set the host pointer flag as invalid, and set the
// device pointer to valid.
coords.ReadWrite();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this and the comment. Verify if this line is needed in the test.

// are present in this problem.
mfem::BlockVector B_blk { A_blk->RowOffsets() };
// Create a RHS vector storing forces and gaps. Note no external forces are present in this problem.
mfem::Vector B_blk(A_blk->Height());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you stick with mfem::Vector, consider renaming B_blk to just B, and similiarly for A, unless you are actually storing one block of data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants