21 changes: 11 additions & 10 deletions src/simulation/m_muscl.fpp
@@ -248,11 +248,12 @@ contains
integer :: j, k, l

real(wp) :: aCL, aCR, aC, aTHINC, qmin, qmax, A, B, C, sign, moncon
+ real(wp) :: rho_b, rho_e

#:for MUSCL_DIR, XYZ in [(1, 'x'), (2, 'y'), (3, 'z')]
if (muscl_dir == ${MUSCL_DIR}$) then

- $:GPU_PARALLEL_LOOP(collapse=3,private='[j,k,l,aCL,aC,aCR,aTHINC,moncon,sign,qmin,qmax]')
+ $:GPU_PARALLEL_LOOP(collapse=3,private='[j,k,l,aCL,aC,aCR,aTHINC,moncon,sign,qmin,qmax,A,B,C,rho_b,rho_e]')
do l = is3_muscl%beg, is3_muscl%end
do k = is2_muscl%beg, is2_muscl%end
do j = is1_muscl%beg, is1_muscl%end
@@ -278,25 +279,25 @@ contains
B = exp(sign*ic_beta*(2._wp*C - 1._wp))
A = (B/cosh(ic_beta) - 1._wp)/tanh(ic_beta)
Member

A and B are constant across all iterations of the loop. This calculation should be moved outside the loop, and A and B should be passed via copyin instead of private.

Member Author

A and B are actually not loop-invariant. They depend on j, k, l through this dependency chain:

1. aCL, aC, aCR are read from v_rs_ws_*_muscl(j±1, k, l, advxb), which is indexed by the loop variables
2. moncon, sign, qmin, qmax all derive from those
3. C = (aC - qmin + sgm_eps)/(qmax + sgm_eps) is computed per cell
4. B = exp(sign*ic_beta*(2*C - 1)) depends on sign and C
5. A = (B/cosh(ic_beta) - 1)/tanh(ic_beta) depends on B

The loop-invariant parts are the transcendental functions of the constant ic_beta: cosh(ic_beta) and tanh(ic_beta). We could precompute those before the loop to avoid redundant evaluations on each GPU thread. Would that be a useful optimization to add?
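
A minimal sketch of that hoisting, written as a standalone serial program rather than the actual kernel. The names cosh_beta, tanh_beta, sgn, and the 1D array alpha are hypothetical, the values of ic_beta and sgm_eps are placeholders, and the qmin/qmax/sgn definitions are assumptions about code outside this hunk. Only the C, B, and A formulas are taken from the diff. It is meant to show that C, B, and A change from cell to cell while the two transcendental terms do not:

```fortran
program thinc_hoist_sketch
    implicit none
    integer, parameter :: wp = kind(1.0d0)

    ! Placeholder values; in the solver these come from the case setup.
    real(wp), parameter :: ic_beta = 1.6_wp
    real(wp), parameter :: sgm_eps = 1.0e-16_wp

    real(wp) :: alpha(0:5)            ! hypothetical 1D volume-fraction profile
    real(wp) :: cosh_beta, tanh_beta  ! hypothetical names for the hoisted terms
    real(wp) :: aCL, aC, aCR, qmin, qmax, sgn, C, B, A
    integer :: j

    alpha = [0.0_wp, 0.05_wp, 0.3_wp, 0.7_wp, 0.95_wp, 1.0_wp]

    ! Loop-invariant: these depend only on the constant ic_beta.
    cosh_beta = cosh(ic_beta)
    tanh_beta = tanh(ic_beta)

    do j = 1, 4
        ! Per-cell inputs, indexed by the loop variable.
        aCL = alpha(j - 1)
        aC = alpha(j)
        aCR = alpha(j + 1)

        ! Assumed definitions (not shown in this hunk).
        qmin = min(aCL, aCR)
        qmax = max(aCL, aCR) - qmin
        sgn = sign(1._wp, aCR - aCL)

        ! Formulas from the diff, using the hoisted terms.
        C = (aC - qmin + sgm_eps)/(qmax + sgm_eps)
        B = exp(sgn*ic_beta*(2._wp*C - 1._wp))
        A = (B/cosh_beta - 1._wp)/tanh_beta

        print '(a,i0,3(a,es12.4))', 'j=', j, '  C=', C, '  B=', B, '  A=', A
    end do
end program thinc_hoist_sketch
```

In the real kernel the two hoisted scalars would be read-only inside the parallel loop, so they would not belong in the private list. A, B, and C would stay private because each thread computes its own values.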


+ ! Save original density ratios before THINC overwrites them
+ rho_b = vL_rs_vf_${XYZ}$ (j, k, l, contxb)/vL_rs_vf_${XYZ}$ (j, k, l, advxb)
+ rho_e = vL_rs_vf_${XYZ}$ (j, k, l, contxe)/(1._wp - vL_rs_vf_${XYZ}$ (j, k, l, advxb))

! Left reconstruction
aTHINC = qmin + 5e-1_wp*qmax*(1._wp + sign*A)
if (aTHINC < ic_eps) aTHINC = ic_eps
if (aTHINC > 1 - ic_eps) aTHINC = 1 - ic_eps
- vL_rs_vf_${XYZ}$ (j, k, l, contxb) = vL_rs_vf_${XYZ}$ (j, k, l, contxb)/ &
- vL_rs_vf_${XYZ}$ (j, k, l, advxb)*aTHINC
- vL_rs_vf_${XYZ}$ (j, k, l, contxe) = vL_rs_vf_${XYZ}$ (j, k, l, contxe)/ &
- (1._wp - vL_rs_vf_${XYZ}$ (j, k, l, advxb))*(1._wp - aTHINC)
+ vL_rs_vf_${XYZ}$ (j, k, l, contxb) = rho_b*aTHINC
+ vL_rs_vf_${XYZ}$ (j, k, l, contxe) = rho_e*(1._wp - aTHINC)
vL_rs_vf_${XYZ}$ (j, k, l, advxb) = aTHINC
vL_rs_vf_${XYZ}$ (j, k, l, advxe) = 1 - aTHINC

! Right reconstruction
aTHINC = qmin + 5e-1_wp*qmax*(1._wp + sign*(tanh(ic_beta) + A)/(1._wp + A*tanh(ic_beta)))
if (aTHINC < ic_eps) aTHINC = ic_eps
if (aTHINC > 1 - ic_eps) aTHINC = 1 - ic_eps
- vR_rs_vf_${XYZ}$ (j, k, l, contxb) = vL_rs_vf_${XYZ}$ (j, k, l, contxb)/ &
- vL_rs_vf_${XYZ}$ (j, k, l, advxb)*aTHINC
- vR_rs_vf_${XYZ}$ (j, k, l, contxe) = vL_rs_vf_${XYZ}$ (j, k, l, contxe)/ &
- (1._wp - vL_rs_vf_${XYZ}$ (j, k, l, advxb))*(1._wp - aTHINC)
+ vR_rs_vf_${XYZ}$ (j, k, l, contxb) = rho_b*aTHINC
+ vR_rs_vf_${XYZ}$ (j, k, l, contxe) = rho_e*(1._wp - aTHINC)
vR_rs_vf_${XYZ}$ (j, k, l, advxb) = aTHINC
vR_rs_vf_${XYZ}$ (j, k, l, advxe) = 1 - aTHINC

@@ -320,7 +321,7 @@ contains
integer :: j, k, l, q !< Generic loop iterators

! Determining the number of cell-average variables which will be
- ! muscl-reconstructed and mapping their indical bounds in the x-,
+ ! muscl-reconstructed and mapping their index bounds in the x-,
! y- and z-directions to those in the s1-, s2- and s3-directions
! as to reshape the inputted data in the coordinate direction of
! the muscl reconstruction