Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid if call in the __parallel_merge_submitter_large::run_parallel_merge() #1980

Closed
wants to merge 7 commits into from
27 changes: 14 additions & 13 deletions include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_merge.h
Original file line number Diff line number Diff line change
Expand Up @@ -278,6 +278,15 @@ struct __parallel_merge_submitter_large<_IdType, _CustomName,
});
}

template <typename _Rng1, typename _Rng2, typename _Compare>
inline static _split_point_t<_IdType>
__find_start_point_w(const _Rng1& __rng1, const _Rng2& __rng2, const _split_point_t<_IdType>& __sp_left,
Copy link
Contributor

@MikeDvorskiy MikeDvorskiy Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's a purpose of __find_start_point_w? It seems, it does not any additional things, beside re-calling __find_start_point...

const _split_point_t<_IdType>& __sp_right, const _IdType __i_elem, _Compare __comp)
{
return __find_start_point(__rng1, __sp_left.first, __sp_right.first, __rng2, __sp_left.second,
__sp_right.second, __i_elem, __comp);
}

// Process parallel merge
template <typename _ExecutionPolicy, typename _Range1, typename _Range2, typename _Range3, typename _Compare,
typename _Storage>
Expand Down Expand Up @@ -306,19 +315,11 @@ struct __parallel_merge_submitter_large<_IdType, _CustomName,
_Storage::__get_usm_or_buffer_accessor_ptr(__base_diagonals_sp_global_acc);
auto __diagonal_idx = __global_idx / __nd_range_params.steps_between_two_base_diags;

_split_point_t<_IdType> __start;
if (__global_idx % __nd_range_params.steps_between_two_base_diags != 0)
{
const _split_point_t<_IdType> __sp_left = __base_diagonals_sp_global_ptr[__diagonal_idx];
const _split_point_t<_IdType> __sp_right = __base_diagonals_sp_global_ptr[__diagonal_idx + 1];

__start = __find_start_point(__rng1, __sp_left.first, __sp_right.first, __rng2,
__sp_left.second, __sp_right.second, __i_elem, __comp);
}
else
{
__start = __base_diagonals_sp_global_ptr[__diagonal_idx];
}
const _split_point_t<_IdType> __start =
__global_idx % __nd_range_params.steps_between_two_base_diags != 0
? __find_start_point_w(__rng1, __rng2, __base_diagonals_sp_global_ptr[__diagonal_idx],
__base_diagonals_sp_global_ptr[__diagonal_idx + 1], __i_elem, __comp)
: __base_diagonals_sp_global_ptr[__diagonal_idx];
Comment on lines +318 to +322
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a performance perspective, as far as I know a ternary operator (?) is equivalent to an if statement. Both end up as branches.

Is this PR intended to clean up the code to be improve readability or performance?


__serial_merge(__rng1, __rng2, __rng3, __start.first, __start.second, __i_elem,
__nd_range_params.chunk, __n1, __n2, __comp);
Expand Down
Loading