-
Notifications
You must be signed in to change notification settings - Fork 14.8k
Description
This code (seen in pytorch) compiles with stdlibc++, but fails with libc++:
// hipcc -c test.cpp -o test.o
#include <cstring>
__device__ void test1( void* dest, const void* src, std::size_t count ) {
// this fails
std::memcpy( dest, src, count );
// but this works
memcpy( dest, src, count );
}
Fails with error: reference to __host__ function 'memcpy' in __device__ function
, see https://godbolt.org/z/h5nEnbb68
The issue lies between these lines:
llvm-project/clang/lib/Headers/__clang_hip_runtime_wrapper.h
Lines 142 to 145 in c80c09f
#include <__clang_cuda_math_forward_declares.h> | |
#include <__clang_hip_cmath.h> | |
#include <__clang_cuda_complex_builtins.h> | |
#include <algorithm> |
For math functions __clang_cuda_math_forward_declares.h exists and adds definitions for all math functions. But for stdlib functions there is no such file.
What happens in stdlibc++:
// Part1: /usr/include/string.h
extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
size_t __n) noexcept (true) __attribute__ ((__nonnull__ (1, 2)));
// Part 2: "/opt/compiler-explorer/libs/rocm/6.1.2/include/hip/amd_detail/amd_device_functions.h"
static inline __attribute__((device)) void* memcpy(void* dst, const void* src, size_t size) {
return __hip_hc_memcpy(dst, src, size);
}
// Part 3: /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/cstring
namespace std {
using ::memchr;
using ::memcmp;
using ::memcpy;
...
In libc++ part 2 and part 3 are swapped, because #include <algorithm>
in libc++ includes <.../c++/v1/cstring>
, which results in
// /opt/compiler-explorer/clang-rocm-6.1.2/bin/../include/c++/v1/cstring
namespace std { inline namespace __1 {
using ::size_t __attribute__((__using_if_exists__));
using ::memcpy __attribute__((__using_if_exists__));
using ::memmove __attribute__((__using_if_exists__));
...
// /opt/compiler-explorer/libs/rocm/6.1.2/include/hip/amd_detail/amd_device_functions.h
static inline __attribute__((device)) void* memcpy(void* dst, const void* src, size_t size) {
return __hip_hc_memcpy(dst, src, size);
}
Adding extern __attribute__((device)) void* memcpy(void* dst, const void* src, size_t size);
before <algorithm>
in __clang_hip_runtime_wrapper.h
solves the issue.
In general it looks like a new file like __clang_cuda_stdlib_forward_declares.h
with memcpy and memset __device__
declarations could solve the issue without breaking anything.