Skip to content

[libc++] std::memcpy does not work in HIP __device__ code #100802

@AngryLoki

Description

@AngryLoki

This code (seen in pytorch) compiles with stdlibc++, but fails with libc++:

// hipcc -c test.cpp -o test.o
#include <cstring>

__device__ void test1( void* dest, const void* src, std::size_t count ) {
    // this fails
    std::memcpy( dest, src, count );
    
    // but this works
    memcpy( dest, src, count );
}

Fails with error: reference to __host__ function 'memcpy' in __device__ function, see https://godbolt.org/z/h5nEnbb68

The issue lies between these lines:

#include <__clang_cuda_math_forward_declares.h>
#include <__clang_hip_cmath.h>
#include <__clang_cuda_complex_builtins.h>
#include <algorithm>

For math functions __clang_cuda_math_forward_declares.h exists and adds definitions for all math functions. But for stdlib functions there is no such file.

What happens in stdlibc++:

// Part1: /usr/include/string.h
extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
       size_t __n) noexcept (true) __attribute__ ((__nonnull__ (1, 2)));

// Part 2: "/opt/compiler-explorer/libs/rocm/6.1.2/include/hip/amd_detail/amd_device_functions.h"
static inline __attribute__((device)) void* memcpy(void* dst, const void* src, size_t size) {
    return __hip_hc_memcpy(dst, src, size);
}

// Part 3: /usr/lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/cstring
namespace std {
  using ::memchr;
  using ::memcmp;
  using ::memcpy;
...

In libc++ part 2 and part 3 are swapped, because #include <algorithm> in libc++ includes <.../c++/v1/cstring>, which results in

// /opt/compiler-explorer/clang-rocm-6.1.2/bin/../include/c++/v1/cstring
namespace std { inline namespace __1 {
using ::size_t __attribute__((__using_if_exists__));
using ::memcpy __attribute__((__using_if_exists__));
using ::memmove __attribute__((__using_if_exists__));
...

// /opt/compiler-explorer/libs/rocm/6.1.2/include/hip/amd_detail/amd_device_functions.h
static inline __attribute__((device)) void* memcpy(void* dst, const void* src, size_t size) {
    return __hip_hc_memcpy(dst, src, size);
}

Adding extern __attribute__((device)) void* memcpy(void* dst, const void* src, size_t size); before <algorithm> in __clang_hip_runtime_wrapper.h solves the issue.

In general it looks like a new file like __clang_cuda_stdlib_forward_declares.h with memcpy and memset __device__ declarations could solve the issue without breaking anything.

Metadata

Metadata

Assignees

No one assigned

    Labels

    clang:headersHeaders provided by Clang, e.g. for intrinsics

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions