NVVM IR Invalid data layout when building with Cuda 12.0 #175

Open · jorge-ortega opened this issue Mar 22, 2025 · 3 comments
@jorge-ortega (Collaborator)

jorge-ortega commented Mar 22, 2025

When using CUDA 12.0, verification of the generated NVVM IR fails with the following error:


  Error: Invalid data layout :
  Error: e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64
  Error: 32-bit: e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-i128:128:128-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64
  Error: 64-bit: e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-i128:128:128-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64

The format is deprecated but still works with 12.1 and higher, so this might be a bug in the 12.0 verifier that we could ignore. The best fix would be to emit the supported data layout when targeting CUDA 12 and up.
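A minimal sketch of that fix, selecting the layout string by CUDA major version. The two strings are copied from the verifier output above (the 12.x one adds `i128:128:128`); the constant names and the version-check function are illustrative, not the actual `rustc_codegen_nvvm` API.

```rust
/// Legacy layout currently emitted; rejected by the libnvvm shipped with CUDA 12.0.
const LAYOUT_LEGACY: &str = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64";

/// 64-bit layout the CUDA 12.0 verifier reports as valid (note the i128 entry).
const LAYOUT_CUDA12: &str = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-i128:128:128-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64";

/// Pick the data layout based on the CUDA toolkit major version being targeted.
fn nvvm_data_layout(cuda_major: u32) -> &'static str {
    if cuda_major >= 12 {
        LAYOUT_CUDA12
    } else {
        LAYOUT_LEGACY
    }
}

fn main() {
    // The 12.x layout carries the i128 alignment entry; the legacy one does not.
    assert!(nvvm_data_layout(12).contains("i128:128:128"));
    assert!(!nvvm_data_layout(11).contains("i128"));
}
```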

@tlaakkonen

tlaakkonen commented Apr 18, 2025

I'm getting the same error with CUDA 12.8, but I saw on another thread that you were able to build with 12.8 - is there a workaround you're using?

Output of cargo build:

...
 thread 'rustc' panicked at /home/epyc/.cargo/git/checkouts/rust-cuda-f2e68423bdc84998/290d711/crates/rustc_codegen_nvvm/src/nvvm.rs:120:9:
  Malformed NVVM IR program rejected by libnvvm, dumping verifier log:

  Error: Invalid data layout :
  Error: e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64
  Error: 32-bit: e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-i128:128:128-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64
  Error: 64-bit: e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-i128:128:128-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64
...

Output of nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

Output of $LLVM_CONFIG --version: 7.0.1

Edit: Fixed - there was an old CUDA 12.0 installation hanging around that needed to be purged properly.

@jorge-ortega (Collaborator, Author)

jorge-ortega commented Apr 18, 2025

Haven't needed any workarounds for 12.8 as of yet. I've been using LLVM 7.1.0, but I'm unsure if that makes a difference. If you could share the IR, that would help narrow it down. I believe all you need to do is call final_module_path on your cuda builder with the path where the IR should be written.

I also have a patch I've been sitting on that updates the data layout. I can push that to a branch if you'd like to try that.

@brandonros

brandonros commented Apr 19, 2025

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
$ cat Cargo.toml
[package]
name = "ed25519_vanity"

As per https://github.com/Rust-GPU/Rust-CUDA/blob/290d711507d26f3ca16a484ba02c8a03ba9d7b7c/guide/src/nvvm/technical/debugging.md#miscompilations

Use RUSTFLAGS="--emit=llvm-ir" and find crate_name.ll in target/nvptx64-nvidia-cuda/<debug/release>/deps/ and attach it in any bug report.

RUSTFLAGS="--emit=llvm-ir" cargo build

find target -type f | grep ".ll"

I do not have anything other than target/debug/build/ed25519_vanity-64976b22a8e88420/build_script_build-64976b22a8e88420.ll that looks close.

Edit:

I think all you need to do is call final_module_path on your cuda builder with a path to where the IR should go.

Got it.

use std::env;
use std::path;

use cuda_builder::CudaBuilder;

fn main() {
    println!("cargo:rerun-if-changed=build.rs");
    println!("cargo:rerun-if-changed=kernels");

    let out_path = path::PathBuf::from(env::var("OUT_DIR").unwrap());
    CudaBuilder::new("kernels")
        .copy_to(out_path.join("kernels.ptx"))
        .final_module_path(out_path.join("kernels.ll"))
        .build()
        .unwrap();
}

generates target/debug/build/ed25519_vanity-f0fe2ffa5f8b0bad/out/kernels.ll

vecadd example:

use cuda_std::prelude::*;

#[kernel]
#[allow(improper_ctypes_definitions, clippy::missing_safety_doc)]
pub unsafe fn vecadd(a: &[f32], b: &[f32], c: *mut f32) {
    let idx = thread::index_1d() as usize;
    if idx < a.len() {
        let elem = unsafe { &mut *c.add(idx) };
        *elem = a[idx] + b[idx];
    }
}

Generated kernels.ll:

; ModuleID = 'merged_modules'
source_filename = "merged_modules"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

; Function Attrs: nounwind
define void @vecadd([0 x float]* nocapture readonly, i64, [0 x float]* nocapture readonly, i64, float* nocapture) unnamed_addr #0 {
  %6 = tail call i32 @__nvvm_thread_idx_x() #0
  %7 = icmp ult i32 %6, 1025
  tail call void @llvm.assume(i1 %7) #0
  %8 = tail call i32 @__nvvm_block_idx_x() #0
  %9 = icmp sgt i32 %8, -1
  tail call void @llvm.assume(i1 %9) #0
  %10 = tail call i32 @__nvvm_block_dim_x() #0
  %11 = icmp ne i32 %10, 0
  tail call void @llvm.assume(i1 %11) #0
  %12 = icmp ult i32 %10, 1026
  tail call void @llvm.assume(i1 %12) #0
  %13 = mul i32 %10, %8
  %14 = add i32 %13, %6
  %15 = zext i32 %14 to i64
  %16 = icmp ult i64 %15, %1
  br i1 %16, label %18, label %17

; <label>:17:                                     ; preds = %20, %5
  ret void

; <label>:18:                                     ; preds = %5
  %19 = icmp ult i64 %15, %3
  br i1 %19, label %20, label %27

; <label>:20:                                     ; preds = %18
  %21 = getelementptr inbounds [0 x float], [0 x float]* %0, i64 0, i64 %15
  %22 = load float, float* %21, align 4
  %23 = getelementptr inbounds float, float* %4, i64 %15
  %24 = getelementptr inbounds [0 x float], [0 x float]* %2, i64 0, i64 %15
  %25 = load float, float* %24, align 4
  %26 = fadd float %22, %25
  store float %26, float* %23, align 4
  br label %17

; <label>:27:                                     ; preds = %18
  tail call void @llvm.trap() #0
  unreachable
}

; Function Attrs: nounwind
declare i32 @__nvvm_thread_idx_x() unnamed_addr #0

; Function Attrs: nounwind
declare void @llvm.assume(i1) #0

; Function Attrs: nounwind
declare i32 @__nvvm_block_idx_x() unnamed_addr #0

; Function Attrs: nounwind
declare i32 @__nvvm_block_dim_x() unnamed_addr #0

; Function Attrs: noreturn nounwind
declare void @llvm.trap() #1

attributes #0 = { nounwind }
attributes #1 = { noreturn nounwind }

!nvvm.annotations = !{!0}
!nvvmir.version = !{!1}

!0 = !{void ([0 x float]*, i64, [0 x float]*, i64, float*)* @vecadd, !"kernel", i32 1}
!1 = !{i32 2, i32 0, i32 3, i32 1}

Edit 2: Confirmed not an issue / "fixed" by switching from CUDA 12.0 to 12.8, even with LLVM 7.0.1:

apt update
apt-get remove --purge 'cuda-12-0*' 'cuda-*12-0*' 'libcublas-12-0' 'libcublas-dev-12-0' cuda-12.0
apt install pkg-config libssl-dev llvm-7 llvm-7-dev llvm-7-tools clang-7 zlib1g-dev cuda-12-8
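After the purge it can be worth confirming that no stale 12.0 toolkit is still being picked up. A minimal check, assuming the default /usr/local install locations used by the NVIDIA apt packages:

```shell
# Count leftover CUDA 12.0 install directories under the default prefix.
stale=$(ls -d /usr/local/cuda-12.0* 2>/dev/null | grep -c .)
echo "stale 12.0 installs: $stale"

# The compiler on PATH should now report release 12.8.
nvcc --version 2>/dev/null | grep -o 'release [0-9.]*' || echo "nvcc not on PATH"
```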
