Add OpenACC/OMP debugging info to docs

I did some of my own hunting, but I’m sure it’s incomplete. It would be very useful to add this to your OpenACC/OpenMP docs, @prathi-wind, even if it stays incomplete.


Cray+OpenACC
(I think this is supposed to work with OMP as well)
Links: https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openacc.7.html#environment-variables

```
CRAY_ACC_DEBUG: 0 (off), 1, 2, 3 (very noisy)
Dumps a time-stamped log line ("ACC: …") for every allocation, data transfer, kernel launch, wait, etc.
Great first stop when "nothing seems to run on the GPU."
```

Set via `export CRAY_ACC_DEBUG=3` before running.
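A minimal sketch of scoping the debug level to a single run instead of exporting it for the whole shell session. `run_with_acc_debug` is a hypothetical helper name (not part of the Cray tooling); only `CRAY_ACC_DEBUG` and its 0 to 3 range come from the notes above.

```shell
# Hypothetical helper: run one command with CRAY_ACC_DEBUG set only for that command.
# Usage: run_with_acc_debug LEVEL COMMAND [ARGS...]
run_with_acc_debug() {
    level="$1"
    shift
    # Reject anything outside the 0-3 range listed above.
    case "$level" in
        0|1|2|3) ;;
        *) echo "level must be 0-3, got: $level" >&2; return 2 ;;
    esac
    # env applies the variable to the child process only.
    env CRAY_ACC_DEBUG="$level" "$@"
}
```

Something like `run_with_acc_debug 1 ./my_app > acc.log 2>&1` (with `./my_app` as a placeholder for the real binary) lets you start quiet and step up to 3 only when level 1 isn't enough.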

I also found this Cray one, which we haven’t used before but seems potentially useful:

```
CRAY_ACC_FORCE_EARLY_INIT=1
Force full GPU initialisation at program start so you can see start-up hangs immediately  
```


If there is a problem with data movement, apparently, this helps:
```
export CRAY_ACC_DEBUG=3
export CRAY_ACC_PRESENT_DUMP_SAVE_NAMES=1
# Then sprinkle acc_present_dump() or omp_get_mapped_ptr() around hotspots in the source.
```

With `CRAY_ACC_PRESENT_DUMP_SAVE_NAMES=1`, `acc_present_dump()` shows names and source lines, which is priceless for "present but not really" bugs.

Vendor/compiler agnostic OpenMP:

```
export OMP_TARGET_OFFLOAD=MANDATORY   # or DISABLED, or DEFAULT
# DISABLED turns off offload; MANDATORY aborts if a GPU isn't found.
# Great first test: does the problem disappear when you drop back to the CPU?
```
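A rough sketch of that "drop back to the CPU" test as a helper that runs the same command twice and compares the results. `compare_offload` is a hypothetical name, and the comparison is deliberately crude: it assumes the program prints something deterministic, so small floating-point differences between host and device will also show up as "differs".

```shell
# Hypothetical helper: run a command once with offload disabled and once with
# offload mandatory, then report whether exit status and output match.
# Usage: compare_offload COMMAND [ARGS...]
compare_offload() {
    host_out=$(env OMP_TARGET_OFFLOAD=DISABLED "$@" 2>&1) && host_rc=0 || host_rc=$?
    dev_out=$(env OMP_TARGET_OFFLOAD=MANDATORY "$@" 2>&1) && dev_rc=0 || dev_rc=$?
    if [ "$host_rc" -eq "$dev_rc" ] && [ "$host_out" = "$dev_out" ]; then
        echo "same"
    else
        echo "differs"
    fi
}
```

If `compare_offload ./my_app` (placeholder binary) reports "differs", the bug most likely lives in the offloaded path or in the data mapping rather than in the algorithm itself.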

Cray+OMP: https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openmp.7.html#environment-variables

NVIDIA compilers:

ACC only I think (links: https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#environment-variables)

```
NVCOMPILER_ACC_NOTIFY
Sends a one-liner to stderr every time something interesting happens.
Bit-mask: 1 = kernel launch, 2 = data copies, 4 = region entry/exit, 8 = wait/sync, 16 = malloc/free

1 (kernels only) is the usual first step. 3 (kernels + copies) is great for "why is it so slow?"
```
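Since the value is a bit-mask, here is a small sketch that builds it from readable names so nobody has to add the bits by hand. The function name and the event names (`kernels`, `copies`, `regions`, `waits`, `allocs`) are my own labels for the five bits listed above, not anything NVIDIA defines.

```shell
# Hypothetical helper: turn event names into the NVCOMPILER_ACC_NOTIFY bit-mask.
# Usage: acc_notify_mask kernels copies ...
acc_notify_mask() {
    mask=0
    for event in "$@"; do
        case "$event" in
            kernels) bit=1 ;;    # kernel launches
            copies)  bit=2 ;;    # data copies
            regions) bit=4 ;;    # region entry/exit
            waits)   bit=8 ;;    # wait/sync
            allocs)  bit=16 ;;   # malloc/free
            *) echo "unknown event: $event" >&2; return 2 ;;
        esac
        mask=$((mask | bit))
    done
    echo "$mask"
}
```

For example, `NVCOMPILER_ACC_NOTIFY=$(acc_notify_mask kernels copies) ./my_app` (placeholder binary) is the "why is it so slow?" setting of 3 from above.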

```
NVCOMPILER_ACC_TIME
Lightweight profiler; prints a tidy end-of-run table with per-region and per-kernel times and bytes moved.

Set to any non-zero value (most folks just use 1). Don't run CUDA profilers at the same time.  
```

```
NVCOMPILER_ACC_DEBUG=1
Spews everything the runtime sees: host/device addresses, mapping events, present-table look-ups, etc.  Great for "partially present" or "pointer went missing" errors. 
```


NVIDIA+OMP: (links: https://openmp.llvm.org/design/Runtimes.html)

I think these might also work with Cray + OMP, but I’m not sure, since these variables belong to the underlying LLVM OpenMP runtime.

You can also apparently profile OMP at runtime without invoking a proper profiler, similar to `NVCOMPILER_ACC_TIME`. This might work with Cray as well.

```
export LIBOMPTARGET_PROFILE=run.json
# Emits a Chrome-trace (JSON) timeline that you can open in chrome://tracing or Speedscope.
# Great lightweight profiler when Nsight is overkill.
# Granularity is set in µs via LIBOMPTARGET_PROFILE_GRANULARITY (default 500).
```

There is also:

```
LIBOMPTARGET_INFO
Bit-mask, e.g. 1 (= print kernel args), 0x10 (= plugin info), -1 (= everything)

Human-readable log of data-mapping inserts/updates, kernel launches, copies, waits. Perfect first stop for "why is nothing copied?" 
```
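A short sketch of combining flags with shell arithmetic rather than going straight to -1. It uses only the two bits named above (0x1 and 0x10); `run_with_omptarget_info` is a hypothetical wrapper name.

```shell
# Combine the two flags named above: 0x1 (kernel args) and 0x10 (plugin info).
info_mask=$((0x1 | 0x10))
echo "LIBOMPTARGET_INFO=$info_mask"   # prints LIBOMPTARGET_INFO=17

# Hypothetical wrapper: apply the mask to a single command only.
# Usage: run_with_omptarget_info COMMAND [ARGS...]
run_with_omptarget_info() {
    env LIBOMPTARGET_INFO="$info_mask" "$@"
}
```

Starting from a narrow mask keeps the log readable; switch to -1 only when the narrow output doesn't explain why nothing is copied.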

```
LIBOMPTARGET_DEBUG=1
Developer-level trace (host-side). Much noisier than INFO; only works if the runtime was built with -DOMPTARGET_DEBUG.
```

Add OpenACC/OMP debugging info to docs #918
