UCX ignores UCX_ERROR_SIGNALS set by MPI.jl #409

@vchuravy

This manifests as a warning that the UCX_ERROR_SIGNALS variable is unused, and it leads to spurious aborts because Julia itself uses SIGSEGV.

[1595151936.606713] [node0022:65473:0]         parser.c:1491 UCX  WARN  unused env variable: UCX_ERROR_SIGNALS (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
...
[node0022:65475:1:65488] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x3)
==== backtrace ====
    0  /home/software/spack/ucx/1.6.0-rxquvh64m7gt2oivvh4drm2rlquf4lf7/lib/libucs.so.0(+0x260f0) [0x20006df560f0]
    1  /home/software/spack/ucx/1.6.0-rxquvh64m7gt2oivvh4drm2rlquf4lf7/lib/libucs.so.0(+0x26520) [0x20006df56520]
    2  [0x2000000504d8]
    3  [0x2000ff2b4400]
    4  /home/software/julia/1.3.0/bin/../lib/libjulia.so.1(+0x211fc8) [0x200000281fc8]
    5  /home/software/julia/1.3.0/bin/../lib/libjulia.so.1(+0xc53e8) [0x2000001353e8]
    6  /home/software/julia/1.3.0/bin/../lib/libjulia.so.1(+0xc5974) [0x200000135974]

This is on MPI.jl 1.4.0 with OpenMPI 3.1.4 + UCX + pmi2.
I suspect that the use of pmi2 is causing this: if I run export UCX_ERROR_SIGNALS="SIGILL,SIGBUS,SIGFPE"
before srun --mpi=pmi2 julia, I get neither the unused env variable warning nor spurious segfaults.
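
The workaround described above can be sketched as a small launch script. This is a minimal sketch under stated assumptions, not a fix for the underlying issue: the signal list and the srun invocation come from the report, while `script.jl` and the check for `srun` are hypothetical additions for illustration.

```shell
#!/bin/sh
# Signal list taken from the report. SIGSEGV is deliberately left out so
# that UCX does not treat a signal Julia uses itself as a fatal error.
UCX_ERROR_SIGNALS="SIGILL,SIGBUS,SIGFPE"
export UCX_ERROR_SIGNALS

# Sanity check: warn if SIGSEGV has slipped back into the list.
case ",${UCX_ERROR_SIGNALS}," in
    *,SIGSEGV,*) echo "warning: UCX will still intercept SIGSEGV" >&2 ;;
esac

# Launch only where srun is available. script.jl is a hypothetical
# placeholder for the actual Julia program.
if command -v srun >/dev/null 2>&1; then
    srun --mpi=pmi2 julia script.jl
else
    echo "srun not found; UCX_ERROR_SIGNALS=${UCX_ERROR_SIGNALS}"
fi
```

Exporting the variable in the launching shell means it is already present in the environment when each rank starts, rather than relying on MPI.jl to set it later.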
