SGE (via qrsh): addprocs_qrsh() fails on cluster that supports qrsh #141

@jtrakk

When I use addprocs_qrsh() I get an error message and no jobs are created (checked in qstat).

ClusterManagers.addprocs_qrsh(3,res_list="h_rt=2:00:00,h_data=4G,highp")
Error launching workers
MethodError(iterate, (Process(`qrsh -l h_rt=2:00:00,h_data=4G,highp -V -N julia-13730 -now n cd /mydir '&&' /u/local/apps/julia/1.5.1/bin/julia --worker=2BuUs4aIkAHENSDE`, ProcessRunning),), 0x0000000000006caf)
Int64[]
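
The `MethodError(iterate, (Process(...),), ...)` above is the kind of error Julia raises when code tries to destructure a `Process` object as if it were a tuple. Below is a minimal sketch that reproduces the same kind of error outside ClusterManagers. It assumes (not confirmed here) that the qrsh launcher unpacks the result of `open(cmd)` into a `(stream, process)` pair. The `echo` command is only a hypothetical stand-in for the real qrsh command.

```julia
# Stand-in for the qrsh command (assumption: any short-lived command
# that prints one line is enough to show the behaviour).
cmd = `echo julia_worker:9934`

# On Julia 1.x, open(cmd) returns a single Base.Process, which can be
# read from like an IO stream. It is not a (stream, process) tuple.
proc = open(cmd)
println(typeof(proc))

try
    # Destructuring calls iterate(proc), which has no method for Process.
    stream, p = proc
catch err
    println("caught: ", typeof(err))
    println("failing function is iterate: ", err.f === iterate)
end
wait(proc)

# Reading from the Process object directly works without destructuring.
proc2 = open(cmd)
println(readline(proc2))
wait(proc2)
```

If this is indeed the cause, reading the worker's `julia_worker:<port>` line straight from the `Process` object, as in the last three lines, would avoid the destructuring step.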

My cluster does support qrsh. When I run the qrsh command manually in a shell, it prints the host-key messages below, but it does seem to allocate the worker: I can see the job in qstat.

qrsh -l h_rt=2:00:00,h_data=4G,highp -V -N julia-13730 -now n cd /mydir '&&' /u/local/apps/julia/1.5.1/bin/julia --worker=2BuUs4aIkAHENSDE
could not open any host key
ssh_keysign: no reply
key_sign failed
julia_worker:9934

job-ID     prior   name       user         state submit/start at     queue                          jclass                         slots ja-task-ID 
------------------------------------------------------------------------------------------------------------------------------------------------
   4514401 0.50500 QRLOGIN    user         r     09/02/2020 00:05:14 my.q@nodexxx                                                  2        

When I use addprocs_sge() it works just fine.


This looks like the same issue as this comment, but I opened a new issue because that one was originally opened for a different purpose.

Julia 1.5.1
ClusterManagers.jl master branch dde400e
