Ray start fails in Ubuntu 24 for version 2.41.0 #49974

Closed
gongomgra opened this issue Jan 20, 2025 · 2 comments
Labels
  • bug: Something that is supposed to be working; but isn't
  • core: Issues that should be addressed in Ray Core
  • external
  • triage: Needs triage (eg: priority, bug/not-bug, and owning component)

Comments

@gongomgra

What happened + What you expected to happen

The ray start command fails on Ubuntu 24.04 because the dashboard component can't be started. Checking the _private/services.py file, it looks like dashboard_url is None. I built Ray version 2.41.0 from source. It works flawlessly on other distributions such as Ubuntu 22 and Ubuntu 20.

root@kuberay-cluster-head-l5jcs:/app# ulimit -n 65536; ray start --verbose --head  --metrics-export-port=8080  --block  --dashboard-agent-listen-port=52365  --num-cpus=1  --memory=1610612736  --dashboard-host=0.0.0.0 --disable-usage-stats
Usage stats collection is disabled.

Local node IP: 10.180.0.22
2025-01-20 17:00:55,095 DEBUG node.py:293 -- Setting node ID to d63e3a77793e9514783e8fb062299dfeed69f18f26b2081cc64f12fb
2025-01-20 17:00:55,099 DEBUG node.py:1409 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2025-01-20_17-00-55_093388_360/logs.
2025-01-20 17:00:56,068 ERROR services.py:1353 -- Failed to start the dashboard , return code -11
2025-01-20 17:00:56,068 ERROR services.py:1378 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#logging-directory-structure' to find where the log file is.
2025-01-20 17:00:56,069 ERROR services.py:1388 -- Couldn't read dashboard.log file. Error: [Errno 2] No such file or directory: '/tmp/ray/session_2025-01-20_17-00-55_093388_360/logs/dashboard.log'. It means the dashboard is broken even before it initializes the logger (mostly dependency issues). Reading the dashboard.err file which contains stdout/stderr.
2025-01-20 17:00:56,069 ERROR services.py:1422 -- Failed to read dashboard.err file: cannot mmap an empty file. It is unexpected. Please report an issue to Ray github. https://github.com/ray-project/ray/issues
2025-01-20 17:00:56,069 DEBUG node.py:1438 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2025-01-20_17-00-55_093388_360/logs.
2025-01-20 17:00:56,071 DEBUG tpu.py:115 -- Failed to detect number of TPUs: [Errno 2] No such file or directory: '/dev/vfio'
2025-01-20 17:00:56,072 DEBUG npu.py:60 -- Could not import AscendCL: No module named 'acl'
2025-01-20 17:00:56,086 DEBUG services.py:2131 -- Determine to start the Plasma object store with 0.44 GB memory using /dev/shm.

--------------------
Ray runtime started.
--------------------

Next steps
  To add another node to this Ray cluster, run
    ray start --address='10.180.0.22:6379'

  To connect to this Ray cluster:
    import ray
    ray.init()

  To terminate the Ray runtime, run
    ray stop

  To view the status of the cluster, use
    ray status

--block
  This command will now block forever until terminated by a signal.
  Running subprocesses are monitored and a message will be printed if any of them terminate unexpectedly. Subprocesses exit with SIGTERM will be treated as graceful, thus NOT reported.

Some Ray subprocesses exited unexpectedly:
  monitor [exit code=-11]
  ray_client_server [exit code=-11]
  log_monitor [exit code=-11]

Remaining processes will be killed.
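
A note on reading the exit codes in the output above: a negative return code most likely follows Python's subprocess convention, where -N means the process was terminated by signal N (this assumes Ray reports Popen.returncode unchanged, which the log does not confirm). On Linux, signal 11 is SIGSEGV, so the dashboard, monitor, ray_client_server and log_monitor processes probably crashed with a segmentation fault in native code rather than raising a Python exception. That would also be consistent with dashboard.err being empty. A minimal sketch of the decoding follows; describe_exit_code is a hypothetical helper for illustration, not part of Ray.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a subprocess-style return code into a readable description.

    Hypothetical helper for illustration; not part of Ray.
    """
    if code < 0:
        # On POSIX, subprocess.Popen reports "killed by signal N" as -N.
        try:
            return f"killed by {signal.Signals(-code).name}"
        except ValueError:
            return f"killed by unknown signal {-code}"
    return f"exited with status {code}"


# The value reported for the dashboard and the other subprocesses above.
print(describe_exit_code(-11))
```

On Linux this prints "killed by SIGSEGV".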

I have found references to Ubuntu 22 and Ubuntu 20 in both the repository code and your docs website, so I wonder whether Ubuntu 24 is officially supported. Can you confirm? If it is, can you also help debug the initialization issue?

Versions / Dependencies

  • OS
root@kuberay-cluster-head-l5jcs:/app# cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
  • python --version
root@kuberay-cluster-head-l5jcs:/app# python --version
Python 3.12.8
  • pip list
root@kuberay-cluster-head-l5jcs:/app# pip list
Package                                  Version
---------------------------------------- -----------
aiohappyeyeballs                         2.4.4
aiohttp                                  3.11.11
aiohttp-cors                             0.7.0
aiorwlock                                1.5.0
aiosignal                                1.3.2
annotated-types                          0.7.0
anyio                                    4.8.0
async-timeout                            5.0.1
attrs                                    24.3.0
cachetools                               5.5.0
certifi                                  2024.12.14
cffi                                     1.17.1
charset-normalizer                       3.4.1
click                                    8.1.8
cloudpickle                              3.1.1
colorful                                 0.5.6
cryptography                             44.0.0
cupy-cuda12x                             13.3.0
Deprecated                               1.2.15
distlib                                  0.3.9
dm-tree                                  0.1.8
Farama-Notifications                     0.0.4
fastapi                                  0.115.6
fastrlock                                0.8.3
filelock                                 3.16.1
frozenlist                               1.5.0
fsspec                                   2024.12.0
google-api-core                          2.24.0
google-auth                              2.37.0
googleapis-common-protos                 1.66.0
grpcio                                   1.69.0
gymnasium                                1.0.0
h11                                      0.14.0
idna                                     3.10
imageio                                  2.36.1
importlib_metadata                       8.5.0
Jinja2                                   3.1.5
jsonschema                               4.23.0
jsonschema-specifications                2024.10.1
lazy_loader                              0.4
linkify-it-py                            2.0.3
lz4                                      4.3.3
markdown-it-py                           3.0.0
MarkupSafe                               3.0.2
mdit-py-plugins                          0.4.2
mdurl                                    0.1.2
memray                                   1.15.0
msgpack                                  1.1.0
multidict                                6.1.0
networkx                                 3.4.2
numpy                                    2.2.2
opencensus                               0.11.4
opencensus-context                       0.1.3
opentelemetry-api                        1.29.0
opentelemetry-exporter-otlp              1.29.0
opentelemetry-exporter-otlp-proto-common 1.29.0
opentelemetry-exporter-otlp-proto-grpc   1.29.0
opentelemetry-exporter-otlp-proto-http   1.29.0
opentelemetry-proto                      1.29.0
opentelemetry-sdk                        1.29.0
opentelemetry-semantic-conventions       0.50b0
packaging                                24.2
pandas                                   2.2.3
pillow                                   11.1.0
pip                                      24.3.1
pip                                      23.3.2
platformdirs                             4.3.6
prometheus_client                        0.21.1
propcache                                0.2.1
proto-plus                               1.25.0
protobuf                                 5.29.3
py-spy                                   0.4.0
pyarrow                                  19.0.0
pyasn1                                   0.6.1
pyasn1_modules                           0.4.1
pycparser                                2.22
pydantic                                 2.10.5
pydantic_core                            2.27.2
Pygments                                 2.19.1
pyOpenSSL                                25.0.0
python-dateutil                          2.9.0.post0
pytz                                     2024.2
PyYAML                                   6.0.2
ray                                      2.41.0
ray-cpp                                  2.41.0
referencing                              0.36.1
requests                                 2.32.3
rich                                     13.9.4
rpds-py                                  0.22.3
rsa                                      4.9
scikit-image                             0.25.0
scipy                                    1.15.1
setuptools                               70.3.0
shellingham                              1.5.4
six                                      1.17.0
smart-open                               7.1.0
sniffio                                  1.3.1
starlette                                0.41.3
tensorboardX                             2.6.2.2
textual                                  1.0.0
tifffile                                 2025.1.10
typer                                    0.15.1
typing_extensions                        4.12.2
tzdata                                   2024.2
uc-micro-py                              1.0.3
urllib3                                  2.3.0
uvicorn                                  0.34.0
virtualenv                               20.29.1
watchfiles                               1.0.4
wrapt                                    1.17.2
yarl                                     1.18.3
zipp                                     3.21.0
  • ray --version
root@kuberay-cluster-head-l5jcs:/app# ray --version
ray, version 2.41.0

Reproduction script

$ ulimit -n 65536; ray start --verbose --head  --metrics-export-port=8080  --block  --dashboard-agent-listen-port=52365  --num-cpus=1  --memory=1610612736  --dashboard-host=0.0.0.0 --disable-usage-stats

Issue Severity

High: It blocks me from completing my task.

@gongomgra gongomgra added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jan 20, 2025
@jjyao jjyao added the core Issues that should be addressed in Ray Core label Jan 22, 2025
@MengjinYan
Collaborator

Hi @gongomgra, thanks for reporting the issue!

I tried the repro you provided. Unfortunately, I was not able to reproduce your issue. You could compare your setup against the environment I used for the repro to see whether anything stands out.

Below is the environment in which I ran the repro:

$ ray --version
ray, version 2.41.0
$ python --version
Python 3.12.8
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
$ pip list
Package                                  Version
---------------------------------------- -----------
aiohappyeyeballs                         2.4.4
aiohttp                                  3.11.11
aiohttp-cors                             0.7.0
aiosignal                                1.3.2
annotated-types                          0.7.0
anyio                                    4.8.0
attrs                                    24.3.0
cachetools                               5.5.1
certifi                                  2024.12.14
cffi                                     1.17.1
charset-normalizer                       3.4.1
click                                    8.1.8
cloudpickle                              3.1.1
colorful                                 0.5.6
cryptography                             44.0.0
cupy-cuda12x                             13.3.0
Deprecated                               1.2.15
distlib                                  0.3.9
dm-tree                                  0.1.8
Farama-Notifications                     0.0.4
fastapi                                  0.115.7
fastrlock                                0.8.3
filelock                                 3.17.0
frozenlist                               1.5.0
fsspec                                   2024.12.0
google-api-core                          2.24.0
google-auth                              2.38.0
googleapis-common-protos                 1.66.0
grpcio                                   1.70.0
gymnasium                                1.0.0
h11                                      0.14.0
httptools                                0.6.4
idna                                     3.10
importlib_metadata                       8.5.0
Jinja2                                   3.1.5
jsonschema                               4.23.0
jsonschema-specifications                2024.10.1
linkify-it-py                            2.0.3
lz4                                      4.3.3
markdown-it-py                           3.0.0
MarkupSafe                               3.0.2
mdit-py-plugins                          0.4.2
mdurl                                    0.1.2
memray                                   1.15.0
msgpack                                  1.1.0
multidict                                6.1.0
numpy                                    2.2.2
opencensus                               0.11.4
opencensus-context                       0.1.3
opentelemetry-api                        1.29.0
opentelemetry-exporter-otlp              1.29.0
opentelemetry-exporter-otlp-proto-common 1.29.0
opentelemetry-exporter-otlp-proto-grpc   1.29.0
opentelemetry-exporter-otlp-proto-http   1.29.0
opentelemetry-proto                      1.29.0
opentelemetry-sdk                        1.29.0
opentelemetry-semantic-conventions       0.50b0
ormsgpack                                1.7.0
packaging                                24.2
pandas                                   2.2.3
pip                                      24.3.1
platformdirs                             4.3.6
prometheus_client                        0.21.1
propcache                                0.2.1
proto-plus                               1.25.0
protobuf                                 5.29.3
py-spy                                   0.4.0
pyarrow                                  19.0.0
pyasn1                                   0.6.1
pyasn1_modules                           0.4.1
pycparser                                2.22
pydantic                                 2.10.5
pydantic_core                            2.27.2
Pygments                                 2.19.1
pyOpenSSL                                25.0.0
python-dateutil                          2.9.0.post0
python-dotenv                            1.0.1
pytz                                     2024.2
PyYAML                                   6.0.2
ray                                      2.41.0
referencing                              0.36.1
requests                                 2.32.3
rich                                     13.9.4
rpds-py                                  0.22.3
rsa                                      4.9
scipy                                    1.15.1
setuptools                               75.8.0
six                                      1.17.0
smart-open                               7.1.0
sniffio                                  1.3.1
starlette                                0.45.2
tensorboardX                             2.6.2.2
textual                                  1.0.0
typing_extensions                        4.12.2
tzdata                                   2025.1
uc-micro-py                              1.0.3
urllib3                                  2.3.0
uvicorn                                  0.34.0
uvloop                                   0.21.0
virtualenv                               20.29.1
watchfiles                               1.0.4
websockets                               14.2
wheel                                    0.45.1
wrapt                                    1.17.2
yarl                                     1.18.3
zipp                                     3.21.0
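
Comparing the two pip list outputs by eye is error-prone, so here is a minimal sketch of one way to diff them. The helper names parse_pip_list and diff_environments are hypothetical, and the sample data is a short excerpt of the two lists in this thread rather than the full output. A package diff like this only covers Python dependencies; it cannot reveal differences in the build toolchain used to compile Ray.

```python
def parse_pip_list(text: str) -> dict[str, str]:
    """Parse the two-column output of `pip list` into {package: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header row, the dashed separator, and blank lines.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0]] = parts[1]
    return packages


def diff_environments(mine: dict[str, str], theirs: dict[str, str]) -> list[str]:
    """Return one line per package that is missing or differs in version."""
    report = []
    for name in sorted(set(mine) | set(theirs), key=str.lower):
        a = mine.get(name, "(absent)")
        b = theirs.get(name, "(absent)")
        if a != b:
            report.append(f"{name}: {a} -> {b}")
    return report


# Excerpt of the failing environment (issue author).
failing = """\
Package    Version
---------- -------
cachetools 5.5.0
grpcio     1.69.0
ray        2.41.0
ray-cpp    2.41.0
"""

# Excerpt of the working environment (collaborator).
working = """\
Package    Version
---------- -------
cachetools 5.5.1
grpcio     1.70.0
ray        2.41.0
uvloop     0.21.0
"""

for line in diff_environments(parse_pip_list(failing), parse_pip_list(working)):
    print(line)
```

In practice the two strings would be replaced by the full output of pip list captured on each machine.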

@gongomgra
Author

Hi @MengjinYan, thank you for confirming it works on Ubuntu 24. We have found the issue on our side and have been able to fix it by building Ray with a different version of clang (clang 18 from system repos). I'm closing this ticket as solved. Thanks for your help!
