RD-Agent fin_model fails on 1-2 GPU systems (w/ fix) #499

JIBSIL · 2024-11-28T21:07:35Z

🐛 Bug Description

The default CUDA device ordinal used for RD-Agent model generation is 2, which is only valid on systems with at least three GPUs.

To Reproduce

Steps to reproduce the behavior:

1. Use a system that is GPU-enabled and has fewer than three GPUs
2. Run rdagent fin_model
3. You will get an invalid device ordinal error

Expected Behavior

RD-Agent should assume that the user has at least one GPU, not a minimum of three. The default GPU should therefore be 0.
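
A minimal sketch of the expected behavior, assuming the device count is known up front (for example from torch.cuda.device_count()). The helper name pick_device_ordinal is hypothetical and is not part of RD-Agent or Qlib; it only illustrates falling back to ordinal 0 when the configured ordinal does not exist on the machine.

```python
from typing import Optional


def pick_device_ordinal(requested: int, device_count: int) -> Optional[int]:
    """Return a valid CUDA device ordinal, or None if no GPU is present.

    Hypothetical helper for illustration only. `device_count` is assumed
    to come from something like torch.cuda.device_count().
    """
    if device_count <= 0:
        # No GPU at all: the caller should run on CPU.
        return None
    if 0 <= requested < device_count:
        # The configured ordinal exists on this machine.
        return requested
    # The configured ordinal is out of range (e.g. 2 on a 1-GPU box):
    # fall back to the first device instead of failing.
    return 0
```

With a configured ordinal of 2, this returns 0 on a one- or two-GPU system and 2 on a system with three or more GPUs.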

Environment

Note: Users can run rdagent collect_info to get system information and paste it directly here.

(rdagent) E:\RD-Agent>rdagent collect_info
2024-11-28 15:48:52.150 | WARNING  | rdagent.oai.llm_utils:<module>:48 - llama is not installed.
2024-11-28 15:48:57.781 | INFO     | rdagent.app.utils.info:sys_info:22 - Name of current operating system: Windows
2024-11-28 15:48:57.789 | INFO     | rdagent.app.utils.info:sys_info:22 - Processor architecture: AMD64
2024-11-28 15:48:57.796 | INFO     | rdagent.app.utils.info:sys_info:22 - System, version, and hardware information: Windows-10-10.0.19045-SP0
2024-11-28 15:48:57.807 | INFO     | rdagent.app.utils.info:sys_info:22 - Version number of the system: 10.0.19045
2024-11-28 15:48:57.814 | INFO     | rdagent.app.utils.info:python_info:29 - Python version: 3.10.15 | packaged by Anaconda, Inc. | (main, Oct  3 2024, 07:22:19) [MSC v.1929 64 bit (AMD64)]
2024-11-28 15:48:58.056 | INFO     | rdagent.app.utils.info:docker_info:39 - Container ID: a33c4ca045838432aa7a8e0a299edbf8cc8d69b4848699412598bf4e3238b23f
2024-11-28 15:48:58.065 | INFO     | rdagent.app.utils.info:docker_info:40 - Container Name: kind_tharp
2024-11-28 15:48:58.085 | INFO     | rdagent.app.utils.info:docker_info:41 - Container Status: exited
2024-11-28 15:48:58.108 | INFO     | rdagent.app.utils.info:docker_info:42 - Image ID used by the container: sha256:f219da361b6969fea8ea5c3c8040db88ca8164bc7629def2f53c4197ca0ff2b9
2024-11-28 15:48:58.164 | INFO     | rdagent.app.utils.info:docker_info:43 - Image tag used by the container: ['local_qlib:latest']
2024-11-28 15:48:58.174 | INFO     | rdagent.app.utils.info:docker_info:44 - Container port mapping: {}
2024-11-28 15:48:58.186 | INFO     | rdagent.app.utils.info:docker_info:45 - Container Label: {'com.nvidia.volumes.needed': 'nvidia_driver', 'org.opencontainers.image.ref.name': 'ubuntu', 'org.opencontainers.image.version': '22.04'}
2024-11-28 15:48:58.449 | INFO     | rdagent.app.utils.info:docker_info:46 - Startup Commands: nvidia-smi
2024-11-28 15:48:58.591 | INFO     | rdagent.app.utils.info:rdagent_info:54 - RD-Agent version: 0.3.0
2024-11-28 15:48:59.777 | INFO     | rdagent.app.utils.info:rdagent_info:76 - Package version: ['pydantic-settings==2.6.1', 'python-Levenshtein==0.26.1', 'scikit-learn==1.5.2', 'filelock==3.16.1', 'loguru==0.7.2', 'fire==0.7.0', 'fuzzywuzzy==0.18.0', 'openai==1.55.3', 'numpy==1.26.4', 'pandas==2.2.3', 'pandarallel==1.6.5', 'matplotlib==3.9.2', 'langchain==0.3.9', 'langchain-community==0.3.8', 'tiktoken==0.8.0', 'pymupdf==1.24.14', 'pypdf==5.1.0', 'azure-ai-formrecognizer==3.3.3', 'tables==3.10.1', 'tree-sitter-python==0.23.4', 'tree-sitter==0.23.2', 'python-dotenv==1.0.1', 'docker==7.1.0', 'streamlit==1.40.2', 'plotly==5.24.1', 'st-theme==1.2.3', 'selenium==4.27.1', 'kaggle==1.6.17', 'nbformat==5.10.4', 'seaborn==0.13.2', 'setuptools-scm==8.1.0']

Additional Notes

Referenced issues: #442 #445

Fix

1. Browse to git_ignore_folder/
2. Find the folder with conf.yaml in it
3. Change the 2 to 0 on line 65

Referenced code: https://github.com/microsoft/RD-Agent/blob/main/rdagent/scenarios/qlib/experiment/model_template/conf.yaml#L65
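
For anyone who prefers to script the edit rather than change the file by hand, here is a rough sketch. It assumes the setting on line 65 is written as a YAML key named GPU followed by an integer; that key name is my assumption and is not confirmed in this issue, so check your copy of conf.yaml first. The function set_gpu_index is hypothetical and works on the file's text only.

```python
import re


def set_gpu_index(yaml_text: str, index: int = 0) -> tuple[str, int]:
    """Rewrite every `GPU: <int>` line to use `index`.

    Assumption: the device ordinal is stored under a key named `GPU`.
    Returns the patched text and the number of lines changed.
    """
    # Match the key with its indentation, then replace only the digits,
    # so indentation and any trailing comment are preserved.
    pattern = re.compile(r"^(\s*GPU:\s*)\d+", flags=re.MULTILINE)
    return pattern.subn(rf"\g<1>{index}", yaml_text)


if __name__ == "__main__":
    sample = "kwargs:\n    n_jobs: 20\n    GPU: 2\n"
    patched, changed = set_gpu_index(sample)
    print(patched)
    print(f"lines changed: {changed}")
```

To apply it to a real file, read conf.yaml into a string, pass it through set_gpu_index, and write the result back only if the change count is what you expect.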

I am reluctant to make a one-line pull request, but if you are a contributor please clarify if this is expected behaviour from RD-Agent.

SunsetWolf · 2024-11-29T03:36:35Z

We are very glad to receive your issue. From the information you provided, I see that you are running RD-Agent on Windows, but RD-Agent does not currently support Windows. The platform badge in the README explains this, so we recommend that you run RD-Agent on Linux.

JIBSIL · 2024-11-29T17:44:10Z

> We are very glad to receive your issue, I found the following in the information you provided. You are running RD-Agent on Windows, but RD-Agent is not currently supported on Windows. There is a badge in the README called platform that explains this, so we recommend that you run RD-Agent on Linux.

Hi,
I'm running the Dockerized version of RD-Agent (WSL2/Docker Desktop), though, so the environment that the Python code runs in should be Linux-based.

TPLin22 · 2024-12-04T03:42:06Z

Hi,
Thank you for pointing out the correct change. The default GPU index should indeed be set to 0. You can make the modification on a branch and submit a pull request, or I can address this bug later.

Thank you for your contribution!

TPLin22 · 2024-12-10T10:08:56Z

This is fixed in PR #503.

JIBSIL added the bug label Nov 28, 2024

TPLin22 closed this as completed Dec 10, 2024
