RD-Agent fin_model fails on 1-2 GPU systems (w/ fix) #499

Closed
JIBSIL opened this issue Nov 28, 2024 · 4 comments

Labels: bug (Something isn't working)

Comments

@JIBSIL

JIBSIL commented Nov 28, 2024

🐛 Bug Description

The default CUDA device ordinal used for RD-Agent model generation is 2, which is invalid on systems with fewer than three GPUs.

To Reproduce

Steps to reproduce the behavior:

  1. Use a system that is GPU-enabled and has fewer than three GPUs
  2. Run rdagent fin_model
  3. You will get an invalid device ordinal error
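The error in step 3 follows from how CUDA ordinals work: an index is valid only if it is smaller than the number of visible devices. A minimal sketch of that check, assuming nothing about RD-Agent's internals; `is_valid_ordinal` and `pick_gpu_index` are hypothetical helpers, and the device count is passed in as a plain integer (with PyTorch installed it could come from `torch.cuda.device_count()`):

```python
from typing import Optional


def is_valid_ordinal(index: int, device_count: int) -> bool:
    """A CUDA ordinal is valid only when 0 <= index < device_count."""
    return 0 <= index < device_count


def pick_gpu_index(requested: int, device_count: int) -> Optional[int]:
    """Hypothetical helper: keep the requested index if it exists,
    fall back to device 0 otherwise, or return None when no GPU is visible."""
    if device_count <= 0:
        return None
    if is_valid_ordinal(requested, device_count):
        return requested
    return 0


if __name__ == "__main__":
    # A default of 2 only works when at least three GPUs are visible.
    for count in (1, 2, 3):
        print(count, is_valid_ordinal(2, count), pick_gpu_index(2, count))
```

On a one- or two-GPU machine the requested index 2 fails the check, which matches the reported failure.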

Expected Behavior

RD-Agent should assume that the user has at least one GPU, not a minimum of three. The default GPU index should therefore be 0.

Screenshot

(Two screenshots; images not preserved.)

Environment

Note: Users can run rdagent collect_info to get system information and paste it directly here.

(rdagent) E:\RD-Agent>rdagent collect_info
2024-11-28 15:48:52.150 | WARNING  | rdagent.oai.llm_utils:<module>:48 - llama is not installed.
2024-11-28 15:48:57.781 | INFO     | rdagent.app.utils.info:sys_info:22 - Name of current operating system: Windows
2024-11-28 15:48:57.789 | INFO     | rdagent.app.utils.info:sys_info:22 - Processor architecture: AMD64
2024-11-28 15:48:57.796 | INFO     | rdagent.app.utils.info:sys_info:22 - System, version, and hardware information: Windows-10-10.0.19045-SP0
2024-11-28 15:48:57.807 | INFO     | rdagent.app.utils.info:sys_info:22 - Version number of the system: 10.0.19045
2024-11-28 15:48:57.814 | INFO     | rdagent.app.utils.info:python_info:29 - Python version: 3.10.15 | packaged by Anaconda, Inc. | (main, Oct  3 2024, 07:22:19) [MSC v.1929 64 bit (AMD64)]
2024-11-28 15:48:58.056 | INFO     | rdagent.app.utils.info:docker_info:39 - Container ID: a33c4ca045838432aa7a8e0a299edbf8cc8d69b4848699412598bf4e3238b23f
2024-11-28 15:48:58.065 | INFO     | rdagent.app.utils.info:docker_info:40 - Container Name: kind_tharp
2024-11-28 15:48:58.085 | INFO     | rdagent.app.utils.info:docker_info:41 - Container Status: exited
2024-11-28 15:48:58.108 | INFO     | rdagent.app.utils.info:docker_info:42 - Image ID used by the container: sha256:f219da361b6969fea8ea5c3c8040db88ca8164bc7629def2f53c4197ca0ff2b9
2024-11-28 15:48:58.164 | INFO     | rdagent.app.utils.info:docker_info:43 - Image tag used by the container: ['local_qlib:latest']
2024-11-28 15:48:58.174 | INFO     | rdagent.app.utils.info:docker_info:44 - Container port mapping: {}
2024-11-28 15:48:58.186 | INFO     | rdagent.app.utils.info:docker_info:45 - Container Label: {'com.nvidia.volumes.needed': 'nvidia_driver', 'org.opencontainers.image.ref.name': 'ubuntu', 'org.opencontainers.image.version': '22.04'}
2024-11-28 15:48:58.449 | INFO     | rdagent.app.utils.info:docker_info:46 - Startup Commands: nvidia-smi
2024-11-28 15:48:58.591 | INFO     | rdagent.app.utils.info:rdagent_info:54 - RD-Agent version: 0.3.0
2024-11-28 15:48:59.777 | INFO     | rdagent.app.utils.info:rdagent_info:76 - Package version: ['pydantic-settings==2.6.1', 'python-Levenshtein==0.26.1', 'scikit-learn==1.5.2', 'filelock==3.16.1', 'loguru==0.7.2', 'fire==0.7.0', 'fuzzywuzzy==0.18.0', 'openai==1.55.3', 'numpy==1.26.4', 'pandas==2.2.3', 'pandarallel==1.6.5', 'matplotlib==3.9.2', 'langchain==0.3.9', 'langchain-community==0.3.8', 'tiktoken==0.8.0', 'pymupdf==1.24.14', 'pypdf==5.1.0', 'azure-ai-formrecognizer==3.3.3', 'tables==3.10.1', 'tree-sitter-python==0.23.4', 'tree-sitter==0.23.2', 'python-dotenv==1.0.1', 'docker==7.1.0', 'streamlit==1.40.2', 'plotly==5.24.1', 'st-theme==1.2.3', 'selenium==4.27.1', 'kaggle==1.6.17', 'nbformat==5.10.4', 'seaborn==0.13.2', 'setuptools-scm==8.1.0']

Additional Notes

Referenced issues: #442 #445

Fix

(Screenshot; image not preserved.)

  1. Browse to git_ignore_folder/
  2. Find the folder with conf.yaml in it
  3. Change the 2 to 0 on line 65

Referenced code: https://github.com/microsoft/RD-Agent/blob/main/rdagent/scenarios/qlib/experiment/model_template/conf.yaml#L65
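The manual edit above could also be scripted. The following is only a sketch under stated assumptions: it assumes the setting on line 65 is written as a `GPU: <number>` entry (the key name is a guess, so check your copy of conf.yaml), and `set_gpu_index` is a hypothetical helper rather than anything RD-Agent provides. It rewrites the text with a regular expression so that comments and indentation in the file are left untouched:

```python
import re

# Assumed key name "GPU"; adjust if line 65 of your conf.yaml uses a different key.
GPU_LINE = re.compile(r"^([ \t]*GPU:[ \t]*)\d+([ \t]*(?:#.*)?)$", re.MULTILINE)


def set_gpu_index(yaml_text: str, index: int = 0) -> str:
    """Replace the number on every 'GPU: <n>' line with the given index."""
    return GPU_LINE.sub(lambda m: f"{m.group(1)}{index}{m.group(2)}", yaml_text)


if __name__ == "__main__":
    # Illustrative fragment only, not the real template contents.
    sample = "kwargs:\n    n_epochs: 100\n    GPU: 2\n"
    print(set_gpu_index(sample))
```

To apply it to a real file, read the file's text, pass it through `set_gpu_index`, and write the result back after reviewing the diff.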

I am reluctant to open a one-line pull request, but if you are a contributor, please clarify whether this is the expected behaviour of RD-Agent.

JIBSIL added the bug label on Nov 28, 2024
@SunsetWolf
Collaborator

Thank you for opening this issue. From the information you provided, you are running RD-Agent on Windows, which is not currently supported. The platform badge in the README explains this, so we recommend running RD-Agent on Linux.

@JIBSIL
Author

JIBSIL commented Nov 29, 2024

> Thank you for opening this issue. From the information you provided, you are running RD-Agent on Windows, which is not currently supported. The platform badge in the README explains this, so we recommend running RD-Agent on Linux.

Hi,
I'm running the Dockerized version of RD-Agent (WSL2/Docker Desktop), though, so the environment the Python code runs in should be Linux-based.

@TPLin22
Collaborator

TPLin22 commented Dec 4, 2024

Hi,
Thank you for pointing out the correct change. The default GPU index should indeed be set to 0. You can make the modification on a branch and submit a pull request, or I can address this bug later.

Thank you for your contribution!

@TPLin22
Collaborator

TPLin22 commented Dec 10, 2024

This is fixed in PR #503.

TPLin22 closed this as completed on Dec 10, 2024