Added OPENCL_VISIBLE_DEVICES support #45
The issues I have with this pull-request are:
I understand the purpose it serves and how it can seem appealing in different contexts:
I am unsure the second point is correctly addressed by this kind of mechanism, as system-based mechanisms for resource arbitration seem more robust. It also does not address sharing at the sub-device level.
Brice
Maybe appending an optional trailing list of accepted (or rejected) devices to platforms would be easier to maintain. For example, if vendor A has 3 devices and only the devices with index 1 and 2 are wanted, discarding 0:
/etc/OpenCL/vendors/VendorA.icd:
OCL_ICD_FILENAMES=libVendorAOpenCL.so,1,2:libVendorB.so
[HKLM\SOFTWARE\Khronos\OpenCL\Vendors]
I know this idea raises questions (OCL_ICD_FILENAMES is not available on Windows), but it seems conceptually simpler to me. Implementation-wise, it is similar to the current pull request, as it requires maintaining a list of valid devices for each platform (or only for restricted platforms). Platforms with no devices left could, or maybe should, be excluded.
Brice
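To make the suggested syntax concrete, here is a minimal sketch of how such a value could be split into libraries and their accepted device indices. It is written in Python for brevity and is not loader code; the function name `parse_icd_filenames` and the `sep` parameter are hypothetical, and it assumes that an entry without a trailing list means "no restriction".

```python
from typing import Dict, List, Optional


def parse_icd_filenames(value: str, sep: str = ":") -> Dict[str, Optional[List[int]]]:
    """Map each library name to its accepted device indices.

    Assumption: an entry with no trailing indices is unrestricted (None).
    Library paths that contain a comma cannot be expressed in this syntax.
    """
    result: Dict[str, Optional[List[int]]] = {}
    for entry in value.split(sep):
        if not entry:
            continue  # tolerate empty entries such as a trailing separator
        library, *indices = entry.split(",")
        result[library] = [int(i) for i in indices] if indices else None
    return result


parsed = parse_icd_filenames("libVendorAOpenCL.so,1,2:libVendorB.so")
```

With the example above, `parsed` maps `libVendorAOpenCL.so` to `[1, 2]` and `libVendorB.so` to `None`.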
First off - I think this is an interesting capability and we have seen requests for something like it, so thank you @manycoresoft for doing the work to get this started.
My comments are more about the implementation and less about the syntax used for device filtering: the filtering appears to be implemented differently for devices and for platforms. For platforms, the filtering occurs when the ICD loader interrogates the system to discover the installed platforms, whereas for devices the filtering occurs as part of the implementation of clGetDeviceIDs.
I think I would prefer to do the filtering similarly in both cases, which in practice means that the filtering for platforms would move to the implementation of clGetPlatformIDs. This separates the platform filtering code from the code to discover and add platforms (which is growing to be quite complex). It also makes it very easy to disable (or even to compile out) the platform and device filtering code, if desired.
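A rough sketch of that arrangement, in Python rather than the loader's C and with hypothetical names throughout (`Loader`, `get_platform_ids`, `get_device_ids` only stand in for the real entry points): discovery stores every platform unfiltered, and both queries consult the same optional filter. It assumes at most one filter entry per platform, and that a device index counts position among the platform's devices of the filtered type.

```python
from typing import Dict, List, Optional, Set, Tuple

PlatformKey = Tuple[str, int]   # (driver, platform index), hypothetical stand-in
Device = Tuple[str, str]        # (type, name), hypothetical stand-in
Filter = Dict[PlatformKey, Tuple[str, Optional[Set[int]]]]


class Loader:
    def __init__(self, platforms: Dict[PlatformKey, List[Device]],
                 visible: Optional[Filter] = None):
        self._platforms = platforms  # discovery result, never filtered
        self._visible = visible      # None disables filtering entirely

    def get_platform_ids(self) -> List[PlatformKey]:
        if self._visible is None:
            return list(self._platforms)
        return [p for p in self._platforms if p in self._visible]

    def get_device_ids(self, platform: PlatformKey,
                       device_type: str = "any") -> List[Device]:
        devices = self._platforms[platform]
        if self._visible is not None:
            if platform not in self._visible:
                return []
            allowed_type, allowed_ids = self._visible[platform]
            # Keep devices of the filtered type, then the listed indices.
            typed = [d for d in devices
                     if allowed_type == "any" or d[0] == allowed_type]
            devices = [d for i, d in enumerate(typed)
                       if allowed_ids is None or i in allowed_ids]
        # Finally apply the type the caller asked for.
        return [d for d in devices if device_type == "any" or d[0] == device_type]


platforms = {
    ("foo.icd", 0): [("gpu", "g0"), ("gpu", "g1"), ("gpu", "g2"), ("cpu", "c0")],
    ("bar.icd", 0): [("cpu", "x0")],
    ("bar.icd", 1): [("cpu", "c0"), ("cpu", "c1")],
}
visible = {("foo.icd", 0): ("gpu", {0, 1}), ("bar.icd", 1): ("cpu", {0})}
loader = Loader(platforms, visible)
```

Constructing `Loader(platforms)` without a filter returns everything, which is the "easy to disable" property mentioned above.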
…m CUDA_VISIBLE_DEVICES.
OPENCL_VISIBLE_DEVICES=descriptor_list
descriptor_list -> descriptor | descriptor::descriptor_list
descriptor -> driver,platform | driver,platform,type | driver,platform,type,device_list
device_list -> device_id | device_id,device_list
driver: client driver (foo.icd or foo.so)
platform: platform ID
type: gpu, cpu, accelerator, custom, or any
device_id: device ID in the platform
This patch is to land KhronosGroup#45 again. Most of the code is from pull request 45.
e.g., OPENCL_VISIBLE_DEVICES=foo.icd,0,gpu,0,1::bar.icd,1,cpu,0 only shows
- GPU 0, 1 in the first platform of foo.icd
- CPU 0 in the second platform of bar.icd
Added support for an OPENCL_VISIBLE_DEVICES environment variable, inspired by CUDA_VISIBLE_DEVICES.
OPENCL_VISIBLE_DEVICES=descriptor_list
descriptor_list -> descriptor | descriptor::descriptor_list
descriptor -> driver:platform | driver:platform:type | driver:platform:type:device_list
device_list -> device_id | device_id,device_list
driver: client driver (foo.icd or foo.so)
platform: platform ID
type: gpu, cpu, accelerator, custom, or any
device_id: device ID in the platform
e.g., OPENCL_VISIBLE_DEVICES=foo.icd:0:gpu:0,1::bar.icd:1:cpu:0 only shows
- GPU 0, 1 in the first platform of foo.icd
- CPU 0 in the second platform of bar.icd
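
For readers who want to experiment with the grammar above, here is a small stand-alone parser sketch in Python. It is not the patch's C implementation; `Descriptor` and `parse_visible_devices` are hypothetical names, and it assumes that an omitted type means `any` and that an omitted device list means "all devices of that type".

```python
from dataclasses import dataclass
from typing import List, Optional

VALID_TYPES = {"gpu", "cpu", "accelerator", "custom", "any"}


@dataclass
class Descriptor:
    driver: str
    platform: int
    device_type: str = "any"                 # assumption: omitted type means "any"
    device_ids: Optional[List[int]] = None   # None: no restriction on device IDs


def parse_visible_devices(value: str) -> List[Descriptor]:
    descriptors = []
    # Descriptors are separated by "::", fields inside a descriptor by ":".
    for chunk in value.split("::"):
        fields = chunk.split(":")
        if not 2 <= len(fields) <= 4 or not all(fields):
            raise ValueError(f"malformed descriptor: {chunk!r}")
        driver, platform = fields[0], int(fields[1])
        device_type = fields[2].lower() if len(fields) >= 3 else "any"
        if device_type not in VALID_TYPES:
            raise ValueError(f"unknown device type: {device_type!r}")
        device_ids = None
        if len(fields) == 4:
            device_ids = [int(d) for d in fields[3].split(",")]
        descriptors.append(Descriptor(driver, platform, device_type, device_ids))
    return descriptors


example = parse_visible_devices("foo.icd:0:gpu:0,1::bar.icd:1:cpu:0")
```

For the example string, `example` holds two descriptors: `foo.icd`, platform 0, type `gpu`, devices `[0, 1]`, and `bar.icd`, platform 1, type `cpu`, devices `[0]`.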