
Align UR_API fields (8 byte) for optimization create/move/copy structs on x64 cpus #2747


Open
wants to merge 1 commit into
base: main


@GermanAizek GermanAizek commented Mar 27, 2025

@RossBrunton, @aarongreig,

Would you be willing to migrate the Unified Runtime API to structures whose fields are ordered to minimize alignment padding on modern x64 processors? Smaller structs fit into CPU cache lines more readily, which can noticeably improve performance when these structures are created, moved, or copied frequently. I expect that you (as Intel employees) know the advantages of this optimization at the architectural level of a codebase, as well as its main drawback: it breaks the ABI.

In short, the API change is small: I have only swapped the first and second fields (pNext and sType) in all affected structures.

More info about technique:
https://stackoverflow.com/a/20882083
https://zijishi.xyz/post/optimization-technique/learning-to-use-data-alignment/
https://en.wikipedia.org/wiki/Data_structure_alignment

Affected API structures:

  • ur_image_desc_t 80 -> 72 bytes
  • ur_exp_command_buffer_update_value_arg_desc_t 48 -> 40 bytes
  • ur_exp_command_buffer_update_memobj_arg_desc_t 40 -> 32 bytes
  • ur_exp_command_buffer_update_pointer_arg_desc_t 40 -> 32 bytes
  • ur_sampler_desc_t 32 -> 24 bytes
  • ur_program_properties_t 32 -> 24 bytes
  • ur_exp_sampler_addr_modes_t 32 -> 24 bytes
  • ur_platform_native_properties_t 24 -> 16 bytes
  • ur_device_native_properties_t 24 -> 16 bytes
  • ur_context_properties_t 24 -> 16 bytes
  • ur_context_native_properties_t 24 -> 16 bytes
  • ur_buffer_channel_properties_t 24 -> 16 bytes
  • ur_buffer_alloc_location_properties_t 24 -> 16 bytes
  • ur_mem_native_properties_t 24 -> 16 bytes
  • ur_sampler_native_properties_t 24 -> 16 bytes
  • ur_usm_host_desc_t 24 -> 16 bytes
  • ur_usm_device_desc_t 24 -> 16 bytes
  • ur_usm_alloc_location_desc_t 24 -> 16 bytes
  • ur_usm_pool_desc_t 24 -> 16 bytes
  • ur_physical_mem_properties_t 24 -> 16 bytes
  • ur_program_native_properties_t 24 -> 16 bytes
  • ur_kernel_arg_mem_obj_properties_t 24 -> 16 bytes
  • ur_kernel_native_properties_t 24 -> 16 bytes
  • ur_queue_properties_t 24 -> 16 bytes
  • ur_queue_index_properties_t 24 -> 16 bytes
  • ur_queue_native_properties_t 24 -> 16 bytes
  • ur_event_native_properties_t 24 -> 16 bytes
  • ur_exp_async_usm_alloc_properties_t 24 -> 16 bytes
  • ur_exp_file_descriptor_t 24 -> 16 bytes
  • ur_exp_sampler_cubemap_properties_t 24 -> 16 bytes
  • ur_exp_command_buffer_desc_t 24 -> 16 bytes
  • ur_exp_enqueue_ext_properties_t 24 -> 16 bytes
  • ur_exp_enqueue_native_command_properties_t 24 -> 16 bytes
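
The 8-byte savings listed above come from alignment padding. A minimal sketch in C, assuming a typical x64 ABI (8-byte pointers, 4-byte enums); `example_desc_original_t`, `example_desc_reordered_t`, and their `flags` member are hypothetical stand-ins, not actual Unified Runtime definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a structure-type tag; not the real UR enum. */
typedef enum example_structure_type_t {
    EXAMPLE_STRUCTURE_TYPE_DESC = 0
} example_structure_type_t;

/* Tag first. On a typical x64 ABI the 4-byte tag is followed by 4 bytes of
 * padding so that pNext is 8-byte aligned, and flags is followed by 4 bytes
 * of tail padding: 4 + 4 + 8 + 4 + 4 = 24 bytes. */
typedef struct example_desc_original_t {
    example_structure_type_t stype;
    const void *pNext;
    uint32_t flags; /* hypothetical payload field */
} example_desc_original_t;

/* Pointer first. The tag and flags now share one 8-byte slot, so on the
 * same ABI no padding is needed: 8 + 4 + 4 = 16 bytes. */
typedef struct example_desc_reordered_t {
    const void *pNext;
    example_structure_type_t stype;
    uint32_t flags; /* hypothetical payload field */
} example_desc_reordered_t;

/* Number of bytes removed by the reordering on the current target. */
static size_t bytes_saved_by_reordering(void) {
    return sizeof(example_desc_original_t) - sizeof(example_desc_reordered_t);
}
```

Because the field offsets change, binaries compiled against the old layout would read the wrong bytes, and callers that initialize these structs positionally would need updating; callers that use designated initializers would only need a recompile.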

@GermanAizek GermanAizek requested a review from a team as a code owner March 27, 2025 13:27
@github-actions github-actions bot added the loader (Loader related feature/bug), conformance (Conformance test suite issues.), and auto-close labels Mar 27, 2025

Unified Runtime -> intel/llvm Repo Move Notice

Information

The source code of Unified Runtime has been moved to intel/llvm under the unified-runtime top-level directory,
all future development will now be carried out there. This was done in intel/llvm#17043.

The code will be mirrored to oneapi-src/unified-runtime and the specification will continue to be hosted at oneapi-src.github.io/unified-runtime.

The contribution guide will be updated with new instructions for contributing to Unified Runtime.

PR Migration

All open PRs including this one will be marked with the auto-close label and shall be automatically closed after 30 days.

Should you wish to continue with your PR you will need to migrate it to intel/llvm.
We have provided a script to help automate this process.

If your PR should remain open and not be closed automatically, you can remove the auto-close label.


This is an automated comment.
