@gsitaram commented Oct 1, 2022

Hi @Luke20000429, @joydddd, you can use this simple standalone benchmark to measure the bandwidth achieved over PCIe and to compare the following:

  • 1 large buffer vs multiple small buffers
  • using pinned memory vs pageable memory
  • using hipMemcpy vs hipMemcpyAsync

There is a convenient run script that you can use to configure a sweep over various parameter values.
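
The sweep described above amounts to moving the same total number of bytes in different ways and timing each variant. The standalone's source is not shown in this thread, so the following is only a hypothetical sketch of the host-side bookkeeping. The names `TransferConfig`, `make_sweep`, and `bandwidth_gbs` are illustrative and do not come from the original code, and the actual HIP copies and timing are omitted.

```cpp
// Hypothetical sketch of sweep bookkeeping for a PCIe bandwidth test.
// None of these names come from the standalone discussed in this thread.
#include <cstddef>
#include <initializer_list>
#include <vector>

struct TransferConfig {
    std::size_t num_buffers;   // 1 large buffer vs. several small ones
    std::size_t buffer_bytes;  // size of each buffer
    bool pinned;               // pinned (page-locked) vs. pageable host memory
    bool async;                // hipMemcpyAsync vs. hipMemcpy
    std::size_t total_bytes() const { return num_buffers * buffer_bytes; }
};

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) for `bytes` moved in `seconds`.
double bandwidth_gbs(std::size_t bytes, double seconds) {
    return static_cast<double>(bytes) / seconds / 1e9;
}

// Split a fixed total into 1, 2, 4, ... buffers so every configuration moves
// the same number of bytes, crossed with pinned/pageable and sync/async.
std::vector<TransferConfig> make_sweep(std::size_t total_bytes, std::size_t max_buffers) {
    std::vector<TransferConfig> sweep;
    for (std::size_t n = 1; n <= max_buffers; n *= 2) {
        if (total_bytes % n != 0) continue;  // keep totals identical across configs
        for (bool pinned : {false, true})
            for (bool async : {false, true})
                sweep.push_back({n, total_bytes / n, pinned, async});
    }
    return sweep;
}
```

For example, `make_sweep(128u << 20, 16)` yields 20 configurations (1, 2, 4, 8 and 16 buffers, each crossed with pinned/pageable and sync/async), all moving 128 MiB; the 16-buffer entries use 8 MiB per buffer. Keeping the total fixed is what makes the per-configuration bandwidth numbers directly comparable.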

My conclusions are the following:

  • Bandwidth gets close to peak and is the same whether you transfer one large 128MB buffer or 16 small 8MB buffers (the same 128MB in total).
  • Pinned memory gives better bandwidth than pageable memory, even with hipMemcpy.
  • hipMemcpyAsync seems to perform better than hipMemcpy, even for a single transfer (i.e., iter=1).
  • Performance fluctuates when we test on the GPU in our workstation; it is more stable when testing a GPU in a server.
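
Whether a result is "close to peak" depends on the PCIe generation and link width, which this thread does not state. As a rough reference, assuming a PCIe 3.0 or later link with 128b/130b encoding, the theoretical per-direction rate can be estimated as in the sketch below. The helper names are hypothetical, and measured copy bandwidth will land somewhat below this figure because of packet and protocol overhead.

```cpp
// Hypothetical helpers: theoretical per-direction PCIe bandwidth.
// Assumes 128b/130b line encoding (PCIe 3.0 and later) and ignores
// packet/protocol overhead, so real transfers will come in lower.

// gts: per-lane transfer rate in GT/s (8 for PCIe 3.0, 16 for PCIe 4.0).
double pcie_peak_gbs(double gts, int lanes) {
    const double encoding = 128.0 / 130.0;  // 128b/130b encoding efficiency
    return gts * encoding / 8.0 * lanes;    // bits -> bytes, times lane count
}

// Fraction of the theoretical peak reached by a measured bandwidth in GB/s.
double fraction_of_peak(double measured_gbs, double gts, int lanes) {
    return measured_gbs / pcie_peak_gbs(gts, lanes);
}
```

For example, a PCIe 3.0 x16 link (8 GT/s) gives about 15.75 GB/s and a PCIe 4.0 x16 link (16 GT/s) about 31.5 GB/s per direction.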

YMMV, so it is best to test on your end with the cards you have access to.

@ooreilly

Is this code relevant for ksw2? I don't see any dependencies on ksw2.
