

dmikushin
Contributor

Summary

This PR fixes the incorrect "Table size" and "accesses per thread" values reported for shared memory tests in the GUPS benchmark, addressing the issues raised by @rkarim2 in #56.

Problem

Previously, when running shared memory tests, the benchmark incorrectly reported:

  • Table size: showed the global memory working set size (e.g., 4.3 GB), which is irrelevant for shared memory tests
  • Accesses per thread: used the global memory value (accesses_per_elem) instead of the shared memory value (accesses_per_elem_sh)

Solution

  • Table size for shared memory: Now reports the shared memory actually used across all blocks, i.e. grid * n_shmem elements, or grid * n_shmem * sizeof(benchtype) bytes
  • Allocation details: Adds a breakdown of bytes per block × number of blocks
  • Correct accesses count: Uses accesses_per_elem_sh for shared memory tests instead of accesses_per_elem
  • Appropriate units: Reports the shared memory size in MB, which suits its scale, while global memory stays in GB
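The reporting changes above can be sketched in C as follows. This is not the PR's actual code: format_table_size and accesses_per_thread are hypothetical helpers, benchtype is assumed to be an 8-byte type (which is what the example output implies), n_shmem is assumed to be the number of elements per block, and MB/GB are taken as decimal units (10^6 and 10^9 bytes).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: an 8-byte element type, consistent with the example output. */
typedef uint64_t benchtype;

/* Hypothetical helper: writes the "Table size" line into buf.
   For shared memory tests, the element count is grid * n_shmem (what the
   blocks actually allocate) and the size is reported in MB; otherwise the
   global working set is reported in GB. Returns the snprintf result. */
static int format_table_size(char *buf, size_t len, int shared,
                             size_t global_elems, size_t grid, size_t n_shmem)
{
    size_t elems = shared ? grid * n_shmem : global_elems;
    double bytes = (double)elems * sizeof(benchtype);

    if (shared) {
        return snprintf(buf, len,
                        "Table size = %zu (%f MB.) "
                        "[shared memory: %zu bytes per block x %zu blocks]",
                        elems, bytes / 1e6, n_shmem * sizeof(benchtype), grid);
    }
    return snprintf(buf, len, "Table size = %zu (%f GB.)", elems, bytes / 1e9);
}

/* Hypothetical helper: picks the access count matching the memory space
   under test (accesses_per_elem_sh for shared, accesses_per_elem otherwise). */
static int accesses_per_thread(int shared, int accesses_per_elem,
                               int accesses_per_elem_sh)
{
    return shared ? accesses_per_elem_sh : accesses_per_elem;
}
```

With grid = 2 and n_shmem = 6144, the shared memory call produces the "After" Table size line from the example output, and the global call with 536870912 elements produces the "Before" line.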

Example output

Before:

Table size = 536870912 (4.294967 GB.)
Each thread access 8 locations.

After:

Table size = 12288 (0.098304 MB.) [shared memory: 49152 bytes per block x 2 blocks]
Each thread access 65536 locations.
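Working backwards from the two outputs (an inference from the printed numbers, not from the code), the Table size value counts 8-byte elements, and the shared memory figure matches the bracketed breakdown:

```math
\begin{aligned}
\text{Before:}\quad & 536870912 \times 8\ \text{B} = 4294967296\ \text{B} = 4.294967296\ \text{GB} \\
\text{After:}\quad & 49152\ \text{B/block} \times 2\ \text{blocks} = 98304\ \text{B} = 0.098304\ \text{MB} \\
& 98304\ \text{B} \,/\, 8\ \text{B per element} = 12288\ \text{elements}
\end{aligned}
```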

Testing

  • Code compiles successfully with make clean && make
  • Shared memory tests now display correct memory sizes and access counts
  • Global memory tests remain unchanged and work as before

Addresses the confusion mentioned in #56 about the misleading Table size output for shared memory tests.

Note: This is a clean PR with only the relevant changes, created from the latest master branch.

Based on feedback from rkarim2 in issue NVIDIA-developer-blog#56:
- For shared memory tests, Table size now shows the actual total shared memory used
- Display shared memory allocation details (bytes per block × number of blocks)
- Use correct accesses_per_elem_sh for shared memory tests
- Report size in MB for shared memory vs GB for global memory

This fixes the misleading Table size output that showed irrelevant global memory
sizes (e.g., 4.3 GB) when running shared memory tests.

rkarim2 commented Sep 19, 2025

Remove the comment on line 497
Otherwise, this looks and works fine.
Thanks @dmikushin
