shubhamvishu
Contributor

This uses the perf tool to generate the disassembled code for each run. It writes a perf<X>.data file (where X = 1, 2, 3, ...) for each benchmark run/configuration. One can then inspect the disassembled code as shown below.

Currently it records only the instructions executed in user space (the instructions:u event). We can add other events if we want, but this is a good start and keeps the file size reasonable.

sudo perf record -e instructions:u -g java <...>

To inspect the disassembled code (note that disassembling is very slow):

  • Directly open the view or file
sudo perf annotate -i perf.data -f
  • Print the file content
PAGER=cat perf annotate -i perf1.data -f --stdio 
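
Since disassembling is slow, one option is to dump the annotate output to a text file once and search that file afterwards. The following is only a sketch: the helper name `annotate_cached` and the `.annotate.txt` naming are hypothetical and not part of this PR; the perf invocation is the same one shown above.

```shell
# Hypothetical helper (not part of this PR): run `perf annotate` once per
# data file and reuse the saved text on later calls.
annotate_cached() {
  data_file="$1"
  # perf2.data -> perf2.annotate.txt (naming is an assumption)
  out_file="${data_file%.data}.annotate.txt"
  # Regenerate only if the cache is missing, empty, or older than the data file.
  if [ ! -s "$out_file" ] || [ "$data_file" -nt "$out_file" ]; then
    PAGER=cat perf annotate -i "$data_file" -f --stdio > "$out_file"
  fi
  # Print the cache path so callers can grep it.
  printf '%s\n' "$out_file"
}
```

Usage would look like `grep sdot "$(annotate_cached perf2.data)"`; prefix the perf call with sudo if the data file was recorded as root.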

@shubhamvishu
Contributor Author

I used this to verify the SDOT instruction was getting used correctly in the 7-bit case for #13572.

// 4-bit; No matches

>> PAGER=cat perf annotate -i perf1.data -f --stdio | head -100000 | grep "sdot"         
// 7-bit; found matches

>> PAGER=cat perf annotate -i perf2.data -f --stdio | head -100000 | grep "sdot"        
    0.00 :   79c:    sdot    v0.4s, v2.16b, v4.16b
    0.00 :   7a8:    sdot    v7.4s, v6.16b, v5.16b
    0.00 :   7b0:    sdot    v22.4s, v3.16b, v16.16b
    0.00 :   7bc:    sdot    v21.4s, v18.16b, v17.16b
    0.28 :   7e4:    sdot    v0.4s, v19.16b, v20.16b
    0.55 :   7ec:    sdot    v7.4s, v23.16b, v30.16b
    0.44 :   7f4:    sdot    v22.4s, v24.16b, v31.16b
    1.81 :   7fc:    sdot    v21.4s, v25.16b, v1.16b
    0.34 :   804:    sdot    v0.4s, v26.16b, v2.16b
    0.58 :   80c:    sdot    v7.4s, v27.16b, v4.16b
    0.61 :   814:    sdot    v22.4s, v28.16b, v5.16b
    1.80 :   81c:    sdot    v21.4s, v29.16b, v3.16b
    0.00 :   8b4:    sdot    v0.4s, v22.16b, v16.16b
    0.00 :   8c0:    sdot    v0.4s, v17.16b, v18.16b
    0.00 :   8cc:    sdot    v0.4s, v19.16b, v23.16b
    0.00 :   8d8:    sdot    v0.4s, v24.16b, v25.16b
    0.00 :   8e4:    sdot    v0.4s, v26.16b, v27.16b
    0.00 :   8f0:    sdot    v0.4s, v28.16b, v29.16b
    0.00 :   914:    sdot    v0.4s, v20.16b, v30.16b
    0.00 :   928:    sdot    v0.4s, v31.16b, v6.16b
    0.00 :   94c:    sdot    v0.4s, v16.16b, v22.16b
    0.00 :   950:    sdot    v0.4s, v21.16b, v17.16b
    0.00 :   954:    sdot    v0.4s, v5.16b, v18.16b
    0.00 :   958:    sdot    v0.4s, v4.16b, v19.16b
    0.00 :   95c:    sdot    v0.4s, v3.16b, v23.16b
    0.00 :   960:    sdot    v0.4s, v2.16b, v24.16b
    0.00 :   a04:    sdot    v25.4s, v28.16b, v29.16b
    0.00 :   a14:    sdot    v25.4s, v20.16b, v30.16b
    0.00 :   a24:    sdot    v25.4s, v31.16b, v21.16b
    0.00 :   a34:    sdot    v25.4s, v5.16b, v4.16b
    0.00 :   a44:    sdot    v25.4s, v3.16b, v2.16b
    0.00 :   a54:    sdot    v25.4s, v6.16b, v22.16b
    0.00 :   a88:    sdot    v25.4s, v16.16b, v17.16b
    0.00 :   aa0:    sdot    v25.4s, v23.16b, v24.16b
    0.00 :   aa4:    sdot    v25.4s, v18.16b, v19.16b
    0.00 :   ab0:    sdot    v25.4s, v0.16b, v7.16b
    0.00 :   abc:    sdot    v25.4s, v1.16b, v26.16b
    0.00 :   ac8:    sdot    v25.4s, v27.16b, v28.16b
    0.00 :   ad4:    sdot    v25.4s, v29.16b, v20.16b
    0.00 :   ae0:    sdot    v25.4s, v30.16b, v31.16b
    0.00 :   bf8:    sdot    v0.4s, v21.16b, v6.16b
    0.00 :   c0c:    sdot    v25.4s, v27.16b, v26.16b
// No quantization; no matches

>> PAGER=cat perf annotate -i perf3.data -f --stdio | head -100000 | grep "sdot"
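
For comparing the runs without reading the full grep output, the matches can be reduced to a one-line summary. This is a rough sketch, not part of the PR: `summarize_mnemonic` is a hypothetical name, and the awk pattern assumes the `percent : offset: mnemonic operands` line layout seen in the output above, which may differ between perf versions.

```shell
# Hypothetical helper: read `perf annotate --stdio` output on stdin and report
# how many lines use the given mnemonic and how many of those carry a
# non-zero sample percentage.
summarize_mnemonic() {
  awk -v m="$1" '
    # Expected fields: $1 = percent, $2 = ":", $3 = offset, $4 = mnemonic
    $2 == ":" && $4 == m {
      count++
      if ($1 + 0 > 0) hot++
    }
    END { printf "%s: %d matches, %d with samples\n", m, count, hot }
  '
}
```

For example: `PAGER=cat perf annotate -i perf2.data -f --stdio | summarize_mnemonic sdot`. A run that never emits the instruction reports 0 matches.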

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jul 24, 2025