Avoid redundant bindings/states based on Metal profiler feedback. #2006
Addresses some of the issues reported by the Metal capture/profile tools, specifically redundant vertex bindings, depth/stencil and sampler state descriptors, and blit encoders.
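
As a rough illustration of eliding a redundant vertex binding by instance equality, here is a minimal sketch. It is not the code from this PR: `MockEncoder`, `VertexBindingCache`, and the slot count are hypothetical names and values chosen for the example, and the mock only counts calls instead of talking to Metal.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a render command encoder; it only counts calls.
struct MockEncoder {
    int vertexBufferCalls = 0;
    void setVertexBuffer(const void* /*buffer*/, std::size_t /*offset*/, std::size_t /*index*/) {
        ++vertexBufferCalls;
    }
};

// Remembers the last buffer/offset bound to each slot and skips identical rebinds.
class VertexBindingCache {
public:
    // Assumed slot count for the sketch.
    static constexpr std::size_t kMaxSlots = 31;

    explicit VertexBindingCache(MockEncoder& encoder) : encoder_(encoder) {}

    // Returns true if the call was forwarded to the encoder, false if it was elided.
    bool bind(const void* buffer, std::size_t offset, std::size_t index) {
        assert(index < kMaxSlots);
        Slot& slot = slots_[index];
        if (slot.buffer == buffer && slot.offset == offset) {
            return false;  // same buffer instance and offset: nothing to do
        }
        slot.buffer = buffer;
        slot.offset = offset;
        encoder_.setVertexBuffer(buffer, offset, index);
        return true;
    }

private:
    struct Slot {
        const void* buffer = nullptr;
        std::size_t offset = 0;
    };

    MockEncoder& encoder_;
    std::array<Slot, kMaxSlots> slots_{};
};
```

A cache like this would have to live no longer than the encoder it wraps, since a new encoder starts with no bindings.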
A large number of issues are still reported, mostly around binding the same "bytes" (where the data is placed directly in the encoder as opposed to a Metal buffer reference). This is to be expected, as similar drawables will bind the same values for some arguments to the same indexes.
The implication that these bindings can be elided does not seem to hold, however. Checking for buffer content equality in addition to buffer instance equality does resolve most of the reported issues, but the result does not render correctly. This appears to be an inconsistency between the debug tool and the actual encoder behavior. It's possible that this was just a bug, however, in which case many more of the `set[Fragment|Vertex]Bytes` calls could be elided, at the cost of `memcmp` calls.
While this change eliminates some hundreds of `objc_msgSend` calls per frame and improves the times reported by the Time Profiler tool, unfortunately it didn't produce any significant difference in the encoding or rendering times measured by the benchmark app.
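
For reference, the content-equality check mentioned above could look roughly like the sketch below. This is an assumption-laden illustration, not the experiment that was actually run: `MockBytesEncoder` and `BytesBindingCache` are hypothetical, and the mock only counts calls. As noted, this approach did not render correctly when tried, so treat it as a starting point for investigation rather than a working optimization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an encoder's "set bytes" entry point; it only counts calls.
struct MockBytesEncoder {
    int setVertexBytesCalls = 0;
    void setVertexBytes(const void* /*bytes*/, std::size_t /*length*/, std::size_t /*index*/) {
        ++setVertexBytesCalls;
    }
};

// Keeps a copy of the last bytes bound per index and compares contents with memcmp.
class BytesBindingCache {
public:
    explicit BytesBindingCache(MockBytesEncoder& encoder) : encoder_(encoder) {}

    // Returns true if the call was forwarded to the encoder, false if it was elided.
    bool bind(const void* bytes, std::size_t length, std::size_t index) {
        assert(bytes != nullptr || length == 0);
        if (index >= slots_.size()) {
            slots_.resize(index + 1);
        }
        Slot& slot = slots_[index];
        const bool same = slot.bound && slot.data.size() == length &&
                          (length == 0 || std::memcmp(slot.data.data(), bytes, length) == 0);
        if (same) {
            return false;  // identical contents: skip the encoder call
        }
        const unsigned char* begin = static_cast<const unsigned char*>(bytes);
        slot.data.assign(begin, begin + length);
        slot.bound = true;
        encoder_.setVertexBytes(bytes, length, index);
        return true;
    }

    // Forget everything, e.g. when a new encoder is created.
    void reset() { slots_.clear(); }

private:
    struct Slot {
        bool bound = false;
        std::vector<unsigned char> data;
    };

    MockBytesEncoder& encoder_;
    std::vector<Slot> slots_;
};
```

The trade-off is the one described above: each elided call costs a `memcmp` plus a copy of the bytes whenever they change. A real version would also need to invalidate a slot when something else, such as a buffer binding, overwrites the same index.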