-
-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement VulkanRenderer #17
Comments
Currently being done in #13 |
Remaining tasks are being tracked here: |
TODO list, since we're coming up to the first testable version...
|
Some notes for point rendering... in https://github.com/FNA-XNA/MojoShader/blob/upstream/mojoshader_opengl.c#L477 It's not too complicated but it does mean we have to hold off on building the shader module until we know that point sprite rendering is in use. This may end up being a part of the shader linker that solves the problem of linking vertex/pixel shader semantics together, instead of this disaster I came up with: https://github.com/FNA-XNA/MojoShader/blob/upstream/profiles/mojoshader_profile_spirv.c#L1513 |
Some notes for threading support... we know two major limitations to Vulkan threading:
To me, this says "we need a mutex that locks on both a render pass and on vkCmd execution." I'm thinking we have a mutex called "asyncCommandLock" that not only wraps around vkCmd calls, but also blocks for the entire duration of a render pass and for the entire duration that the command buffer is not recording. This sounds really broad, but this isn't as bad as it might seem:
This makes it about as deferred as D3D11, where we have a big lock on the DeviceContext. Where it gets a little weird is the function dependency tree... these are all the functions that record a command, and the functions that use them:
We have to make sure to lock for the duration that a single function submits commands, in addition to locking when a pass is active and a buffer isn't recording. This will prevent commands from getting mixed up when a thread tries to poke a command in during another function's execution (if that matters, it may not?). EDIT: vkCmd limitations:
|
Some more threading notes: Previously we tried decoupling render and data command buffers, but we didn't quite get it working because there's a bit more to it than just having separate buffers to avoid render pass errors. It may be worth trying this again, but with a better idea of how to synchronize the buffers: Render Command Buffer
Data Command Buffer
When to Lock Command Buffers It's not enough to lock just for vkCmd operations - we need to lock for whole blocks of commands, as several commands may need to exist together for the queue to work (for example, render passes need to have both a Begin and End, and SetData needs one lock/unlock pair for both the buffer memory barriers and the vkCmdCopyImageToBuffer in between). Because of this, we may need to refactor the internal functions in such a way that it's clear what is a data command and what is a render command, while also making it clear when we're locking the command buffer to submit commands and when we're locking to submit to the queue. This is particularly bad for image memory barriers and pipeline stalls, where we typically have render and data operations happening all at once. |
Yup, even more threading notes: Vulkan commands called by every driver function:
|
As of the latest commit, the VulkanRenderer is now in Beta. All the critical features are done and function, but there are still some FIXMEs and a handful of places where performance improvements are possible, but at this point the renderer should Just Work with XNA games now. Instead of having a testing period and locking down a release date, we'll try and get it to where it's indisputably faster than OpenGL, then lock down a Release Candidate and final testing period before making it the default renderer. |
At this point we now have Vulkan running pretty fast with all the uninitialized memory issues handled by Valgrind - there's really only one thing left for us to assess: Moving Vulkan command recording and queue submissions to a thread. This is relatively straightforward on paper, as it should just be a matter of moving VULKAN_INTERNAL_RecordCommands, Flush, and FlushAndPresent on to a dedicated thread which runs on a single semaphore. Adding threads is always harder than that, but as far as existing structure is concerned we've already set this up really really well. Once this works, it is expected that most games will be at least as fast when running with Vulkan, vs running with OpenGL. SpriteBatchTest shows the issue pretty clearly when you crank up the sprite count, as it's definitely faster to submit commands, but when it's time to present the frame it chokes pretty hard and it doesn't show up in the frame timing at all since it's all squished into Present() right now. Games like FEZ show very stable framerates but still fall behind GL by a pretty sizeable percentage. Games are easiest to compare by using |
The latest VulkanRenderer now supports proper command buffer generation and submission and asynchronous presentation. As a result, performance is pretty damn good and beats OpenGL in a whole lot of cases now. The downside is that threading support had to get rolled back, but we expect this to be easier to fix than last time. Once we have our stress tests working, that'll be when we start running through the catalog to make sure everything works, and provided nothing goes wrong (ha) we'll mark this as a release candidate and eventually move it to the top of the driver list. Most of the TODOs/FIXMEs are minor, but they may be important for optimization later on. One last thing we need that's not critical for testing but is critical for release is PrepareWindowAttributes - we have it falling back when we don't have a loader, but we also need to fall back when no hardware supports the extensions we need. I'm hoping we can mostly rip off device creation here without creating a device, but we'll see... |
Mostly just leaving this as a note for my future self. When we get to testing games, there are a few cases that will need special attention, beyond the standard "does it look ok and not crash" testing. In particular...
I'm sure there are other special cases, these are just the ones at the forefront of my memory. |
Thankfully 2 won't happen, as it was a spec violation regarding GL_POINT_SPRITE, a state that's not variable in Vulkan and can't have the same issues like radeonsi/iris did. 3 I remember apitracing but I forget what the end result was... a missing clear maybe? It's probably in the Discord logs somewhere. |
The latest commit should have Vulkan threading in a working state. But, as always, you should run this against any threaded work you have just in case, as behavior tends to be highly unpredictable for XNA games. In my own testing this works well (Streets of Rage 4 is running fine with it), though I'm now realizing that we're having to look at memory management again as it uses well over twice the memory that GL does, which is fine if you have a lot of VRAM but not so fine if you don't (SOR4 bails at loading a single level on a 2GB card). This could possibly be influenced by the threads (maybe some contention with the memory allocator?) but we'll need to investigate that either way. |
We just pushed the commits to FNA and FNA3D that resolve all the known issues we have - the remaining FIXMEs/TODOs in the source aren't critical for release. At this point we can finally start doing mass testing of games. I happen to be releasing Streets of Rage 4 really really soon, so I'll be monitoring that... |
Testing is moving along slowly but surely - one major detail though: Has anyone checked MoltenVK on iOS? Even if we just test Rogue Legacy that's fine, I just want to be sure we're not losing anything by fully deprecating Metal when the Vulkan work is done. |
Oh I meant to ask, on Mono 4 games like Sequence and Blueberry Garden I get the following error when I drop in a new FNA.dll and fnalibs: ❯ FNA3D_FORCE_DRIVER=Vulkan ./Sequence.bin.x86_64
Stacktrace:
at SDL2.SDL.SDL_SetHintWithPriority (string,string,SDL2.SDL/SDL_HintPriority) <0x0008b>
at Microsoft.Xna.Framework.SDL2_FNAPlatform.ProgramInit (Microsoft.Xna.Framework.LaunchParameters) <0x00593>
at Microsoft.Xna.Framework.FNAPlatform..cctor () <0x00c23>
at (wrapper runtime-invoke) object.runtime_invoke_void (object,intptr,intptr,intptr) <0xffffffff>
at Microsoft.Xna.Framework.Game..ctor () <0x0020f>
at Sequence.Game1..ctor () <0x015e7>
at Sequence.Program.Main (string[]) <0x0004b>
at (wrapper runtime-invoke) <Module>.runtime_invoke_void_object (object,intptr,intptr,intptr) <0xffffffff> Any idea how I can work around this? |
I forget what it is but I think it has to do with the compiler vs runtime? I fixed it locally either by using an older compiler or updating the runtime, and Blueberry needs MONO_FORCE_COMPAT=1. |
We just got the last major issue fixed in the Vulkan renderer - I'll be doing a full run of SOR4 on a 2GB card and if all is well we'll be in release candidate for 21.02. If I don't even run into memory issues I'll also take out the |
We're at the point where this is actively shipping in stuff and works well enough, so at this point I'm going to mark this as "stable, but not default". For making this default, see #72. |
The SPIR-V emitter for MojoShader is already done, so we just need to write the driver. It's possible that the emitter may need some modifications for uniform/attribute linking, but the bulk of the work was already done for ARB_gl_spirv support.
Some useful links:
Alpha:
Beta:
glspirv
spirv
Release Candidate:
Vulkan By Default:
The text was updated successfully, but these errors were encountered: