Implement VulkanRenderer #17

flibitijibibo · 2020-04-10T15:45:02Z

flibitijibibo · 2020-04-10T15:45:28Z

Currently being done in #13

thatcosmonaut · 2020-04-16T06:28:47Z

Remaining tasks are being tracked here:
thatcosmonaut#9

flibitijibibo · 2020-06-25T20:17:41Z

TODO list, since we're coming up to the first testable version...

~~SNIP~~ Moved to the OP!

flibitijibibo · 2020-06-26T15:01:39Z

Some notes for point rendering... in glspirv we have this patching setup when linking the shader program:

https://github.com/FNA-XNA/MojoShader/blob/upstream/mojoshader_opengl.c#L477

It's not too complicated but it does mean we have to hold off on building the shader module until we know that point sprite rendering is in use. This may end up being a part of the shader linker that solves the problem of linking vertex/pixel shader semantics together, instead of this disaster I came up with:

https://github.com/FNA-XNA/MojoShader/blob/upstream/profiles/mojoshader_profile_spirv.c#L1513

flibitijibibo · 2020-07-03T15:18:12Z

Some notes for threading support... we know two major limitations to Vulkan threading:

Command buffers/pools require a mutex when accessed from multiple threads
Certain commands cannot be executed during a render pass

To me, this says "we need a mutex that locks on both a render pass and on vkCmd execution." I'm thinking we have a mutex called "asyncCommandLock" that not only wraps around vkCmd calls, but also blocks for the entire duration of a render pass and for the entire duration that the command buffer is not recording. This sounds really broad, but this isn't as bad as it might seem:

For single-threaded games it means nothing, as the mutex will never be fought for
For threaded loading it means it will only ever be rendering or loading, never both at once
For threaded drawing, where loading is done on the main thread, point 2 is still true

This makes it about as deferred as D3D11, where we have a big lock on the DeviceContext.

Where it gets a little weird is the function dependency tree... these are all the functions that record a command, and the functions that use them:

INTERNAL_BufferMemoryBarrier
	INTERNAL_GetTextureData
	SetTextureData*
INTERNAL_ImageMemoryBarrier
	INTERNAL_GetTextureData
	INTERNAL_CreateFauxBackbuffer
	INTERNAL_BeginRenderPass
	INTERNAL_OutsideRenderPassClear
	SwapBuffers
	VerifySampler
	ResolveTarget
	SetTextureData*
	GenColorRenderbuffer
	GenDepthStencilRenderbuffer
	CreateDevice
INTERNAL_GetTextureData
	GetTextureData2D
		ReadBackbuffer
	GetTextureDataCube
INTERNAL_SetViewportCommand
	INTERNAL_BeginRenderPass
	SetViewport
INTERNAL_SetScissorRectCommand
	INTERNAL_BeginRenderPass
	SetScissorRect
	ApplyRasterizerState
INTERNAL_SetStencilReferenceValueCommand
	INTERNAL_BeginRenderPass
	SetReferenceStencil
INTERNAL_SetDepthBiasCommand
	ApplyRasterizerState
INTERNAL_BindPipeline
	Draw*
INTERNAL_BeginRenderPass
	INTERNAL_UpdateRenderPass
		Clear
		Draw*
INTERNAL_EndPass
	INTERNAL_UpdateRenderPass
		Clear
		Draw*
INTERNAL_RenderPassClear
	Clear
INTERNAL_OutsideRenderPassClear
	Clear
INTERNAL_BindResources
	Draw*
SwapBuffers
Draw*
SetBlendFactor
	SetBlendState
ApplyVertexBufferBindings
ResolveTarget
SetTextureData*
QueryBegin
QueryEnd
SetStringMarker

We have to make sure to lock for the duration that a single function submits commands, in addition to locking when a pass is active and a buffer isn't recording. This will prevent commands from getting mixed up when a thread tries to poke a command in during another function's execution (if that matters, it may not?).

EDIT: vkCmd limitations:

Clear/Draw, INSIDE ONLY:
vkCmdBindDescriptorSets
vkCmdBindIndexBuffer
vkCmdBindPipeline
vkCmdBindVertexBuffers
vkCmdClearAttachments
vkCmdClearColorImage
vkCmdClearDepthStencilImage
vkCmdDraw
vkCmdDrawIndexed
vkCmdSetBlendConstants
vkCmdSetDepthBias

ResolveTarget, SwapBuffers:
vkCmdBlitImage

SetTextureData, OUTSIDE ONLY:
vkCmdCopyBufferToImage

GetTextureData, OUTSIDE ONLY:
vkCmdCopyImageToBuffer

Wherever, really:
vkCmdPipelineBarrier
vkCmdSetScissor
vkCmdSetViewport
vkCmdSetStencilReference
vkCmdInsertDebugUtilsLabelEXT

BeginQuery, restarts pass:
vkCmdResetQueryPool
vkCmdBeginQuery

EndQuery, wherever:
vkCmdEndQuery

LOCKS THE PASS:
vkCmdBeginRenderPass

UNLOCKS THE PASS:
vkCmdEndRenderPass

flibitijibibo · 2020-07-08T17:25:25Z

Some more threading notes:

Previously we tried decoupling render and data command buffers, but we didn't quite get it working because there's a bit more to it than just having separate buffers to avoid render pass errors. It may be worth trying this again, but with a better idea of how to synchronize the buffers:

Render Command Buffer

Locks for the following commands:
- vkCmdBindDescriptorSets
- vkCmdBindPipeline
- vkCmdClearAttachments
- vkCmdClearColorImage
- vkCmdClearDepthStencilImage
- vkCmdDraw
- vkCmdDrawIndexed
- vkCmdSetBlendConstants
- vkCmdSetDepthBias
- vkCmdBlitImage
- vkCmdSetScissor
- vkCmdSetViewport
- vkCmdSetStencilReference
- vkCmdInsertDebugUtilsLabelEXT
- vkCmdResetQueryPool
- vkCmdBeginQuery
- vkCmdEndQuery
- vkCmdBeginRenderPass
- vkCmdEndRenderPass
Locks and submits the data command buffer for the following commands:
- vkCmdBindIndexBuffer
- vkCmdBindVertexBuffers
- VULKAN_INTERNAL_ImageMemoryBarrier

Data Command Buffer

Locks for the following commands:
- vkCmdCopyBufferToImage
- VULKAN_INTERNAL_AllocateSubBuffer
- VULKAN_INTERNAL_BufferMemoryBarrier
Locks and submits the render command buffer for the following commands:
- vkCmdCopyImageToBuffer
- VULKAN_INTERNAL_Stall

When to Lock Command Buffers

It's not enough to lock just for vkCmd operations - we need to lock for whole blocks of commands, as several commands may need to exist together for the queue to work (for example, render passes need to have both a Begin and End, and SetData needs one lock/unlock pair for both the buffer memory barriers and the vkCmdCopyImageToBuffer in between). Because of this, we may need to refactor the internal functions in such a way that it's clear what is a data command and what is a render command, while also making it clear when we're locking the command buffer to submit commands and when we're locking to submit to the queue. This is particularly bad for image memory barriers and pipeline stalls, where we typically have render and data operations happening all at once.

flibitijibibo · 2020-07-09T18:27:27Z

Yup, even more threading notes:

Vulkan commands called by every driver function:

VULKAN_SwapBuffers
- vkCmdPipelineBarrier
- vkCmdEndRenderPass
- vkCmdBlitImage
VULKAN_Draw*
- vkCmdEndRenderPass
- vkCmdPipelineBarrier
- vkCmdBeginRenderPass
- vkCmdSetViewport
- vkCmdSetScissor
- vkCmdSetStencilReferenceValue
- vkCmdSetBlendConstants
- vkCmdSetDepthBias
- vkCmdBindPipeline
- vkCmdBindDescriptorSets
- vkCmdBindIndexBuffer
- vkCmdDrawIndexed
- vkCmdDraw
VULKAN_SetViewport
- vkCmdSetViewport
VULKAN_SetScissorRect
- vkCmdSetScissor
VULKAN_SetBlendFactor
- vkCmdSetBlendConstants
VULKAN_SetReferenceStencil
- vkCmdSetStencilReference
VULKAN_SetBlendState
- vkCmdSetBlendConstants
VULKAN_SetDepthStencilState
- vkCmdSetStencilReference
VULKAN_ApplyRasterizerState
- vkCmdSetScissor
- vkCmdSetDepthBias
VULKAN_VerifySampler
- vkCmdPipelineBarrier
- vkCmdEndRenderPass
VULKAN_VerifyVertexSampler
- vkCmdPipelineBarrier
- vkCmdEndRenderPass
VULKAN_ApplyVertexBufferBindings
- vkCmdBindVertexBuffers
VULKAN_ResolveTarget
- vkCmdEndRenderPass
- vkCmdPipelineBarrier
- vkCmdBlitImage
VULKAN_ResetBackbuffer
- vkCmdEndRenderPass
- vkCmdPipelineBarrier
VULKAN_ReadBackbuffer
- vkCmdEndRenderPass
- vkCmdPipelineBarrier
- vkCmdCopyImageToBuffer
VULKAN_SetTextureData
- vkCmdEndRenderPass
- vkCmdPipelineBarrier
- vkCmdCopyBufferToImage
VULKAN_GetTextureData
- vkCmdEndRenderPass
- vkCmdPipelineBarrier
- vkCmdCopyImageToBuffer
VULKAN_GenColorRenderbuffer
- vkCmdEndRenderPass
- vkCmdPipelineBarrier
VULKAN_GenDepthStencilRenderbuffer
- vkCmdEndRenderPass
- vkCmdPipelineBarrier
VULKAN_SetVertex/IndexBufferData
- vkCmdEndRenderPass
- vkCmdPipelineBarrier
VULKAN_GetVertex/IndexBufferData
- vkCmdEndRenderPass
- vkCmdPipelineBarrier
VULKAN_QueryBegin
- vkCmdEndRenderPass
- vkCmdResetQueryPool
- vkCmdBeginQuery
VULKAN_QueryEnd
- vkCmdEndQuery
VULKAN_SetStringMarker
- vkCmdInsertDebugUtilsLabelEXT

flibitijibibo · 2020-08-07T16:41:17Z

As of the latest commit, the VulkanRenderer is now in Beta. All the critical features are done and function, but there are still some FIXMEs and a handful of places where performance improvements are possible, but at this point the renderer should Just Work with XNA games now. Instead of having a testing period and locking down a release date, we'll try and get it to where it's indisputably faster than OpenGL, then lock down a Release Candidate and final testing period before making it the default renderer.

flibitijibibo · 2020-08-29T20:37:11Z

At this point we now have Vulkan running pretty fast with all the uninitialized memory issues handled by Valgrind - there's really only one thing left for us to assess: Moving Vulkan command recording and queue submissions to a thread. This is relatively straightforward on paper, as it should just be a matter of moving VULKAN_INTERNAL_RecordCommands, Flush, and FlushAndPresent on to a dedicated thread which runs on a single semaphore. Adding threads is always harder than that, but as far as existing structure is concerned we've already set this up really really well. Once this works, it is expected that most games will be at least as fast when running with Vulkan, vs running with OpenGL.

SpriteBatchTest shows the issue pretty clearly when you crank up the sprite count, as it's definitely faster to submit commands, but when it's time to present the frame it chokes pretty hard and it doesn't show up in the frame timing at all since it's all squished into Present() right now. Games like FEZ show very stable framerates but still fall behind GL by a pretty sizeable percentage. Games are easiest to compare by using GALLIUM_HUD=fps and VK_INSTANCE_LAYERS=VK_LAYER_MESA_overlay.

flibitijibibo · 2020-10-01T03:04:58Z

The latest VulkanRenderer now supports proper command buffer generation and submission and asynchronous presentation. As a result, performance is pretty damn good and beats OpenGL in a whole lot of cases now.

The downside is that threading support had to get rolled back, but we expect this to be easier to fix than last time. Once we have our stress tests working, that'll be when we start running through the catalog to make sure everything works, and provided nothing goes wrong (ha) we'll mark this as a release candidate and eventually move it to the top of the driver list. Most of the TODOs/FIXMEs are minor, but they may be important for optimization later on.

One last thing we need that's not critical for testing but is critical for release is PrepareWindowAttributes - we have it falling back when we don't have a loader, but we also need to fall back when no hardware supports the extensions we need. I'm hoping we can mostly rip off device creation here without creating a device, but we'll see...

TheSpydog · 2020-10-16T17:32:54Z

Mostly just leaving this as a note for my future self. When we get to testing games, there are a few cases that will need special attention, beyond the standard "does it look ok and not crash" testing. In particular...

Little Racers Street: We'll need to do a soak test in the garage. That scene of the game leaks memory like crazy, so it's a good stress test for both MojoShader uniform buffers and FNA3D-side buffer allocation.
FEZ: We need to check the ending cutscene to make sure lines are being appropriately rendered, since we've run into some issues with that in the past.
FEZ: On MoltenVK there's a weird blip immediately after the initial loading screen, where the screen goes black except for a small sprite in the middle. I'm not sure if this is an MVK bug or we're accidentally tripping some undefined behavior that's just happened to work correctly on the other devices we've tested.

I'm sure there are other special cases, these are just the ones at the forefront of my memory.

flibitijibibo · 2020-10-17T03:41:38Z

Thankfully 2 won't happen, as it was a spec violation regarding GL_POINT_SPRITE, a state that's not variable in Vulkan and can't have the same issues like radeonsi/iris did.

3 I remember apitracing but I forget what the end result was... a missing clear maybe? It's probably in the Discord logs somewhere.

flibitijibibo · 2020-10-22T16:59:55Z

The latest commit should have Vulkan threading in a working state. But, as always, you should run this against any threaded work you have just in case, as behavior tends to be highly unpredictable for XNA games.

In my own testing this works well (Streets of Rage 4 is running fine with it), though I'm now realizing that we're having to look at memory management again as it uses well over twice the memory that GL does, which is fine if you have a lot of VRAM but not so fine if you don't (SOR4 bails at loading a single level on a 2GB card). This could possibly be influenced by the threads (maybe some contention with the memory allocator?) but we'll need to investigate that either way.

flibitijibibo · 2020-11-11T21:47:20Z

We just pushed the commits to FNA and FNA3D that resolve all the known issues we have - the remaining FIXMEs/TODOs in the source aren't critical for release.

At this point we can finally start doing mass testing of games. I happen to be releasing Streets of Rage 4 really really soon, so I'll be monitoring that...

flibitijibibo · 2020-12-15T20:25:43Z

Testing is moving along slowly but surely - one major detail though: Has anyone checked MoltenVK on iOS? Even if we just test Rogue Legacy that's fine, I just want to be sure we're not losing anything by fully deprecating Metal when the Vulkan work is done.

thatcosmonaut · 2020-12-17T05:09:55Z

Oh I meant to ask, on Mono 4 games like Sequence and Blueberry Garden I get the following error when I drop in a new FNA.dll and fnalibs:

❯ FNA3D_FORCE_DRIVER=Vulkan ./Sequence.bin.x86_64 
Stacktrace:

  at SDL2.SDL.SDL_SetHintWithPriority (string,string,SDL2.SDL/SDL_HintPriority) <0x0008b>
  at Microsoft.Xna.Framework.SDL2_FNAPlatform.ProgramInit (Microsoft.Xna.Framework.LaunchParameters) <0x00593>
  at Microsoft.Xna.Framework.FNAPlatform..cctor () <0x00c23>
  at (wrapper runtime-invoke) object.runtime_invoke_void (object,intptr,intptr,intptr) <0xffffffff>
  at Microsoft.Xna.Framework.Game..ctor () <0x0020f>
  at Sequence.Game1..ctor () <0x015e7>
  at Sequence.Program.Main (string[]) <0x0004b>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void_object (object,intptr,intptr,intptr) <0xffffffff>

Any idea how I can work around this?

flibitijibibo · 2020-12-17T05:18:51Z

I forget what it is but I think it has to do with the compiler vs runtime? I fixed it locally either by using an older compiler or updating the runtime, and Blueberry needs MONO_FORCE_COMPAT=1.

flibitijibibo · 2021-01-29T03:03:07Z

We just got the last major issue fixed in the Vulkan renderer - I'll be doing a full run of SOR4 on a 2GB card and if all is well we'll be in release candidate for 21.02. If I don't even run into memory issues I'll also take out the VULKAN_MEMORY_REQUIREMENT variable.

flibitijibibo · 2021-09-06T23:19:29Z

We're at the point where this is actively shipping in stuff and works well enough, so at this point I'm going to mark this as "stable, but not default". For making this default, see #72.

flibitijibibo added the Vulkan label Apr 10, 2020

flibitijibibo changed the title ~~Implement VulkanDevice~~ Implement VulkanRenderer Apr 10, 2020

flibitijibibo assigned thatcosmonaut Apr 16, 2020

flibitijibibo assigned flibitijibibo and TheSpydog Jul 1, 2020

flibitijibibo closed this as completed Sep 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement VulkanRenderer #17

Implement VulkanRenderer #17

flibitijibibo commented Apr 10, 2020 •

edited

Loading

flibitijibibo commented Apr 10, 2020

thatcosmonaut commented Apr 16, 2020

flibitijibibo commented Jun 25, 2020 •

edited

Loading

flibitijibibo commented Jun 26, 2020

flibitijibibo commented Jul 3, 2020 •

edited

Loading

flibitijibibo commented Jul 8, 2020 •

edited

Loading

flibitijibibo commented Jul 9, 2020 •

edited

Loading

flibitijibibo commented Aug 7, 2020

flibitijibibo commented Aug 29, 2020

flibitijibibo commented Oct 1, 2020

TheSpydog commented Oct 16, 2020

flibitijibibo commented Oct 17, 2020

flibitijibibo commented Oct 22, 2020 •

edited

Loading

flibitijibibo commented Nov 11, 2020

flibitijibibo commented Dec 15, 2020

thatcosmonaut commented Dec 17, 2020 •

edited

Loading

flibitijibibo commented Dec 17, 2020

flibitijibibo commented Jan 29, 2021

flibitijibibo commented Sep 6, 2021

Implement VulkanRenderer #17

Implement VulkanRenderer #17

Comments

flibitijibibo commented Apr 10, 2020 • edited Loading

flibitijibibo commented Apr 10, 2020

thatcosmonaut commented Apr 16, 2020

flibitijibibo commented Jun 25, 2020 • edited Loading

flibitijibibo commented Jun 26, 2020

flibitijibibo commented Jul 3, 2020 • edited Loading

flibitijibibo commented Jul 8, 2020 • edited Loading

flibitijibibo commented Jul 9, 2020 • edited Loading

flibitijibibo commented Aug 7, 2020

flibitijibibo commented Aug 29, 2020

flibitijibibo commented Oct 1, 2020

TheSpydog commented Oct 16, 2020

flibitijibibo commented Oct 17, 2020

flibitijibibo commented Oct 22, 2020 • edited Loading

flibitijibibo commented Nov 11, 2020

flibitijibibo commented Dec 15, 2020

thatcosmonaut commented Dec 17, 2020 • edited Loading

flibitijibibo commented Dec 17, 2020

flibitijibibo commented Jan 29, 2021

flibitijibibo commented Sep 6, 2021

flibitijibibo commented Apr 10, 2020 •

edited

Loading

flibitijibibo commented Jun 25, 2020 •

edited

Loading

flibitijibibo commented Jul 3, 2020 •

edited

Loading

flibitijibibo commented Jul 8, 2020 •

edited

Loading

flibitijibibo commented Jul 9, 2020 •

edited

Loading

flibitijibibo commented Oct 22, 2020 •

edited

Loading

thatcosmonaut commented Dec 17, 2020 •

edited

Loading