Commit 11817f4

Generate MeshUniforms on the GPU via compute shader where available. (#12773)
Currently, `MeshUniform`s are rather large: 160 bytes. They're also somewhat expensive to compute, because they involve taking the inverse of a 3x4 matrix. Finally, if a mesh is present in multiple views, that mesh will have a separate `MeshUniform` for each and every view, which is wasteful.

This commit fixes these issues by introducing the concept of a *mesh input uniform* and adding a *mesh uniform building* compute shader pass. The `MeshInputUniform` is simply the minimum amount of data needed for the GPU to compute the full `MeshUniform`. Most of this data is just the transform, so the whole struct is only 64 bytes. `MeshInputUniform`s are computed during the *extraction* phase, much like skins are today, in order to avoid needlessly copying transforms around on CPU. (In fact, the render app has been changed to only store the translation of each mesh; it no longer cares about any other part of the transform, which is stored only on the GPU and the main world.) Before rendering, the `build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the full `MeshUniform`.

The mesh uniform building pass does the following, all on GPU:

1. Copy the appropriate fields of the `MeshInputUniform` to the `MeshUniform` slot. If a single mesh is present in multiple views, this effectively duplicates it into each view.

2. Compute the inverse transpose of the model transform, used for transforming normals.

3. If applicable, copy the mesh's transform from the previous frame for TAA. To support this, we double-buffer the `MeshInputUniform`s over two frames and swap the buffers each frame. The `MeshInputUniform`s for the current frame contain the index of that mesh's `MeshInputUniform` for the previous frame.

This commit produces wins in virtually every CPU part of the pipeline: `extract_meshes`, `queue_material_meshes`, `batch_and_prepare_render_phase`, and especially `write_batched_instance_buffer` are all faster. Shrinking the amount of CPU data that has to be shuffled around speeds up the entire rendering process.

| Benchmark              | This branch | `main`  | Speedup |
|------------------------|-------------|---------|---------|
| `many_cubes -nfc`      | 17.259      | 24.529  | 42.12%  |
| `many_cubes -nfc -vpi` | 302.116     | 312.123 | 3.31%   |
| `many_foxes`           | 3.227       | 3.515   | 8.92%   |

Because mesh uniform building requires compute shaders, and WebGL 2 has no compute shaders, the existing CPU mesh uniform building code has been left as-is. Many types now have both CPU mesh uniform building and GPU mesh uniform building modes. Developers can opt into the old CPU mesh uniform building by setting the `use_gpu_instance_buffer_builder` option on `PbrPlugin` to `false`.

Below are graphs of the CPU portions of `many_cubes --no-frustum-culling`. Yellow is this branch, red is `main`.

`extract_meshes`:
![Screenshot 2024-04-02 124842](https://github.com/bevyengine/bevy/assets/157897/a6748ea4-dd05-47b6-9254-45d07d33cb10)

It's notable that we get a small win even though we're now writing to a GPU buffer.

`queue_material_meshes`:
![Screenshot 2024-04-02 124911](https://github.com/bevyengine/bevy/assets/157897/ecb44d78-65dc-448d-ba85-2de91aa2ad94)

There's a bit of a regression here; not sure what's causing it. In any case it's far outweighed by the other gains.

`batch_and_prepare_render_phase`:
![Screenshot 2024-04-02 125123](https://github.com/bevyengine/bevy/assets/157897/4e20fc86-f9dd-4e5c-8623-837e4258f435)

There's a huge win here, enough to make batching basically drop off the profile.

`write_batched_instance_buffer`:
![Screenshot 2024-04-02 125237](https://github.com/bevyengine/bevy/assets/157897/401a5c32-9dc1-4991-996d-eb1cac6014b2)

There's a massive improvement here, as expected. Note that a lot of it simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't a maintainability problem in my view because `MeshInputUniform` is so simple: just 16 tightly-packed words.)

## Changelog

### Added

* Per-mesh instance data is now generated on GPU with a compute shader instead of CPU, resulting in rendering performance improvements on platforms where compute shaders are supported.

## Migration guide

* Custom render phases now need multiple systems beyond just `batch_and_prepare_render_phase`. Code that was previously creating custom render phases should now add a `BinnedRenderPhasePlugin` or `SortedRenderPhasePlugin` as appropriate instead of directly adding `batch_and_prepare_render_phase`.
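
The three steps of the mesh uniform building pass described in the commit message can be emulated on the CPU to make the data flow concrete. The following is a minimal sketch, not the actual compute shader: `MeshInput`, `MeshOutput`, `InputBuffers`, and `build_mesh_uniforms_cpu` are hypothetical stand-ins, the field layout is an assumption, and a "work item" (one index per view/mesh pair) is just one plausible way to model the duplication into each view.

```rust
/// A 3x4 affine transform stored as three rows: [m00, m01, m02, tx], and so on.
type Affine = [[f32; 4]; 3];
type Mat3 = [[f32; 3]; 3];

/// Hypothetical stand-in for `MeshInputUniform`; the real layout may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
struct MeshInput {
    transform: Affine,
    /// Index of this mesh's input in the previous frame's buffer, if any.
    previous_input_index: Option<usize>,
}

/// Hypothetical stand-in for the expanded `MeshUniform`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct MeshOutput {
    transform: Affine,
    previous_transform: Affine,
    inverse_transpose: Mat3,
}

/// Double-buffered inputs: `previous` holds what `current` held last frame.
#[derive(Default)]
struct InputBuffers {
    current: Vec<MeshInput>,
    previous: Vec<MeshInput>,
}

impl InputBuffers {
    /// Swap the two buffers at the start of a frame, then refill `current`.
    fn begin_frame(&mut self) {
        std::mem::swap(&mut self.current, &mut self.previous);
        self.current.clear();
    }
}

fn cross(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

/// Inverse transpose of the upper-left 3x3 block, computed as cofactors / det.
/// A singular matrix (det == 0) is not handled in this sketch.
fn inverse_transpose(m: &Affine) -> Mat3 {
    let r0 = [m[0][0], m[0][1], m[0][2]];
    let r1 = [m[1][0], m[1][1], m[1][2]];
    let r2 = [m[2][0], m[2][1], m[2][2]];
    // The rows of the cofactor matrix are cross products of the other two rows.
    let (c0, c1, c2) = (cross(r1, r2), cross(r2, r0), cross(r0, r1));
    let inv_det = 1.0 / (r0[0] * c0[0] + r0[1] * c0[1] + r0[2] * c0[2]);
    [
        c0.map(|x| x * inv_det),
        c1.map(|x| x * inv_det),
        c2.map(|x| x * inv_det),
    ]
}

/// Produces one output slot per work item. A work item is an index into
/// `current`, so a mesh visible in two views simply appears twice.
fn build_mesh_uniforms_cpu(buffers: &InputBuffers, work_items: &[usize]) -> Vec<MeshOutput> {
    work_items
        .iter()
        .map(|&i| {
            let input = buffers.current[i];
            // Step 3: look up last frame's transform, falling back to the
            // current one for meshes that did not exist last frame.
            let previous_transform = input
                .previous_input_index
                .and_then(|p| buffers.previous.get(p))
                .map_or(input.transform, |p| p.transform);
            MeshOutput {
                transform: input.transform, // step 1: copy
                previous_transform,
                inverse_transpose: inverse_transpose(&input.transform), // step 2
            }
        })
        .collect()
}
```

In this model the CPU only ever writes the small `MeshInput` records; the expensive matrix work and the per-view duplication happen in the expansion step, which is the part the commit moves to the GPU.
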
1 parent a9943e8 commit 11817f4

17 files changed, with 1899 additions and 295 deletions.

crates/bevy_pbr/src/lib.rs

Lines changed: 14 additions & 10 deletions
```diff
@@ -79,6 +79,8 @@ pub mod graph {
         /// Label for the screen space ambient occlusion render node.
         ScreenSpaceAmbientOcclusion,
         DeferredLightingPass,
+        /// Label for the compute shader instance data building pass.
+        GpuPreprocess,
     }
 }
 
@@ -133,13 +135,19 @@ pub struct PbrPlugin {
     pub prepass_enabled: bool,
     /// Controls if [`DeferredPbrLightingPlugin`] is added.
     pub add_default_deferred_lighting_plugin: bool,
+    /// Controls if GPU [`MeshUniform`] building is enabled.
+    ///
+    /// This requires compute shader support and so will be forcibly disabled if
+    /// the platform doesn't support those.
+    pub use_gpu_instance_buffer_builder: bool,
 }
 
 impl Default for PbrPlugin {
     fn default() -> Self {
         Self {
             prepass_enabled: true,
             add_default_deferred_lighting_plugin: true,
+            use_gpu_instance_buffer_builder: true,
         }
     }
 }
@@ -280,7 +288,9 @@ impl Plugin for PbrPlugin {
             .register_type::<DefaultOpaqueRendererMethod>()
             .init_resource::<DefaultOpaqueRendererMethod>()
             .add_plugins((
-                MeshRenderPlugin,
+                MeshRenderPlugin {
+                    use_gpu_instance_buffer_builder: self.use_gpu_instance_buffer_builder,
+                },
                 MaterialPlugin::<StandardMaterial> {
                     prepass_enabled: self.prepass_enabled,
                     ..Default::default()
@@ -292,6 +302,9 @@ impl Plugin for PbrPlugin {
                 ExtractComponentPlugin::<ShadowFilteringMethod>::default(),
                 LightmapPlugin,
                 LightProbePlugin,
+                GpuMeshPreprocessPlugin {
+                    use_gpu_instance_buffer_builder: self.use_gpu_instance_buffer_builder,
+                },
             ))
             .configure_sets(
                 PostUpdate,
@@ -386,15 +399,6 @@ impl Plugin for PbrPlugin {
         let draw_3d_graph = graph.get_sub_graph_mut(Core3d).unwrap();
         draw_3d_graph.add_node(NodePbr::ShadowPass, shadow_pass_node);
         draw_3d_graph.add_node_edge(NodePbr::ShadowPass, Node3d::StartMainPass);
-
-        render_app.ignore_ambiguity(
-            bevy_render::Render,
-            bevy_core_pipeline::core_3d::prepare_core_3d_transmission_textures,
-            bevy_render::batching::batch_and_prepare_sorted_render_phase::<
-                bevy_core_pipeline::core_3d::Transmissive3d,
-                MeshPipeline,
-            >,
-        );
     }
 
     fn finish(&self, app: &mut App) {
```
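
The new `use_gpu_instance_buffer_builder` field defaults to `true`, and its doc comment says it is forcibly disabled when the platform lacks compute shaders. A rough sketch of how those two rules might combine is below. It deliberately does not depend on Bevy: `PbrPluginOptions` is a local stand-in that mirrors the three fields visible in the diff, while `UniformBuildingMode`, `cpu_only_options`, and `effective_mode` are hypothetical names invented for illustration. In a real app the option would presumably be set on `PbrPlugin` itself with the same struct-update syntax.

```rust
/// Local stand-in mirroring the `PbrPlugin` fields shown in the diff above.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PbrPluginOptions {
    prepass_enabled: bool,
    add_default_deferred_lighting_plugin: bool,
    use_gpu_instance_buffer_builder: bool,
}

impl Default for PbrPluginOptions {
    fn default() -> Self {
        Self {
            prepass_enabled: true,
            add_default_deferred_lighting_plugin: true,
            use_gpu_instance_buffer_builder: true,
        }
    }
}

/// Hypothetical enum naming the two building modes the commit describes.
#[derive(Clone, Copy, Debug, PartialEq)]
enum UniformBuildingMode {
    Cpu,
    Gpu,
}

/// Opt into the old CPU path while keeping every other default.
fn cpu_only_options() -> PbrPluginOptions {
    PbrPluginOptions {
        use_gpu_instance_buffer_builder: false,
        ..Default::default()
    }
}

/// The GPU path is used only if it was requested *and* the platform can run
/// compute shaders (assumed to be a simple boolean capability here).
fn effective_mode(options: &PbrPluginOptions, platform_has_compute: bool) -> UniformBuildingMode {
    if options.use_gpu_instance_buffer_builder && platform_has_compute {
        UniformBuildingMode::Gpu
    } else {
        UniformBuildingMode::Cpu
    }
}
```

Under this model a WebGL 2 target ends up on the CPU path regardless of the option, which matches the fallback described in the commit message.
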

crates/bevy_pbr/src/lightmap/mod.rs

Lines changed: 6 additions & 7 deletions
```diff
@@ -48,7 +48,7 @@ use bevy_render::{
 };
 use bevy_utils::HashSet;
 
-use crate::RenderMeshInstances;
+use crate::{ExtractMeshesSet, RenderMeshInstances};
 
 /// The ID of the lightmap shader.
 pub const LIGHTMAP_SHADER_HANDLE: Handle<Shader> =
@@ -132,10 +132,9 @@ impl Plugin for LightmapPlugin {
             return;
         };
 
-        render_app.init_resource::<RenderLightmaps>().add_systems(
-            ExtractSchedule,
-            extract_lightmaps.after(crate::extract_meshes),
-        );
+        render_app
+            .init_resource::<RenderLightmaps>()
+            .add_systems(ExtractSchedule, extract_lightmaps.after(ExtractMeshesSet));
     }
 }
 
@@ -159,8 +158,8 @@ fn extract_lightmaps(
         if !view_visibility.get()
             || images.get(&lightmap.image).is_none()
             || !render_mesh_instances
-                .get(&entity)
-                .and_then(|mesh_instance| meshes.get(mesh_instance.mesh_asset_id))
+                .mesh_asset_id(entity)
+                .and_then(|mesh_asset_id| meshes.get(mesh_asset_id))
                 .is_some_and(|mesh| mesh.layout.0.contains(Mesh::ATTRIBUTE_UV_1.id))
         {
             continue;
```

crates/bevy_pbr/src/material.rs

Lines changed: 7 additions & 7 deletions
```diff
@@ -508,6 +508,8 @@ pub const fn screen_space_specular_transmission_pipeline_key(
     }
 }
 
+/// For each view, iterates over all the meshes visible from that view and adds
+/// them to [`BinnedRenderPhase`]s or [`SortedRenderPhase`]s as appropriate.
 #[allow(clippy::too_many_arguments)]
 pub fn queue_material_meshes<M: Material>(
     opaque_draw_functions: Res<DrawFunctions<Opaque3d>>,
@@ -647,7 +649,8 @@ pub fn queue_material_meshes<M: Material>(
             let Some(material_asset_id) = render_material_instances.get(visible_entity) else {
                 continue;
             };
-            let Some(mesh_instance) = render_mesh_instances.get(visible_entity) else {
+            let Some(mesh_instance) = render_mesh_instances.render_mesh_queue_data(*visible_entity)
+            else {
                 continue;
             };
             let Some(mesh) = render_meshes.get(mesh_instance.mesh_asset_id) else {
@@ -693,8 +696,7 @@ pub fn queue_material_meshes<M: Material>(
             match material.properties.alpha_mode {
                 AlphaMode::Opaque => {
                     if material.properties.reads_view_transmission_texture {
-                        let distance = rangefinder
-                            .distance_translation(&mesh_instance.transforms.transform.translation)
+                        let distance = rangefinder.distance_translation(&mesh_instance.translation)
                             + material.properties.depth_bias;
                         transmissive_phase.add(Transmissive3d {
                             entity: *visible_entity,
@@ -717,8 +719,7 @@ pub fn queue_material_meshes<M: Material>(
                 }
                 AlphaMode::Mask(_) => {
                     if material.properties.reads_view_transmission_texture {
-                        let distance = rangefinder
-                            .distance_translation(&mesh_instance.transforms.transform.translation)
+                        let distance = rangefinder.distance_translation(&mesh_instance.translation)
                             + material.properties.depth_bias;
                         transmissive_phase.add(Transmissive3d {
                             entity: *visible_entity,
@@ -746,8 +747,7 @@ pub fn queue_material_meshes<M: Material>(
                 | AlphaMode::Premultiplied
                 | AlphaMode::Add
                 | AlphaMode::Multiply => {
-                    let distance = rangefinder
-                        .distance_translation(&mesh_instance.transforms.transform.translation)
+                    let distance = rangefinder.distance_translation(&mesh_instance.translation)
                         + material.properties.depth_bias;
                     transparent_phase.add(Transparent3d {
                         entity: *visible_entity,
```
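
The hunks above show why the render app can get away with storing only each mesh's translation: sorting transmissive and transparent items needs a single distance per mesh, and that distance depends on the translation alone. A minimal sketch of the idea follows, assuming the distance is a view-space depth obtained by dotting one row of the world-to-view matrix with the homogeneous translation. `Rangefinder` and its field are hypothetical stand-ins, not the engine's actual rangefinder type.

```rust
/// Hypothetical rangefinder that keeps only the z row of a world-to-view
/// matrix, since that row is all a depth computation needs.
struct Rangefinder {
    world_to_view_row_z: [f32; 4],
}

impl Rangefinder {
    /// View-space depth of a point: dot(row_z, (x, y, z, 1)).
    fn distance_translation(&self, translation: &[f32; 3]) -> f32 {
        let r = self.world_to_view_row_z;
        r[0] * translation[0] + r[1] * translation[1] + r[2] * translation[2] + r[3]
    }
}

/// Mirrors the `distance + depth_bias` expression in the diff: the material's
/// bias nudges the sort key without touching the mesh data.
fn sort_distance(rangefinder: &Rangefinder, translation: &[f32; 3], depth_bias: f32) -> f32 {
    rangefinder.distance_translation(translation) + depth_bias
}
```

For an unrotated camera at `(0, 0, 5)` the z row is `[0, 0, 1, -5]`, so a mesh at `(1, 2, 3)` gets depth `3 - 5 = -2`; no rotation or scale of the mesh enters the calculation.
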

crates/bevy_pbr/src/prepass/mod.rs

Lines changed: 8 additions & 15 deletions
```diff
@@ -1,6 +1,5 @@
 mod prepass_bindings;
 
-use bevy_render::batching::{batch_and_prepare_binned_render_phase, sort_binned_render_phase};
 use bevy_render::mesh::{GpuMesh, MeshVertexBufferLayoutRef};
 use bevy_render::render_resource::binding_types::uniform_buffer;
 pub use prepass_bindings::*;
@@ -145,7 +144,11 @@ where
                     update_mesh_previous_global_transforms,
                     update_previous_view_data,
                 ),
-            );
+            )
+            .add_plugins((
+                BinnedRenderPhasePlugin::<Opaque3dPrepass, MeshPipeline>::default(),
+                BinnedRenderPhasePlugin::<AlphaMask3dPrepass, MeshPipeline>::default(),
+            ));
         }
 
         let Some(render_app) = app.get_sub_app_mut(RenderApp) else {
@@ -157,18 +160,7 @@ where
                 .add_systems(ExtractSchedule, extract_camera_previous_view_data)
                 .add_systems(
                     Render,
-                    (
-                        (
-                            sort_binned_render_phase::<Opaque3dPrepass>,
-                            sort_binned_render_phase::<AlphaMask3dPrepass>
-                        ).in_set(RenderSet::PhaseSort),
-                        (
-                            prepare_previous_view_uniforms,
-                            batch_and_prepare_binned_render_phase::<Opaque3dPrepass, MeshPipeline>,
-                            batch_and_prepare_binned_render_phase::<AlphaMask3dPrepass,
-                                MeshPipeline>,
-                        ).in_set(RenderSet::PrepareResources),
-                    )
+                    prepare_previous_view_uniforms.in_set(RenderSet::PrepareResources),
                 );
         }
 
@@ -786,7 +778,8 @@ pub fn queue_prepass_material_meshes<M: Material>(
             let Some(material_asset_id) = render_material_instances.get(visible_entity) else {
                 continue;
             };
-            let Some(mesh_instance) = render_mesh_instances.get(visible_entity) else {
+            let Some(mesh_instance) = render_mesh_instances.render_mesh_queue_data(*visible_entity)
+            else {
                 continue;
             };
             let Some(material) = render_materials.get(*material_asset_id) else {
```
