@venhelhardt
Contributor

Transparent and transmissive phases previously used the instance translation from GlobalTransform as the sort position. This breaks down when mesh geometry is authored in "world-like" coordinates and the instance transform is identity or near-identity (common in building/CAD-style content). In such cases multiple transparent instances end up with the same translation and produce incorrect draw order.

This change introduces sorting based on the world-space center of the mesh bounds instead of the raw translation. The local bounds center is stored per mesh/instance and transformed by the instance’s world transform when building sort keys. This adds a small amount of per-mesh/instance data but produces much more correct transparent and transmissive rendering in real-world scenes.

Objective

Currently, transparent and transmissive render phases in Bevy sort instances using the translation from GlobalTransform. This works only if the mesh origin is a good proxy for the geometry position. In many real-world cases (especially CAD/architecture-like content), the mesh data is authored in "world-like" coordinates and the instance Transform is identity. In such setups, sorting by translation produces incorrect draw order for transparent/transmissive objects.

I propose switching the sorting key from GlobalTransform.translation to the world-space center of the mesh bounds for each instance.

Solution

Instead of using GlobalTransform.translation as the sort position for transparent/transmissive phases, use the world-space center of the mesh bounds:

  1. Store the local-space bounds center for each render mesh (e.g. in something like RenderMeshInstanceShared as center: Vec3 derived from the mesh Aabb).
  2. For each instance, compute the world-space center by applying the instance transform.
  3. Use this world-space center as the position for distance / depth computation in view space when building sort keys for transparent and transmissive phases.
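
As a rough sketch of the three steps above, the following uses plain stand-in types rather than Bevy's own. `V3`, `Affine`, `aabb_center` and `sort_depth` are hypothetical names for illustration only, and the sign convention (camera looking down -Z, so depth is the negated view-space z) is an assumption, not necessarily what the PR code does.

```rust
/// Minimal stand-in for a 3-component vector (hypothetical, not Bevy's `Vec3`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct V3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Row-major 3x4 affine transform: linear part in the left 3x3,
/// translation in the last column (hypothetical stand-in type).
#[derive(Clone, Copy, Debug)]
pub struct Affine {
    pub rows: [[f32; 4]; 3],
}

impl Affine {
    /// Pure translation; a zero translation gives the identity.
    pub fn from_translation(t: V3) -> Self {
        Affine {
            rows: [
                [1.0, 0.0, 0.0, t.x],
                [0.0, 1.0, 0.0, t.y],
                [0.0, 0.0, 1.0, t.z],
            ],
        }
    }

    /// Applies the transform to a point (w = 1).
    pub fn transform_point(&self, p: V3) -> V3 {
        let dot = |r: &[f32; 4]| r[0] * p.x + r[1] * p.y + r[2] * p.z + r[3];
        V3 {
            x: dot(&self.rows[0]),
            y: dot(&self.rows[1]),
            z: dot(&self.rows[2]),
        }
    }
}

/// Step 1: local-space center of an AABB given its min and max corners.
pub fn aabb_center(min: V3, max: V3) -> V3 {
    V3 {
        x: (min.x + max.x) * 0.5,
        y: (min.y + max.y) * 0.5,
        z: (min.z + max.z) * 0.5,
    }
}

/// Steps 2 and 3: move the local center into world space, then into view
/// space, and use the distance along the view direction as the sort depth.
/// Assumes the camera looks down -Z in view space.
pub fn sort_depth(local_center: V3, world_from_local: &Affine, view_from_world: &Affine) -> f32 {
    let world_center = world_from_local.transform_point(local_center);
    let view_center = view_from_world.transform_point(world_center);
    -view_center.z
}
```

With an identity instance transform, a translation-based key would place every such instance at the origin, while `sort_depth` still separates meshes whose vertices sit at different places.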

This way:

  • Sorting respects the actual spatial position of the geometry
  • Instances with baked-in “world-like” coordinates inside the mesh are handled correctly
  • Draw order for transparent objects becomes much more stable and visually correct in real scenes

The main trade-offs:

  • Adding a Vec3 center to RenderMeshInstanceShared (typically +12 or +16 bytes depending on alignment)
  • Transforming the local bounds center into world space for each instance when computing the sort key
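
To illustrate where the 12 and 16 byte figures come from, here is a small sketch with two hypothetical structs, `Center` and `CenterAligned`. They are not the actual Bevy types, and the real growth of RenderMeshInstanceShared also depends on how the new field packs with its neighbors.

```rust
use std::mem::size_of;

/// Tightly packed center: three `f32` values with 4-byte alignment.
#[repr(C)]
pub struct Center {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Same payload, but forced to 16-byte alignment (SIMD-friendly layout),
/// which pads the struct up to 16 bytes.
#[repr(C, align(16))]
pub struct CenterAligned {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Returns the sizes in bytes of the packed and the aligned variant.
pub fn center_sizes() -> (usize, usize) {
    (size_of::<Center>(), size_of::<CenterAligned>())
}
```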

Alternative approach and its drawbacks

In theory, this could be fixed by baking meshes so that:

  • The mesh is recentered around its local bounding box center, and
  • The instance Transform is adjusted to move it back into place.
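
For reference, the baking step would look roughly like the sketch below. `recenter` is a hypothetical helper operating on a raw position buffer, not an existing Bevy API. It rewrites every vertex, which is part of why this route gets expensive.

```rust
/// Recenters vertex positions around the center of their bounding box.
/// Returns the offset that must be added to the instance translation so the
/// geometry ends up in the same place. Hypothetical helper for illustration.
pub fn recenter(positions: &mut [[f32; 3]]) -> [f32; 3] {
    if positions.is_empty() {
        return [0.0; 3];
    }

    // Compute the axis-aligned bounds of all vertices.
    let mut min = positions[0];
    let mut max = positions[0];
    for p in positions.iter() {
        for axis in 0..3 {
            min[axis] = min[axis].min(p[axis]);
            max[axis] = max[axis].max(p[axis]);
        }
    }

    let center = [
        (min[0] + max[0]) * 0.5,
        (min[1] + max[1]) * 0.5,
        (min[2] + max[2]) * 0.5,
    ];

    // Shift every vertex so the bounds are centered on the origin.
    for p in positions.iter_mut() {
        for axis in 0..3 {
            p[axis] -= center[axis];
        }
    }

    center
}
```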

However, this has several drawbacks:

  • Requires modifying vertex data for each mesh (expensive and error-prone)
  • Requires either duplicating meshes or introducing one-off edits, which is bad for instancing and memory
  • Complicates asset workflows (tools, exporters, pipelines)
  • Still does not address dynamic or procedurally generated content

In practice, this is not a scalable or convenient solution.

Secondary issue: unstable ordering when depth is equal

There is another related problem with the current sorting: when two transparent/transmissive instances end up with the same view-space depth (for example, their centers project onto the same depth plane), the resulting draw order becomes unstable. This leads to visible flickering, because the internal order of RenderEntity items is not guaranteed to be stable between frames.

In practice this happens quite easily, especially when multiple transparent instances share the same or very similar sort depth, and their relative order in the extracted render list can change frame to frame.

To address this, I suggest extending the sort key with a deterministic tie-breaker, for example the entity's main index. Conceptually, the sort key would become:

  • primary: view-space depth (or distance),
  • secondary: stable per-entity index

This ensures that instances with the same depth keep a consistent draw order across frames, removing flickering while preserving the intended depth-based sorting behavior.
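
A minimal sketch of such a two-level key, using only the standard library. `SortItem`, `compare` and `sort_back_to_front` are hypothetical names, and the back-to-front direction (larger depth drawn first) is an assumption made for the example. The PR itself sorts with radsort rather than a comparison sort.

```rust
use std::cmp::Ordering;

/// One sortable item in a transparent phase (hypothetical stand-in).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SortItem {
    pub depth: f32,
    pub entity_index: u32,
}

/// Primary key: depth, farthest first. Secondary key: entity index, ascending.
/// `f32::total_cmp` gives a total order, so the comparison never panics.
pub fn compare(a: &SortItem, b: &SortItem) -> Ordering {
    b.depth
        .total_cmp(&a.depth)
        .then(a.entity_index.cmp(&b.entity_index))
}

/// Sorts items so that equal-depth items always end up in the same order,
/// regardless of the order in which they were extracted.
pub fn sort_back_to_front(items: &mut [SortItem]) {
    items.sort_unstable_by(compare);
}
```

Because the key is unique per entity, even an unstable sort produces the same order every frame for the same set of items.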

Testing

  • Did you test these changes? If so, how?
    cargo run -p ci -- test
    cargo run -p ci -- doc
    cargo run -p ci -- compile
  • Are there any parts that need more testing? Not sure
  • How can other people (reviewers) test your changes? Is there anything specific they need to know?
    Run this "example":
use bevy::{
    camera_controller::free_camera::{FreeCamera, FreeCameraPlugin},
    prelude::*,
};

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        .add_plugins(FreeCameraPlugin)
        .add_systems(Startup, setup)
        .add_systems(Update, view_orient)
        .run();
}

fn setup(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<StandardMaterial>>,
) {
    let material = materials.add(StandardMaterial {
        base_color: Color::srgb_u8(150, 250, 150).with_alpha(0.7),
        alpha_mode: AlphaMode::Blend,
        ..default()
    });
    let mesh = Cuboid::new(3., 3., 1.)
        .mesh()
        .build()
        .translated_by(Vec3::new(1.5, 1.5, 0.5));

    // Two stacked 3x3 grids of cuboids; positions are baked into the mesh vertices
    for k in -1..=0 {
        let z_offset = k as f32 * 3.;

        for i in 0..3 {
            let x_offset = i as f32 * 3.25;

            for j in 0..3 {
                let y_offset = j as f32 * 3.25;

                commands.spawn((
                    Mesh3d(
                        meshes.add(
                            mesh.clone()
                                .translated_by(Vec3::new(x_offset, y_offset, z_offset)),
                        ),
                    ),
                    MeshMaterial3d(material.clone()),
                ));
            }
        }
    }

    // Cuboids at the center share the same position and are equidistant from the camera
    {
        commands.spawn((
            Mesh3d(meshes.add(mesh.clone().translated_by(Vec3::new(3.25, 3.25, 3.)))),
            MeshMaterial3d(material.clone()),
        ));
        commands.spawn((
            Mesh3d(meshes.add(mesh.clone().translated_by(Vec3::new(3.25, 3.25, 3.)))),
            MeshMaterial3d(materials.add(StandardMaterial {
                base_color: Color::srgb_u8(150, 150, 250).with_alpha(0.6),
                alpha_mode: AlphaMode::Blend,
                ..default()
            })),
        ));
        commands.spawn((
            Mesh3d(meshes.add(mesh.clone().translated_by(Vec3::new(3.25, 3.25, 3.)))),
            MeshMaterial3d(materials.add(StandardMaterial {
                base_color: Color::srgb_u8(250, 150, 150).with_alpha(0.5),
                alpha_mode: AlphaMode::Blend,
                ..default()
            })),
        ));
    }

    commands.spawn((PointLight::default(), Transform::from_xyz(-3., 10., 4.5)));
    commands.spawn((
        Camera3d::default(),
        Transform::from_xyz(-3., 12., 15.).looking_at(Vec3::new(4.75, 4.75, 0.), Vec3::Y),
        FreeCamera::default(),
    ));
    commands.spawn((
        Node {
            position_type: PositionType::Absolute,
            padding: UiRect::all(px(10)),
            ..default()
        },
        GlobalZIndex(i32::MAX),
        children![(
            Text::default(),
            children![
                (TextSpan::new("1 - 3D view\n")),
                (TextSpan::new("2 - Front view\n")),
                (TextSpan::new("3 - Top view\n")),
                (TextSpan::new("4 - Right view\n")),
            ]
        )],
    ));
}

fn view_orient(
    input: Res<ButtonInput<KeyCode>>,
    mut camera_xform: Single<&mut Transform, With<Camera>>,
) {
    let xform = if input.just_pressed(KeyCode::Digit1) {
        Some(Transform::from_xyz(-3., 12., 15.).looking_at(Vec3::new(4.75, 4.75, 0.), Vec3::Y))
    } else if input.just_pressed(KeyCode::Digit2) {
        Some(Transform::from_xyz(4.75, 4.75, 15.).looking_at(Vec3::new(4.75, 4.75, 0.), Vec3::Y))
    } else if input.just_pressed(KeyCode::Digit3) {
        Some(Transform::from_xyz(4.75, 18., -1.).looking_at(Vec3::new(4.75, 0., -1.), Vec3::NEG_Z))
    } else if input.just_pressed(KeyCode::Digit4) {
        Some(Transform::from_xyz(-15., 4.75, -1.).looking_at(Vec3::new(0., 4.75, -1.), Vec3::Y))
    } else {
        None
    };

    if let Some(xform) = xform {
        camera_xform.set_if_neq(xform);
    }
}
  • If relevant, what platforms did you test these changes on, and are there any important ones you can't test? macOS

Showcase

In my tests with building models (windows, glass, etc.), switching from translation-based sorting to bounds-center-based sorting noticeably improves the visual result. Transparent surfaces that were previously fighting or blending incorrectly now render in a much more expected order.

Current:

https://youtu.be/WjDjPAoKK6w

Sort by aabb center:

https://youtu.be/-Sl4GOXp_vQ

Sort by aabb center + tie breaker:

https://youtu.be/0aQhkSKxECo

@IceSentry
Contributor

I haven't reviewed the code yet, and I'm not opposed to the idea, but for an app that cares a lot about correct transparency I would say the solution should be some form of order-independent transparency. We have support for it in Bevy. There's still some work to be done on it, but it can definitely be used for CAD apps since it's already being used in production CAD apps.

@IceSentry
Contributor

Okay, I looked at the code and everything seems to make sense to me. The only thing I would like to see is some kind of benchmark that shows that it isn't introducing a big performance regression. And if possible it would be nice to have numbers comparing with and without the tie breaker.

@IceSentry added labels on Dec 6, 2025:
  • A-Rendering: Drawing game state to the screen
  • S-Needs-Review: Needs reviewer attention (from anyone!) to move forward
  • S-Needs-Benchmarking: This set of changes needs performance benchmarking to double-check that they help
  • D-Modest: A "normal" level of difficulty; suitable for simple features or challenging fixes
@venhelhardt
Contributor Author

venhelhardt commented Dec 8, 2025

> Okay, I looked at the code and everything seems to make sense to me. The only thing I would like to see is some kind of benchmark that shows that it isn't introducing a big performance regression. And if possible it would be nice to have numbers comparing with and without the tie breaker.

Thanks for looking at the code and for the suggestion!

What part of the change would you like to see benchmarked? From my side there are two main areas:

  1. AABB center computation: getting the AABB is very cheap. Transforming the AABB center per instance is just a few muls/adds (ideally in vector form), so I do not expect it to be a measurable contributor in the pipeline.
  2. Sorting with the tie breaker: right now we sort by a tuple (distance, entity_index) using radsort. Because radsort is stable, it effectively sorts twice, so the cost is about 2x compared to sorting by distance only.

Even with that overhead it is still roughly 2-3x faster than the standard library's sort / sort_unstable in my local tests. We can improve this by packing the distance (as a lexicographically sortable u32) and the main entity index into a single u64. Then radsort would sort once instead of twice, so the regression should drop from ~100% to ~50% (overhead comes from doubling the number of bits).

I held off on implementing the packed key because it requires a proper f32-to-lex-u32 conversion utility (similar to FloatOrd) along with tests. If you believe it's worth adding, I am more than willing to implement it.
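
For illustration, the conversion and packing could look roughly like the sketch below. `f32_to_lex_u32` and `pack_sort_key` are hypothetical names, not existing Bevy or radsort functions. The bit trick is the usual one: flip all bits of negative floats and set the sign bit of non-negative ones, so that unsigned integer order matches `f32::total_cmp` order. The key sorts ascending by distance; a phase that needs the opposite direction would have to invert the distance part.

```rust
/// Maps an `f32` to a `u32` whose unsigned order matches `f32::total_cmp`.
/// Negative values have all bits flipped; non-negative values get the sign
/// bit set so they sort above every negative value.
pub fn f32_to_lex_u32(value: f32) -> u32 {
    let bits = value.to_bits();
    if bits & 0x8000_0000 != 0 {
        !bits
    } else {
        bits | 0x8000_0000
    }
}

/// Packs the sortable distance into the high 32 bits and the entity index
/// into the low 32 bits, so a single `u64` sort orders by distance first
/// and by entity index on ties.
pub fn pack_sort_key(distance: f32, entity_index: u32) -> u64 {
    ((f32_to_lex_u32(distance) as u64) << 32) | entity_index as u64
}
```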

I am also not sure how large the sorting cost is in the overall blending pipeline. For example, saving ~1 ms on sorting 100k instances might be outweighed significantly by the cost of issuing 100k draw calls.

If this approach makes sense, I can:

  • add sort benchmarks to benches/bevy_render, and/or
  • implement the packed (distance, entity_index) key.

Please let me know which option you would prefer, and I will update the PR accordingly.

@IceSentry
Contributor

IceSentry commented Dec 9, 2025

> What part of the change would you like to see benchmarked?

A bit of both. I mostly want to make sure this PR isn't a regression. I doubt that using the AABB center would have a high impact but it's still a pretty hot path so I would prefer to have at least some numbers to confirm it. I'd also like to see how much of an impact using the tie breaker makes. I assume it will be fairly small relative to everything else and is worth it for the stability gain but I always prefer having real numbers instead of assuming.

As for how to test, you don't need to add new benches. Just try to run a few complex scenes with a lot of transparent meshes and compare the frametimes using tracy. Like, maybe just spawn a 50x50x50 grid of transparent cubes and see if you see any performance impact.

Oh and don't bother about packing unless you confirm that the impact of sorting with the tie breaker is high enough that it matters. We can always do it later if necessary but I prefer having a baseline that's easier to understand.

@IceSentry
Contributor

I should specify, I would even be happy with just a tracy comparison of a scene with a lot of meshes of main vs this PR. Comparing with vs without the tie breaker would be nice but not necessary at all.

@venhelhardt
Contributor Author

> As for how to test, you don't need to add new benches. Just try to run a few complex scenes with a lot of transparent meshes and compare the frametimes using tracy. Like, maybe just spawn a 50x50x50 grid of transparent cubes and see if you see any performance impact.

Thanks for the guidance! I tried it quickly without the tie breaker and already see about a 10% regression. I suspect this comes from the baseline using a no-op sort, so I'll need to set up the same test using instanced meshes where the distances are non-zero.
