[perf] Performance issue on the DiffTaichi MPM example #5828

@turbo0628

See taichi-dev/difftaichi#57

With the differentiable MPM example, the CPU backend is faster than the GPU backend: 13 FPS vs. 10 FPS.

The kernel profiler output for the GPU run (below) looks good. Is this expected?

=========================================================================
[      %     total   count |      min       avg       max   ] Kernel name
-------------------------------------------------------------------------
[ 25.49%   0.022 s   2046x |    0.009     0.011     0.335 ms] p2g_c82_0_kernel_0_range_for
[ 15.54%   0.013 s   2046x |    0.006     0.006     0.017 ms] clear_grid_c76_0_kernel_0_range_for
[ 14.18%   0.012 s   1023x |    0.010     0.012     0.023 ms] p2g_c83_0_reverse_grad_reverse_grad_kernel_0_range_for
[ 10.43%   0.009 s   1023x |    0.008     0.009     0.020 ms] g2p_c87_0_reverse_grad_reverse_grad_kernel_0_range_for
[ 10.38%   0.009 s   2046x |    0.004     0.004     0.015 ms] grid_op_c84_0_kernel_0_range_for
[  5.79%   0.005 s   1023x |    0.004     0.005     0.014 ms] g2p_c86_0_kernel_0_range_for
[  5.48%   0.005 s   1023x |    0.004     0.005     0.013 ms] grid_op_c85_0_reverse_grad_reverse_grad_kernel_0_range_for
[  5.25%   0.004 s   1023x |    0.004     0.004     0.011 ms] compute_actuation_c89_0_reverse_grad_reverse_grad_kernel_0_range_for
[  4.29%   0.004 s   1023x |    0.003     0.004     0.014 ms] compute_actuation_c88_0_kernel_0_range_for
[  2.83%   0.002 s      1x |    2.400     2.400     2.400 ms] clear_gradients_c36_3_kernel_0_range_for
[  0.09%   0.000 s     16x |    0.003     0.005     0.016 ms] snode_reader_36_kernel_0_serial
[  0.07%   0.000 s     16x |    0.003     0.004     0.005 ms] snode_writer_2_kernel_0_serial
[  0.07%   0.000 s     16x |    0.003     0.003     0.005 ms] snode_reader_2_kernel_0_serial
[  0.02%   0.000 s      4x |    0.004     0.004     0.005 ms] snode_reader_37_kernel_0_serial
[  0.02%   0.000 s      4x |    0.004     0.004     0.004 ms] snode_writer_4_kernel_0_serial
[  0.02%   0.000 s      4x |    0.003     0.003     0.004 ms] snode_reader_4_kernel_0_serial
[  0.00%   0.000 s      1x |    0.004     0.004     0.004 ms] clear_gradients_c36_0_kernel_0_range_for
[  0.00%   0.000 s      1x |    0.004     0.004     0.004 ms] clear_gradients_c36_1_kernel_0_range_for
[  0.00%   0.000 s      1x |    0.004     0.004     0.004 ms] clear_gradients_c36_2_kernel_0_range_for
[  0.00%   0.000 s      1x |    0.004     0.004     0.004 ms] clear_gradients_c36_4_kernel_0_range_for
[  0.00%   0.000 s      1x |    0.004     0.004     0.004 ms] compute_x_avg_c90_0_kernel_0_range_for
[  0.00%   0.000 s      1x |    0.004     0.004     0.004 ms] compute_x_avg_c91_0_reverse_grad_reverse_grad_kernel_0_range_for
[  0.00%   0.000 s      1x |    0.004     0.004     0.004 ms] snode_writer_33_kernel_0_serial
[  0.00%   0.000 s      1x |    0.004     0.004     0.004 ms] compute_loss_c92_0_kernel_0_serial
[  0.00%   0.000 s      1x |    0.004     0.004     0.004 ms] snode_reader_31_kernel_0_serial
[  0.00%   0.000 s      1x |    0.003     0.003     0.003 ms] clear_gradients_c36_6_kernel_0_range_for
[  0.00%   0.000 s      1x |    0.003     0.003     0.003 ms] clear_gradients_c36_5_kernel_0_range_for
[  0.00%   0.000 s      1x |    0.003     0.003     0.003 ms] clear_gradients_c36_7_kernel_0_range_for
[  0.00%   0.000 s      1x |    0.003     0.003     0.003 ms] clear_loss_c38_0_kernel_0_serial
[  0.00%   0.000 s      1x |    0.003     0.003     0.003 ms] snode_writer_35_kernel_0_serial
[  0.00%   0.000 s      1x |    0.003     0.003     0.003 ms] compute_loss_c93_0_reverse_grad_reverse_grad_kernel_0_serial
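To make the table easier to reason about, here is a minimal sketch that parses profiler rows in the format above and adds up the launch counts and the estimated kernel time (count times average). The helper names (`KernelRow`, `parse_profile`, `estimated_ms`) are hypothetical and not part of Taichi, and the sample contains only the top three rows of the log. This is plain arithmetic on the reported numbers, not a diagnosis of the slowdown.

```python
import re
from typing import List, NamedTuple


class KernelRow(NamedTuple):
    """One row of the kernel profiler table (hypothetical helper type)."""
    percent: float
    total_s: float
    count: int
    min_ms: float
    avg_ms: float
    max_ms: float
    name: str


# Matches rows such as:
# [ 25.49%   0.022 s   2046x |    0.009     0.011     0.335 ms] p2g_..._range_for
ROW_RE = re.compile(
    r"\[\s*([\d.]+)%\s+([\d.]+)\s*s\s+(\d+)x\s*\|"
    r"\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*ms\]\s*(\S+)"
)


def parse_profile(text: str) -> List[KernelRow]:
    """Extract every data row; header and separator lines do not match."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.search(line)
        if m:
            pct, total, count, lo, avg, hi, name = m.groups()
            rows.append(KernelRow(float(pct), float(total), int(count),
                                  float(lo), float(avg), float(hi), name))
    return rows


def estimated_ms(rows: List[KernelRow]) -> float:
    """Sum of count * average time per kernel, in milliseconds."""
    return sum(r.count * r.avg_ms for r in rows)


# Top three rows copied from the log above.
SAMPLE = """\
[      %     total   count |      min       avg       max   ] Kernel name
[ 25.49%   0.022 s   2046x |    0.009     0.011     0.335 ms] p2g_c82_0_kernel_0_range_for
[ 15.54%   0.013 s   2046x |    0.006     0.006     0.017 ms] clear_grid_c76_0_kernel_0_range_for
[ 14.18%   0.012 s   1023x |    0.010     0.012     0.023 ms] p2g_c83_0_reverse_grad_reverse_grad_kernel_0_range_for
"""

rows = parse_profile(SAMPLE)
launches = sum(r.count for r in rows)
print(f"{len(rows)} kernels, {launches} launches, "
      f"about {estimated_ms(rows):.3f} ms of kernel time")
for r in rows:
    print(f"  {r.name}: {r.avg_ms * 1000:.0f} us average per launch")
```

Pasting the full log into `SAMPLE` gives the same summary for every kernel. In the rows shown, each launch averages on the order of 10 microseconds.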
