Closed
In the differentiable MPM case, the CPU backend is faster than the GPU backend: 13 FPS vs. 10 FPS.
The kernel profiler output (below) looks fine. Is this expected?
=========================================================================
[ % total count | min avg max ] Kernel name
-------------------------------------------------------------------------
[ 25.49% 0.022 s 2046x | 0.009 0.011 0.335 ms] p2g_c82_0_kernel_0_range_for
[ 15.54% 0.013 s 2046x | 0.006 0.006 0.017 ms] clear_grid_c76_0_kernel_0_range_for
[ 14.18% 0.012 s 1023x | 0.010 0.012 0.023 ms] p2g_c83_0_reverse_grad_reverse_grad_kernel_0_range_for
[ 10.43% 0.009 s 1023x | 0.008 0.009 0.020 ms] g2p_c87_0_reverse_grad_reverse_grad_kernel_0_range_for
[ 10.38% 0.009 s 2046x | 0.004 0.004 0.015 ms] grid_op_c84_0_kernel_0_range_for
[ 5.79% 0.005 s 1023x | 0.004 0.005 0.014 ms] g2p_c86_0_kernel_0_range_for
[ 5.48% 0.005 s 1023x | 0.004 0.005 0.013 ms] grid_op_c85_0_reverse_grad_reverse_grad_kernel_0_range_for
[ 5.25% 0.004 s 1023x | 0.004 0.004 0.011 ms] compute_actuation_c89_0_reverse_grad_reverse_grad_kernel_0_range_for
[ 4.29% 0.004 s 1023x | 0.003 0.004 0.014 ms] compute_actuation_c88_0_kernel_0_range_for
[ 2.83% 0.002 s 1x | 2.400 2.400 2.400 ms] clear_gradients_c36_3_kernel_0_range_for
[ 0.09% 0.000 s 16x | 0.003 0.005 0.016 ms] snode_reader_36_kernel_0_serial
[ 0.07% 0.000 s 16x | 0.003 0.004 0.005 ms] snode_writer_2_kernel_0_serial
[ 0.07% 0.000 s 16x | 0.003 0.003 0.005 ms] snode_reader_2_kernel_0_serial
[ 0.02% 0.000 s 4x | 0.004 0.004 0.005 ms] snode_reader_37_kernel_0_serial
[ 0.02% 0.000 s 4x | 0.004 0.004 0.004 ms] snode_writer_4_kernel_0_serial
[ 0.02% 0.000 s 4x | 0.003 0.003 0.004 ms] snode_reader_4_kernel_0_serial
[ 0.00% 0.000 s 1x | 0.004 0.004 0.004 ms] clear_gradients_c36_0_kernel_0_range_for
[ 0.00% 0.000 s 1x | 0.004 0.004 0.004 ms] clear_gradients_c36_1_kernel_0_range_for
[ 0.00% 0.000 s 1x | 0.004 0.004 0.004 ms] clear_gradients_c36_2_kernel_0_range_for
[ 0.00% 0.000 s 1x | 0.004 0.004 0.004 ms] clear_gradients_c36_4_kernel_0_range_for
[ 0.00% 0.000 s 1x | 0.004 0.004 0.004 ms] compute_x_avg_c90_0_kernel_0_range_for
[ 0.00% 0.000 s 1x | 0.004 0.004 0.004 ms] compute_x_avg_c91_0_reverse_grad_reverse_grad_kernel_0_range_for
[ 0.00% 0.000 s 1x | 0.004 0.004 0.004 ms] snode_writer_33_kernel_0_serial
[ 0.00% 0.000 s 1x | 0.004 0.004 0.004 ms] compute_loss_c92_0_kernel_0_serial
[ 0.00% 0.000 s 1x | 0.004 0.004 0.004 ms] snode_reader_31_kernel_0_serial
[ 0.00% 0.000 s 1x | 0.003 0.003 0.003 ms] clear_gradients_c36_6_kernel_0_range_for
[ 0.00% 0.000 s 1x | 0.003 0.003 0.003 ms] clear_gradients_c36_5_kernel_0_range_for
[ 0.00% 0.000 s 1x | 0.003 0.003 0.003 ms] clear_gradients_c36_7_kernel_0_range_for
[ 0.00% 0.000 s 1x | 0.003 0.003 0.003 ms] clear_loss_c38_0_kernel_0_serial
[ 0.00% 0.000 s 1x | 0.003 0.003 0.003 ms] snode_writer_35_kernel_0_serial
[ 0.00% 0.000 s 1x | 0.003 0.003 0.003 ms] compute_loss_c93_0_reverse_grad_reverse_grad_kernel_0_serial
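To compare kernel time against the observed frame rate, it can help to total the profiler rows rather than read them one by one. The following is a minimal sketch in plain Python that parses rows in the format shown above and sums the launch count and the approximate kernel time (count × average). The names `ROW`, `parse_profile`, `summarize`, and `SAMPLE` are hypothetical helpers for illustration, not part of Taichi, and the sample holds only three rows copied from the table.

```python
import re

# Matches one row of the profiler table, e.g.
# "[ 25.49% 0.022 s 2046x | 0.009 0.011 0.335 ms] p2g_c82_0_kernel_0_range_for"
# The header and separator lines do not match and are skipped.
ROW = re.compile(
    r"\[\s*(?P<pct>[\d.]+)%\s+(?P<total>[\d.]+)\s*s\s+(?P<count>\d+)x\s*\|"
    r"\s*(?P<min>[\d.]+)\s+(?P<avg>[\d.]+)\s+(?P<max>[\d.]+)\s*ms\]\s*(?P<name>\S+)"
)


def parse_profile(text):
    """Return one dict per kernel row found in the profiler text."""
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            rows.append({
                "name": m["name"],
                "count": int(m["count"]),
                "avg_ms": float(m["avg"]),
                "total_s": float(m["total"]),
            })
    return rows


def summarize(rows):
    """Return (total launches, approximate kernel time in ms)."""
    launches = sum(r["count"] for r in rows)
    # count * avg is more precise than the rounded "total" column.
    kernel_ms = sum(r["count"] * r["avg_ms"] for r in rows)
    return launches, kernel_ms


# Three rows copied from the output above; paste the full table to total everything.
SAMPLE = """\
[ 25.49% 0.022 s 2046x | 0.009 0.011 0.335 ms] p2g_c82_0_kernel_0_range_for
[ 15.54% 0.013 s 2046x | 0.006 0.006 0.017 ms] clear_grid_c76_0_kernel_0_range_for
[ 2.83% 0.002 s 1x | 2.400 2.400 2.400 ms] clear_gradients_c36_3_kernel_0_range_for
"""

rows = parse_profile(SAMPLE)
launches, kernel_ms = summarize(rows)
print(f"{launches} launches, about {kernel_ms:.1f} ms of kernel time")
```

In the output above, the top row's 0.022 s is 25.49% of the total, which puts total kernel time at roughly 0.09 s, spread over several thousand launches of a few microseconds each. If the wall-clock time for the same run is much larger than that, the difference is being spent outside the kernels; whether that explains the CPU/GPU gap here is an assumption to check, not something the profile alone shows.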