https://github.com/pytorch/pytorch/pull/164212

Triton kernel performance varies from input to input. This PR makes custom op autotuning work on Triton kernels.
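Because the best kernel configuration depends on the input, autotuning has to pick a configuration per input rather than once globally. The sketch below is a minimal pure-Python illustration of that idea. It is not the PR's implementation and not a PyTorch or Triton API: `PerInputAutotuner`, `key_fn`, `timer`, and `wall_clock` are hypothetical names. It benchmarks every candidate config the first time it sees a new input key, caches the winner, and reuses that winner for later inputs with the same key. Triton's own `@triton.autotune` decorator follows a similar pattern through its `key` argument, which lists the kernel arguments whose change triggers re-tuning.

```python
import time
from typing import Any, Callable, Dict, Hashable, List

Config = Dict[str, Any]


def wall_clock(kernel: Callable, config: Config, args: tuple, reps: int = 3) -> float:
    """Default timer: best-of-`reps` wall-clock time for one kernel call."""
    best = float("inf")
    for _ in range(reps):
        start = time.perf_counter()
        kernel(*args, **config)
        best = min(best, time.perf_counter() - start)
    return best


class PerInputAutotuner:
    """Hypothetical sketch: choose and cache the fastest config per input key."""

    def __init__(
        self,
        kernel: Callable,
        configs: List[Config],
        key_fn: Callable[..., Hashable],
        timer: Callable[[Callable, Config, tuple], float] = wall_clock,
    ) -> None:
        self.kernel = kernel
        self.configs = configs
        self.key_fn = key_fn  # maps the call's inputs to a cache key (e.g. a shape)
        self.timer = timer
        self.best: Dict[Hashable, Config] = {}  # input key -> winning config

    def __call__(self, *args: Any) -> Any:
        key = self.key_fn(*args)
        if key not in self.best:
            # First time this key is seen: benchmark every candidate config.
            scores = [self.timer(self.kernel, cfg, args) for cfg in self.configs]
            self.best[key] = self.configs[scores.index(min(scores))]
        # Run the kernel with the cached winner for this key.
        return self.kernel(*args, **self.best[key])


# Usage with a toy "kernel" and a deterministic fake timer (both hypothetical):
# small inputs favor block_size=32, large inputs favor block_size=128.
def scale(xs: List[int], block_size: int) -> List[int]:
    return [2 * x for x in xs]


def fake_timer(kernel: Callable, cfg: Config, args: tuple) -> float:
    return abs(cfg["block_size"] - len(args[0]))


tuner = PerInputAutotuner(
    scale, [{"block_size": 32}, {"block_size": 128}], key_fn=len, timer=fake_timer
)
tuner([1, 2, 3])          # tunes for key 3, picks block_size=32
tuner(list(range(200)))   # tunes for key 200, picks block_size=128
```

Keying the cache on an input property (here the length) is the design choice that matters: one global winner would be tuned for whichever input happened to come first.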