cudaCheckError() failed: unspecified launch failure #28

Open
eneserdo opened this issue Jul 7, 2022 · 6 comments

Comments

@eneserdo

eneserdo commented Jul 7, 2022

Hi, when I run ./experiments/scripts/demo.sh, I am getting the following error:

...
object 0, class 019_pitcher_base, z 0.5498372912406921, z new 0.6564772725105286
object 1, class 008_pudding_box, z 0.6858921647071838, z new 0.7096845507621765
object 2, class 002_master_chef_can, z 0.5724276304244995, z new 0.6095401048660278
object 3, class 052_extra_large_clamp, z 0.6460237503051758, z new 0.6376593708992004
object 4, class 011_banana, z 0.6999950408935547, z new 0.7429261207580566
/opt/conda/conda-bld/pytorch_1591914742272/work/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of nonzero is deprecated:
nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
nonzero(Tensor input, *, bool as_tuple)
sdf 27338 points for object 0, class 10 019_pitcher_base
sdf 6888 points for object 1, class 6 008_pudding_box
sdf 16038 points for object 2, class 0 002_master_chef_can
sdf 11314 points for object 3, class 18 052_extra_large_clamp
sdf 3385 points for object 4, class 9 011_banana
sdf with 64963 points
cudaCheckError() failed: unspecified launch failure

I tried on multiple machines. Here is the full error log

My setup:
Ubuntu 20.04
CUDA 10.1
PyTorch 1.4

Any help will be appreciated.

@tsrobcvai

I ran into this problem too, on Ubuntu 20.04, CUDA 11.1, PyTorch 1.8. Have you solved it?

@eneserdo
Author

Nope

@wetoo-cando

Same problem here when I run ./experiments/scripts/dex_ycb_test_s0.sh 0 with
Ubuntu 20.04
CUDA 11.1
torch 1.10.1+cu111.

I am running in a Python venv inside a Docker container based on https://hub.docker.com/r/nvidia/cudagl.

@eneserdo @mcgilltaosun were you able to solve this?

@eneserdo
Author

eneserdo commented Aug 9, 2023

I dropped my job because of this error. Please do not tag me anymore. Why, NVIDIA, why are you not reproducible?

@wetoo-cando

A little more print debugging shows the exact location of the error:

object 0, class 025_mug, z 0.7601078152656555, z new 0.8151350021362305
object 1, class 003_cracker_box, z 0.9675762057304382, z new 1.0729904174804688
object 2, class 002_master_chef_can, z 0.7824445962905884, z new 0.7986501455307007
sdf 5599 points for object 0, class 13 025_mug
sdf 10896 points for object 1, class 1 003_cracker_box
sdf 8007 points for object 2, class 0 002_master_chef_can
sdf with 24502 points
sdf_matching_loss_kernel.cu: cudaCheckError() failed (cudaDeviceSynchronize): unspecified launch failure

It happens inside the function sdf_loss_cuda_forward() at line 276 in the sdf_matching_loss_kernel.cu file.

I don't know what to look for or how to debug further, though. Any help would be appreciated.
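
One possible next step, sketched here as an assumption rather than a confirmed fix: rerun the failing script with CUDA_LAUNCH_BLOCKING=1, so that kernel launches are synchronous and an error is reported at the launch that caused it, not at a later synchronization point. The helper names blocking_env and run_blocking are hypothetical; the script path in the comment is the one quoted earlier in this thread.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Copy an environment and force synchronous CUDA kernel launches."""
    env = dict(os.environ if base is None else base)
    # CUDA_LAUNCH_BLOCKING=1 is a standard CUDA runtime environment variable.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(cmd):
    """Run cmd with CUDA_LAUNCH_BLOCKING=1 and return the CompletedProcess."""
    return subprocess.run(cmd, env=blocking_env(), capture_output=True, text=True)


# Example call (not executed here):
# result = run_blocking(["./experiments/scripts/dex_ycb_test_s0.sh", "0"])
# sys.stderr.write(result.stderr)
```

Since the failure is already localized to one kernel, a memory checker such as compute-sanitizer (or cuda-memcheck on older toolkits) may say more than this does, for example whether the kernel makes an out-of-bounds access. That is a suggestion to try, not a diagnosis.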

@namGGG

namGGG commented Jan 10, 2024

I'm stuck in the middle...
GPU RTX 3090
Ubuntu 20.04
CUDA 11.1
PyTorch 1.8.2 LTS

/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:3454: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  warnings.warn(
cudaGraphicsGLRegisterImage failed: 304
cudaGraphicsMapResources failed: 400
cudaGraphicsSubResourceGetMappedArray failed: 400
cudaMemcpy2DFromArray failed: 709
cudaGraphicsUnmapResources failed: 400
cudaGraphicsGLRegisterImage failed: 304
cudaGraphicsMapResources failed: 400
cudaGraphicsSubResourceGetMappedArray failed: 400
cudaMemcpy2DFromArray failed: 709
cudaGraphicsUnmapResources failed: 400
cudaGraphicsGLRegisterImage failed: 304
cudaGraphicsMapResources failed: 400
cudaGraphicsSubResourceGetMappedArray failed: 400
cudaMemcpy2DFromArray failed: 709
cudaGraphicsUnmapResources failed: 400
object 0, class 019_pitcher_base, z 0.5521460175514221, z new -0.2002505660057068
object 1, class 008_pudding_box, z 0.6852722764015198, z new -0.021693646907806396
object 2, class 002_master_chef_can, z 0.5711051225662231, z new -0.11909496784210205
object 3, class 052_extra_large_clamp, z 0.6500653028488159, z new 0.3591681122779846
object 4, class 011_banana, z 0.7000908255577087, z new 0.8758471608161926
sdf 0 points for object 0, class 10 019_pitcher_base, no refinement
sdf 0 points for object 1, class 6 008_pudding_box, no refinement
sdf 0 points for object 2, class 0 002_master_chef_can, no refinement
sdf 0 points for object 3, class 18 052_extra_large_clamp, no refinement
sdf 499 points for object 4, class 9 011_banana
sdf with 499 points
cudaCheckError() failed: unspecified launch failure
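
The numeric codes in the log above can be decoded with a small lookup, sketched below. It assumes the numbers are cudaError_t values as numbered in CUDA 10.1 and later; the table is only the subset that appears in this thread, and the helper name decode_failures is hypothetical.

```python
import re

# Subset of cudaError_t values (numbering used by CUDA 10.1 and later).
CUDA_ERROR_NAMES = {
    304: "cudaErrorOperatingSystem",
    400: "cudaErrorInvalidResourceHandle",
    709: "cudaErrorContextIsDestroyed",
    719: "cudaErrorLaunchFailure",  # reported as "unspecified launch failure"
}

# Matches lines such as "cudaGraphicsMapResources failed: 400".
FAILED_LINE = re.compile(r"^(\w+) failed: (\d+)$")


def decode_failures(log_text):
    """Return (call, code, name) for each distinct 'X failed: N' line, in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        match = FAILED_LINE.match(line.strip())
        if not match:
            continue
        call, code = match.group(1), int(match.group(2))
        entry = (call, code, CUDA_ERROR_NAMES.get(code, "unknown"))
        if entry not in seen:
            seen.append(entry)
    return seen
```

Read this way, the first failure is cudaGraphicsGLRegisterImage returning an operating-system-level error. This suggests, without proving it, that the CUDA/OpenGL interop setup fails before the SDF kernel runs, which would also be consistent with the negative "z new" values and the "sdf 0 points" lines in this log.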
