
Existing Examples that need to be "Reactant-ified" #1141


Open
4 of 6 tasks
avik-pal opened this issue Dec 20, 2024 · 4 comments

@avik-pal
Member

avik-pal commented Dec 20, 2024

@IvanBioli

I tried the PINN2DPDE example this morning, and it apparently does not work (I waited ~10 minutes, then closed the Julia session). The old tutorial, which uses Zygote, works on CPU but not on GPU. Any idea why, and how I could implement a PINN example in Lux.jl with GPU acceleration myself?

@avik-pal
Member Author

The Reactant version is run on every commit, so it should definitely work: https://buildkite.com/julialang/lux-dot-jl/builds/5660/canvas?jid=01958752-c988-4112-861c-781adff1fedd#01958752-c988-4112-861c-781adff1fedd/123-377 (the full tutorial takes around 15 minutes to run).

What is the error on the Zygote end?

@IvanBioli

I tried running the Reactant tutorial in a script instead of in a Jupyter notebook, and it seems to work (although it does not print anything to the screen). However, I got the following warnings. Are these expected?

2025-03-15 16:07:13.988444: I external/xla/xla/service/service.cc:152] XLA service 0x35766ed0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2025-03-15 16:07:13.988462: I external/xla/xla/service/service.cc:160]   StreamExecutor device (0): NVIDIA GeForce RTX 4090, Compute Capability 8.9
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1742051233.988669 1485984 se_gpu_pjrt_client.cc:951] Using BFC allocator.
I0000 00:00:1742051233.988699 1485984 gpu_helpers.cc:136] XLA backend allocating 19039764480 bytes on device 0 for BFCAllocator.
I0000 00:00:1742051233.988718 1485984 gpu_helpers.cc:177] XLA backend will use up to 6346588160 bytes on device 0 for CollectiveBFCAllocator.
I0000 00:00:1742051233.990755 1485984 cuda_dnn.cc:529] Loaded cuDNN version 90400
E0000 00:00:1742051315.928536 1485984 buffer_comparator.cc:156] Difference at 16: -nan, expected 11.6059
E0000 00:00:1742051315.928564 1485984 buffer_comparator.cc:156] Difference at 17: -nan, expected 14.502
E0000 00:00:1742051315.928566 1485984 buffer_comparator.cc:156] Difference at 18: -nan, expected 11.2449
E0000 00:00:1742051315.928567 1485984 buffer_comparator.cc:156] Difference at 19: -nan, expected 10.0998
E0000 00:00:1742051315.928568 1485984 buffer_comparator.cc:156] Difference at 20: -nan, expected 14.0222
E0000 00:00:1742051315.928570 1485984 buffer_comparator.cc:156] Difference at 21: -nan, expected 10.1321
E0000 00:00:1742051315.928571 1485984 buffer_comparator.cc:156] Difference at 22: -nan, expected 10.2986
E0000 00:00:1742051315.928572 1485984 buffer_comparator.cc:156] Difference at 23: -nan, expected 14.1109
E0000 00:00:1742051315.928573 1485984 buffer_comparator.cc:156] Difference at 24: -nan, expected 13.3463
E0000 00:00:1742051315.928575 1485984 buffer_comparator.cc:156] Difference at 25: -nan, expected 12.8369
2025-03-15 16:08:35.928584: E external/xla/xla/service/gpu/autotuning/gemm_fusion_autotuner.cc:1138] Results do not match the reference. This is likely a bug/unexpected loss of precision.
E0000 00:00:1742051315.928833 1485984 buffer_comparator.cc:156] Difference at 16: -nan, expected 11.6059
E0000 00:00:1742051315.928836 1485984 buffer_comparator.cc:156] Difference at 17: -nan, expected 14.502
E0000 00:00:1742051315.928837 1485984 buffer_comparator.cc:156] Difference at 18: -nan, expected 11.2449
E0000 00:00:1742051315.928838 1485984 buffer_comparator.cc:156] Difference at 19: -nan, expected 10.0998
E0000 00:00:1742051315.928839 1485984 buffer_comparator.cc:156] Difference at 20: -nan, expected 14.0222
E0000 00:00:1742051315.928841 1485984 buffer_comparator.cc:156] Difference at 21: -nan, expected 10.1321
E0000 00:00:1742051315.928842 1485984 buffer_comparator.cc:156] Difference at 22: -nan, expected 10.2986
E0000 00:00:1742051315.928843 1485984 buffer_comparator.cc:156] Difference at 23: -nan, expected 14.1109
E0000 00:00:1742051315.928844 1485984 buffer_comparator.cc:156] Difference at 24: -nan, expected 13.3463
E0000 00:00:1742051315.928845 1485984 buffer_comparator.cc:156] Difference at 25: -nan, expected 12.8369
2025-03-15 16:08:35.928847: E external/xla/xla/service/gpu/autotuning/gemm_fusion_autotuner.cc:1138] Results do not match the reference. This is likely a bug/unexpected loss of precision.
E0000 00:00:1742051315.929096 1485984 buffer_comparator.cc:156] Difference at 16: -nan, expected 11.6059
E0000 00:00:1742051315.929099 1485984 buffer_comparator.cc:156] Difference at 17: -nan, expected 14.502
E0000 00:00:1742051315.929100 1485984 buffer_comparator.cc:156] Difference at 18: -nan, expected 11.2449
E0000 00:00:1742051315.929102 1485984 buffer_comparator.cc:156] Difference at 19: -nan, expected 10.0998
E0000 00:00:1742051315.929103 1485984 buffer_comparator.cc:156] Difference at 20: -nan, expected 14.0222
E0000 00:00:1742051315.929104 1485984 buffer_comparator.cc:156] Difference at 21: -nan, expected 10.1321
E0000 00:00:1742051315.929105 1485984 buffer_comparator.cc:156] Difference at 22: -nan, expected 10.2986
E0000 00:00:1742051315.929106 1485984 buffer_comparator.cc:156] Difference at 23: -nan, expected 14.1109
E0000 00:00:1742051315.929107 1485984 buffer_comparator.cc:156] Difference at 24: -nan, expected 13.3463
E0000 00:00:1742051315.929109 1485984 buffer_comparator.cc:156] Difference at 25: -nan, expected 12.8369
2025-03-15 16:08:35.929110: E external/xla/xla/service/gpu/autotuning/gemm_fusion_autotuner.cc:1138] Results do not match the reference. This is likely a bug/unexpected loss of precision.
E0000 00:00:1742051315.929353 1485984 buffer_comparator.cc:156] Difference at 32: -nan, expected 12.4
E0000 00:00:1742051315.929355 1485984 buffer_comparator.cc:156] Difference at 33: -nan, expected 12.9454
E0000 00:00:1742051315.929357 1485984 buffer_comparator.cc:156] Difference at 34: -nan, expected 12.9462
E0000 00:00:1742051315.929358 1485984 buffer_comparator.cc:156] Difference at 35: -nan, expected 13.9775
E0000 00:00:1742051315.929359 1485984 buffer_comparator.cc:156] Difference at 36: -nan, expected 15.0433
E0000 00:00:1742051315.929360 1485984 buffer_comparator.cc:156] Difference at 37: -nan, expected 12.0589
E0000 00:00:1742051315.929361 1485984 buffer_comparator.cc:156] Difference at 38: -nan, expected 14.4629
E0000 00:00:1742051315.929362 1485984 buffer_comparator.cc:156] Difference at 39: -nan, expected 12.7671
E0000 00:00:1742051315.929364 1485984 buffer_comparator.cc:156] Difference at 40: -nan, expected 12.3584
E0000 00:00:1742051315.929365 1485984 buffer_comparator.cc:156] Difference at 41: -nan, expected 11.6002
2025-03-15 16:08:35.929366: E external/xla/xla/service/gpu/autotuning/gemm_fusion_autotuner.cc:1138] Results do not match the reference. This is likely a bug/unexpected loss of precision.
E0000 00:00:1742051315.929611 1485984 buffer_comparator.cc:156] Difference at 32: -nan, expected 12.4
E0000 00:00:1742051315.929614 1485984 buffer_comparator.cc:156] Difference at 33: -nan, expected 12.9454
E0000 00:00:1742051315.929615 1485984 buffer_comparator.cc:156] Difference at 34: -nan, expected 12.9462
E0000 00:00:1742051315.929616 1485984 buffer_comparator.cc:156] Difference at 35: -nan, expected 13.9775
E0000 00:00:1742051315.929617 1485984 buffer_comparator.cc:156] Difference at 36: -nan, expected 15.0433
E0000 00:00:1742051315.929619 1485984 buffer_comparator.cc:156] Difference at 37: -nan, expected 12.0589
E0000 00:00:1742051315.929620 1485984 buffer_comparator.cc:156] Difference at 38: -nan, expected 14.4629
E0000 00:00:1742051315.929621 1485984 buffer_comparator.cc:156] Difference at 39: -nan, expected 12.7671
E0000 00:00:1742051315.929622 1485984 buffer_comparator.cc:156] Difference at 40: -nan, expected 12.3584
E0000 00:00:1742051315.929623 1485984 buffer_comparator.cc:156] Difference at 41: -nan, expected 11.6002
2025-03-15 16:08:35.929625: E external/xla/xla/service/gpu/autotuning/gemm_fusion_autotuner.cc:1138] Results do not match the reference. This is likely a bug/unexpected loss of precision.
E0000 00:00:1742051315.929866 1485984 buffer_comparator.cc:156] Difference at 32: -nan, expected 12.4
E0000 00:00:1742051315.929868 1485984 buffer_comparator.cc:156] Difference at 33: -nan, expected 12.9454
E0000 00:00:1742051315.929870 1485984 buffer_comparator.cc:156] Difference at 34: -nan, expected 12.9462
E0000 00:00:1742051315.929871 1485984 buffer_comparator.cc:156] Difference at 35: -nan, expected 13.9775
E0000 00:00:1742051315.929872 1485984 buffer_comparator.cc:156] Difference at 36: -nan, expected 15.0433
E0000 00:00:1742051315.929873 1485984 buffer_comparator.cc:156] Difference at 37: -nan, expected 12.0589
E0000 00:00:1742051315.929874 1485984 buffer_comparator.cc:156] Difference at 38: -nan, expected 14.4629
E0000 00:00:1742051315.929875 1485984 buffer_comparator.cc:156] Difference at 39: -nan, expected 12.7671
E0000 00:00:1742051315.929877 1485984 buffer_comparator.cc:156] Difference at 40: -nan, expected 12.3584
E0000 00:00:1742051315.929878 1485984 buffer_comparator.cc:156] Difference at 41: -nan, expected 11.6002
2025-03-15 16:08:35.929879: E external/xla/xla/service/gpu/autotuning/gemm_fusion_autotuner.cc:1138] Results do not match the reference. This is likely a bug/unexpected loss of precision.

@avik-pal
Member Author

For the most part, these warnings should be harmless, though they do come from an XLA bug (JAX also produces them in certain cases). The main difficulty is reducing this to a case small enough to reproduce consistently.
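
When narrowing a log like the one above down to a reproducer, it may help to first collapse the repeated `buffer_comparator` lines into distinct mismatch patterns, so it is clear how many different failures are actually being reported. The following is only a minimal sketch in Python, assuming the log format shown in this thread; `summarize_mismatches` and `SAMPLE` are hypothetical names introduced for illustration and are not part of XLA, Reactant, or Lux.jl.

```python
import re
from collections import Counter

# Matches lines such as:
#   E0000 ... buffer_comparator.cc:156] Difference at 16: -nan, expected 11.6059
DIFF_RE = re.compile(
    r"buffer_comparator\.cc:\d+\] Difference at (\d+): ([^,\s]+), expected (\S+)"
)
# Text of the autotuner line that closes each group of differences in the log.
MISMATCH_MARKER = "Results do not match the reference"


def summarize_mismatches(log_text):
    """Count how often each distinct group of 'Difference at' lines occurs.

    A group is the run of buffer_comparator lines that precedes one
    'Results do not match the reference' line.
    """
    patterns = Counter()
    current = []
    for line in log_text.splitlines():
        match = DIFF_RE.search(line)
        if match:
            index, got, expected = match.groups()
            current.append((int(index), got, expected))
        elif MISMATCH_MARKER in line and current:
            patterns[tuple(current)] += 1
            current = []
    return patterns


# Abbreviated excerpt of the log posted above (two differences per group).
SAMPLE = """\
E0000 00:00:1742051315.928536 1485984 buffer_comparator.cc:156] Difference at 16: -nan, expected 11.6059
E0000 00:00:1742051315.928564 1485984 buffer_comparator.cc:156] Difference at 17: -nan, expected 14.502
2025-03-15 16:08:35.928584: E external/xla/xla/service/gpu/autotuning/gemm_fusion_autotuner.cc:1138] Results do not match the reference. This is likely a bug/unexpected loss of precision.
E0000 00:00:1742051315.928833 1485984 buffer_comparator.cc:156] Difference at 16: -nan, expected 11.6059
E0000 00:00:1742051315.928836 1485984 buffer_comparator.cc:156] Difference at 17: -nan, expected 14.502
2025-03-15 16:08:35.928847: E external/xla/xla/service/gpu/autotuning/gemm_fusion_autotuner.cc:1138] Results do not match the reference. This is likely a bug/unexpected loss of precision.
E0000 00:00:1742051315.929353 1485984 buffer_comparator.cc:156] Difference at 32: -nan, expected 12.4
E0000 00:00:1742051315.929355 1485984 buffer_comparator.cc:156] Difference at 33: -nan, expected 12.9454
2025-03-15 16:08:35.929366: E external/xla/xla/service/gpu/autotuning/gemm_fusion_autotuner.cc:1138] Results do not match the reference. This is likely a bug/unexpected loss of precision.
"""

if __name__ == "__main__":
    for pattern, count in summarize_mismatches(SAMPLE).items():
        first_index, last_index = pattern[0][0], pattern[-1][0]
        print(f"indices {first_index}-{last_index}: reported {count} time(s)")
```

Grouping by the full tuple of (index, actual, expected) values treats two reports as the same failure only when every compared element agrees, which seems like a conservative choice for deciding which autotuner candidates to investigate.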
