Commit 319dd4c
fix: prevent job termination on node lookup fail
Change IsNodeDrain() and IsNodeDrained() to fail-closed behavior.
Previously, these functions returned true (drained) when Slurm node
lookups failed, causing immediate pod deletion and job termination.
Now returns false and propagates errors, allowing the operator to
retry until it can properly verify drain status. This protects
running jobs from premature termination during transient failures
or node name mismatches.
1 parent 812ad79 commit 319dd4c
File tree
3 files changed, +67 -11 lines changed
- internal/controller/nodeset
- slurmcontrol
Lines changed: 58 additions & 0 deletions