Open
Labels: bug (Something isn't working)
Description
In current master, two tests fail if run in parallel:
69/70 Testing: xshseqr
69/70 Test: xshseqr
Command: "/sw/env/gcc-10.3.0/openmpi/4.1.1/bin/mpiexec" "-n" "2" "./xshseqr"
Directory: /home/rrztest/src/scalapack/TESTING
"xshseqr" start time: Jul 25 20:04 CEST
Output:
----------------------------------------------------------
ScaLAPACK Test for PSHSEQR
epsilon = 5.96046448E-08
threshold = 30.0000000
Residual and Orthogonality Residual computed by:
Residual = || T - Q^T*A*Q ||_F / ( ||A||_F * eps * sqrt(N) )
Orthogonality = MAX( || I - Q^T*Q ||_F, || I - Q*Q^T ||_F ) / (eps * N)
Test passes if both residuals are less then threshold
N NB P Q QR Time CHECK
----- --- ---- ---- -------- ------
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x151fa27c93ff in ???
#1 0x151fa455124f in pstrord_
at /home/rrztest/src/scalapack/SRC/pstrord.f:1087
#2 0x151fa457a300 in pslaqr3_
at /home/rrztest/src/scalapack/SRC/pslaqr3.f:880
#3 0x151fa4565178 in pslaqr0_
at /home/rrztest/src/scalapack/SRC/pslaqr0.f:598
#4 0x151fa456209d in pshseqr_
at /home/rrztest/src/scalapack/SRC/pshseqr.f:441
#5 0x4036cf in pshseqrdriver
at /home/rrztest/src/scalapack/TESTING/EIG/pshseqrdriver.f:413
#6 0x404427 in main
at /home/rrztest/src/scalapack/TESTING/EIG/pshseqrdriver.f:565
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node node002 exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------
<end of output>
Test time = 2.91 sec
----------------------------------------------------------
Test Failed.
"xshseqr" end time: Jul 25 20:04 CEST
"xshseqr" time elapsed: 00:00:02
----------------------------------------------------------
70/70 Testing: xdhseqr
70/70 Test: xdhseqr
Command: "/sw/env/gcc-10.3.0/openmpi/4.1.1/bin/mpiexec" "-n" "2" "./xdhseqr"
Directory: /home/rrztest/src/scalapack/TESTING
"xdhseqr" start time: Jul 25 20:04 CEST
Output:
----------------------------------------------------------
ScaLAPACK Test for PDHSEQR
epsilon = 1.1102230246251565E-016
threshold = 30.000000000000000
Residual and Orthogonality Residual computed by:
Residual = || T - Q^T*A*Q ||_F / ( ||A||_F * eps * sqrt(N) )
Orthogonality = MAX( || I - Q^T*Q ||_F, || I - Q*Q^T ||_F ) / (eps * N)
Test passes if both residuals are less then threshold
N NB P Q QR Time CHECK
----- --- ---- ---- -------- ------
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x1488be0113ff in ???
#1 0x1488bff4ebae in pdtrord_
at /home/rrztest/src/scalapack/SRC/pdtrord.f:1087
#2 0x1488bff77f2f in pdlaqr3_
at /home/rrztest/src/scalapack/SRC/pdlaqr3.f:878
#3 0x1488bff62d2b in pdlaqr0_
at /home/rrztest/src/scalapack/SRC/pdlaqr0.f:598
#4 0x1488bff5fc1d in pdhseqr_
at /home/rrztest/src/scalapack/SRC/pdhseqr.f:441
#5 0x4036e2 in pdhseqrdriver
at /home/rrztest/src/scalapack/TESTING/EIG/pdhseqrdriver.f:412
#6 0x404445 in main
at /home/rrztest/src/scalapack/TESTING/EIG/pdhseqrdriver.f:564
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node node002 exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------
<end of output>
Test time = 2.70 sec
----------------------------------------------------------
Test Failed.
"xdhseqr" end time: Jul 25 20:04 CEST
"xdhseqr" time elapsed: 00:00:02
----------------------------------------------------------
End testing: Jul 25 20:04 CEST
Both tests pass fine with -n 1. I tested on two machines with differing compilers and MPI versions (4.1.1 and 1.10.7).
I also observe unusually long runtimes (hundreds of seconds) for some 2.2.0 tests when run inside the pkgsrc build framework, but those do succeed eventually. These FPEs are a more definite failure.
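For triage, it may help to pull the first frame with a source location out of each backtrace and compare them. In both logs above that frame is line 1087 of pstrord.f / pdtrord.f. The snippet below is only a rough sketch of that step: the helper names `parse_backtrace` and `first_located_frame` are made up for illustration, and the regexes assume the two-line frame layout seen in the logs above (a `#N 0x... in symbol` line, optionally followed by an `at file:line` line).

```python
import re

# Frame header as it appears in the logs above, e.g.
#   "#1 0x151fa455124f in pstrord_"
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\S+)\s*$")
# Optional source location on the following line, e.g.
#   "at /home/rrztest/src/scalapack/SRC/pstrord.f:1087"
LOC_RE = re.compile(r"^at\s+(.+):(\d+)\s*$")


def parse_backtrace(text):
    """Return (frame_no, symbol, file, line) tuples.

    file and line are None for frames without a source location ("???").
    """
    frames = []
    for raw in text.splitlines():
        line = raw.strip()
        m = FRAME_RE.match(line)
        if m:
            frames.append([int(m.group(1)), m.group(2), None, None])
            continue
        m = LOC_RE.match(line)
        # Attach the location to the most recent frame that lacks one.
        if m and frames and frames[-1][2] is None:
            frames[-1][2] = m.group(1)
            frames[-1][3] = int(m.group(2))
    return [tuple(f) for f in frames]


def first_located_frame(frames):
    """Return the innermost frame that has a source location, or None."""
    for frame in frames:
        if frame[2] is not None:
            return frame
    return None


# Excerpt copied from the xshseqr log above.
SAMPLE = """\
#0 0x151fa27c93ff in ???
#1 0x151fa455124f in pstrord_
at /home/rrztest/src/scalapack/SRC/pstrord.f:1087
#2 0x151fa457a300 in pslaqr3_
at /home/rrztest/src/scalapack/SRC/pslaqr3.f:880
"""

frames = parse_backtrace(SAMPLE)
top = first_located_frame(frames)
print(top[1], top[2].rsplit("/", 1)[-1], top[3])  # pstrord_ pstrord.f 1087
```

Running the same helper on the xdhseqr backtrace should point at pdtrord.f:1087, which suggests the single and double precision failures share one cause in the p?trord routines. That is an inference from the logs, not something verified in the source.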