Commit 9a22053

1.4.0 release (#365)
1 parent 3017839 commit 9a22053

File tree

5 files changed, +59 -34 lines changed


CHANGELOG

Lines changed: 21 additions & 1 deletion
@@ -1,8 +1,28 @@
+v1.4.0 (2025-03-27)
+-------------------
+* New Features
+  - RKS and UKS TDDFT gradients for density fitting and direct-SCF methods.
+  - ECP integrals and their first and second derivatives accelerated on GPU.
+  - Multigrid algorithm for the Coulomb matrix and for LDA, GGA, and MGGA functional evaluation.
+  - PBC Gaussian density fitting integrals.
+  - ASE interface for molecular systems.
+* Improvements
+  - Reduce memory footprint in the SCF driver.
+  - Reduce memory requirements for PCM energy and gradients.
+  - Reduce memory requirements for DFT gradients.
+  - Utilize the sparsity of cart2sph coefficients in the cart2sph transformation in the scf.jk kernel.
+  - Molecular 3c2e integrals generated using the block-divergent algorithm.
+  - Support I orbitals in DFT.
+* Fixes
+  - LRU-cached cart2sph coefficients under the multi-GPU environment.
+  - A maxDynamicSharedMemorySize setting bug in gradient and Hessian calculations under the multi-GPU environment.
+  - Remove the limit of 6000 GTO shells in the DFT numerical integration module.
+
 v1.3.2 (2025-03-10)
 -------------------
 * Improvements
   - Dump xc info and grids into the log file
-  - Optimize 4-center integral evalulation CUDA kernels using warp divergent algorithm
+  - Optimize 4-center integral evaluation CUDA kernels using warp divergent algorithm
   - Support up to I orbitals in DFT
   - Fix out-of-bound issue in DFT hessian for heavy atoms (>=19)
 * Deprecation
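As an illustration of the headline feature in this release (RKS/UKS TDDFT gradients with density fitting), here is a minimal sketch of how such a run might look. It assumes the PySCF-style API that GPU4PySCF mirrors; the entry points `mf.TDDFT()` and `nuc_grad_method()` follow that convention and may differ from the released interface.

```python
# Minimal sketch (assumed PySCF-style API; method names are illustrative).
import pyscf
from gpu4pyscf.dft import rks

mol = pyscf.M(atom='O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587',
              basis='def2-tzvpp')

mf = rks.RKS(mol, xc='b3lyp').density_fit()   # density-fitted RKS on GPU
mf.kernel()

td = mf.TDDFT()                               # spin-conserved TDDFT (assumed entry point)
td.nstates = 3
td.kernel()

grad = td.nuc_grad_method()                   # excited-state gradient, new in v1.4.0
grad.kernel()
```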

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
@@ -45,8 +45,8 @@ adding features to the library for PySCF functions running on GPU devices.
 * While examples or documentation are not mandatory, it is highly recommended to
 include examples of how to invoke the new module.
 
-* CUDA compute capability 60 (sm_60) is required. Please avoid using features
-that are only available on CUDA compute capability 70 or newer. The CUDA code
+* CUDA compute capability 70 (sm_70) is required. Please avoid using features
+that are only available on CUDA compute capability 80 or newer. The CUDA code
 should be compiled and run using CUDA 11 and CUDA 12 toolkits.
 
 Thank you for your contributions!
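Since the minimum requirement moves from sm_60 to sm_70, contributors may want a quick runtime guard against older devices. A minimal sketch using only the standard CUDA runtime API (`cudaGetDeviceProperties`); the threshold 70 simply encodes the requirement stated above and is not a GPU4PySCF helper.

```c
/* Sketch: reject devices below the sm_70 requirement stated above. */
#include <stdio.h>
#include <cuda_runtime.h>

int check_compute_capability(int device_id) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device_id) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed\n");
        return 1;
    }
    int cc = prop.major * 10 + prop.minor;       /* e.g. 70 for Volta (V100) */
    if (cc < 70) {
        fprintf(stderr, "Compute capability %d.%d is below the required sm_70\n",
                prop.major, prop.minor);
        return 1;
    }
    return 0;
}
```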

README.md

Lines changed: 15 additions & 10 deletions
@@ -7,7 +7,7 @@ Installation
 --------
 
 > [!NOTE]
-> The compiled binary packages support compute capability 6.0 and later (Pascal and later, such as Tesla P100, RTX 10 series and later).
+> The compiled binary packages support compute capability 7.0 and later (Volta and later, such as Tesla V100, RTX 20 series and later).
 
 Run ```nvcc --version``` in your terminal to check the installed CUDA toolkit version. Then, choose the proper package based on your CUDA toolkit version.

@@ -27,7 +27,7 @@ cmake --build build/temp.gpu4pyscf -j 4
 CURRENT_PATH=`pwd`
 export PYTHONPATH="${PYTHONPATH}:${CURRENT_PATH}"
 ```
-Then install cutensor and cupy for acceleration (please switch the versions according to your nvcc version!)
+Then install cutensor and cupy for acceleration (please switch the versions according to your runtime CUDA environment!)
 ```sh
 pip3 install cutensor-cu12 cupy-cuda12x
 ```
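For a CUDA 11 runtime, the corresponding wheels would presumably be the CUDA 11 variants; a sketch, assuming the usual CuPy/cuTENSOR package naming on PyPI:

```sh
# CUDA 11 runtime (package names follow the usual CuPy/cuTENSOR convention on PyPI)
pip3 install cutensor-cu11 cupy-cuda11x
```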
@@ -45,17 +45,21 @@ Features
 - LDA, GGA, mGGA, hybrid, and range-separated functionals via [libXC](https://gitlab.com/libxc/libxc/-/tree/master/);
 - Spin-conserved and spin-flip TDA and TDDFT for excited states
 - Geometry optimization and transition state search via [geomeTRIC](https://geometric.readthedocs.io/en/latest/);
+- Atomic Simulation Environment ([ASE](https://gitlab.com/ase/ase)) interface;
 - Dispersion corrections via [DFTD3](https://github.com/dftd3/simple-dftd3) and [DFTD4](https://github.com/dftd4/dftd4);
 - Nonlocal functional correction (vv10) for SCF and gradient;
-- ECP is supported and calculated on CPU;
-- PCM models, SMD model, their analytical gradients, and semi-analytical Hessian matrix;
+- ECP is supported and calculated on GPU;
+- PCM models, their analytical gradients, and analytical Hessian matrix;
+- SMD solvent model;
 - Unrestricted Hartree-Fock and unrestricted DFT, gradient, and Hessian;
-- MP2/DF-MP2 and CCSD (experimental);
-- Polarizability, IR, and NMR shielding (experimental);
-- QM/MM with PBC;
 - CHELPG, ESP, and RESP atomic charge;
-- Multi-GPU for both direct SCF and density fitting (experimental)
-- SCF and DFT with periodic boundary condition (experimental)
+
+The following features are still in the experimental stage:
+- MP2/DF-MP2 and CCSD;
+- Polarizability, IR, and NMR shielding;
+- QM/MM with PBC;
+- Multi-GPU for both direct SCF and density fitting
+- SCF and DFT with periodic boundary condition
 
 Limitations
 --------
@@ -65,6 +69,7 @@ Limitations
 - Density fitting scheme up to ~168 atoms with def2-tzvpd basis, bounded by CPU memory;
 - meta-GGA without density laplacian;
 - Double hybrid functionals are not supported;
+- Hessian of TDDFT is not supported;
 
 Examples
 --------
@@ -113,7 +118,7 @@ Find more examples in [gpu4pyscf/examples](https://github.com/pyscf/gpu4pyscf/tr
 
 Benchmarks
 --------
-Speedup with GPU4PySCF v0.6.0 on A100-80G over Q-Chem 6.1 on 32-cores CPU (Desity fitting, SCF, def2-tzvpp, def2-universal-jkfit, B3LYP, (99,590))
+Speedup with GPU4PySCF v0.6.0 on A100-80G over Q-Chem 6.1 on 32-cores CPU (density fitting, SCF, def2-tzvpp, def2-universal-jkfit, B3LYP, (99,590))
 
 | mol | natm | LDA | PBE | B3LYP | M06 | wB97m-v |
 |:------------------|-------:|-------:|-------:|--------:|-------:|----------:|

gpu4pyscf/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -12,6 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-__version__ = '1.3.2'
+__version__ = '1.4.0'
 
 from . import lib, grad, hessian, solvent, scf, dft, tdscf
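A quick way to confirm which release is active in a given environment, based on the `__version__` attribute shown above:

```python
# Check the installed GPU4PySCF release; expected '1.4.0' after this commit.
import gpu4pyscf
print(gpu4pyscf.__version__)
```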

gpu4pyscf/lib/gvhf-rys/rys_jk_driver.cu

Lines changed: 20 additions & 20 deletions
@@ -135,8 +135,8 @@ int RYS_build_j(double *vj, double *dm, int n_dm, int nao,
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -152,8 +152,8 @@ int RYS_build_j(double *vj, double *dm, int n_dm, int nao,
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -168,7 +168,7 @@ int RYS_build_j(double *vj, double *dm, int n_dm, int nao,
 printf("Failed also in cudaGetDevice(), device_id value is not reliable\n"); fflush(stdout);
 }
 printf("CUDA Error in RYS_build_j, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
-fprintf(stderr, "CUDA Error in RYS_build_j, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
+fprintf(stderr, "CUDA Error in RYS_build_j, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stderr);
 return 1;
 }
 return 0;
@@ -234,8 +234,8 @@ int RYS_build_jk(double *vj, double *vk, double *dm, int n_dm, int nao,
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -249,7 +249,7 @@ int RYS_build_jk(double *vj, double *vk, double *dm, int n_dm, int nao,
 printf("Failed also in cudaGetDevice(), device_id value is not reliable\n"); fflush(stdout);
 }
 printf("CUDA Error in RYS_build_jk, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
-fprintf(stderr, "CUDA Error in RYS_build_jk, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
+fprintf(stderr, "CUDA Error in RYS_build_jk, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stderr);
 return 1;
 }
 return 0;
@@ -314,8 +314,8 @@ int RYS_build_jk_ip1(double *vj, double *vk, double *dm, int n_dm, int nao, int
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -329,7 +329,7 @@ int RYS_build_jk_ip1(double *vj, double *vk, double *dm, int n_dm, int nao, int
 printf("Failed also in cudaGetDevice(), device_id value is not reliable\n"); fflush(stdout);
 }
 printf("CUDA Error in RYS_build_jk_ip1, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
-fprintf(stderr, "CUDA Error in RYS_build_jk_ip1, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
+fprintf(stderr, "CUDA Error in RYS_build_jk_ip1, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stderr);
 return 1;
 }
 return 0;
@@ -400,8 +400,8 @@ int RYS_per_atom_jk_ip1(double *ejk, double j_factor, double k_factor,
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -416,7 +416,7 @@ int RYS_per_atom_jk_ip1(double *ejk, double j_factor, double k_factor,
 printf("Failed also in cudaGetDevice(), device_id value is not reliable\n"); fflush(stdout);
 }
 printf("CUDA Error in RYS_per_atom_jk_ip1, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
-fprintf(stderr, "CUDA Error in RYS_per_atom_jk_ip1, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
+fprintf(stderr, "CUDA Error in RYS_per_atom_jk_ip1, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stderr);
 return 1;
 }
 return 0;
@@ -486,8 +486,8 @@ int RYS_per_atom_jk_ip2_type12(double *ejk, double j_factor, double k_factor,
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -502,7 +502,7 @@ int RYS_per_atom_jk_ip2_type12(double *ejk, double j_factor, double k_factor,
 printf("Failed also in cudaGetDevice(), device_id value is not reliable\n"); fflush(stdout);
 }
 printf("CUDA Error in RYS_per_atom_jk_ip2_type12, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
-fprintf(stderr, "CUDA Error in RYS_per_atom_jk_ip2_type12, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
+fprintf(stderr, "CUDA Error in RYS_per_atom_jk_ip2_type12, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stderr);
 return 1;
 }
 return 0;
@@ -573,8 +573,8 @@ int RYS_per_atom_jk_ip2_type3(double *ejk, double j_factor, double k_factor,
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -589,7 +589,7 @@ int RYS_per_atom_jk_ip2_type3(double *ejk, double j_factor, double k_factor,
 printf("Failed also in cudaGetDevice(), device_id value is not reliable\n"); fflush(stdout);
 }
 printf("CUDA Error in RYS_per_atom_jk_ip2_type3, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
-fprintf(stderr, "CUDA Error in RYS_per_atom_jk_ip2_type3, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
+fprintf(stderr, "CUDA Error in RYS_per_atom_jk_ip2_type3, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stderr);
 return 1;
 }
 return 0;
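The recurring change in this file pairs the right conversion specifier and flush with each message: `buflen*sizeof(double)` is a `size_t`, so printing it with `%d` is a mismatched-type conversion (undefined behavior), and calling `fflush(stdout)` after an `fprintf(stderr, ...)` flushes the wrong stream. A standalone sketch of the corrected pattern in plain C (the values are made up for illustration and are not taken from the driver code):

```c
/* Sketch: %zu for size_t values, and flush the stream that was written. */
#include <stdio.h>
#include <stddef.h>

int main(void) {
    size_t buflen = 8192;                        /* hypothetical buffer length */
    size_t used = buflen * sizeof(double);       /* size_t: print with %zu */
    size_t max_allowed = 49152;                  /* hypothetical 48 KB shared-memory cap */

    if (used > max_allowed) {
        /* %zu is the specifier defined for size_t; %d would reinterpret the
         * argument as int, which is undefined behavior. */
        fprintf(stderr,
                "Dynamic shared memory size in use = %zu > set max value = %zu\n",
                used, max_allowed);
        fflush(stderr);                          /* flush stderr, not stdout */
    }
    return 0;
}
```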
