Commit 9a22053

1.4.0 release (#365)
1 parent 3017839 commit 9a22053

File tree

5 files changed, +59 -34 lines changed


CHANGELOG

Lines changed: 21 additions & 1 deletion
@@ -1,8 +1,28 @@
+v1.4.0 (2025-03-27)
+-------------------
+* New Features
+  - RKS and UKS TDDFT gradients for density fitting and direct-SCF methods.
+  - ECP integrals and their first and second derivatives accelerated on GPU.
+  - Multigrid algorithm for the Coulomb matrix and for LDA, GGA, and MGGA functional evaluation.
+  - PBC Gaussian density fitting integrals.
+  - ASE interface for molecular systems.
+* Improvements
+  - Reduce memory footprint in the SCF driver.
+  - Reduce memory requirements for PCM energy and gradients.
+  - Reduce memory requirements for DFT gradients.
+  - Utilize the sparsity of cart2sph coefficients in the cart2sph transformation in the scf.jk kernel.
+  - Molecular 3c2e integrals generated using the block-divergent algorithm.
+  - Support I orbitals in DFT.
+* Fixes
+  - LRU-cached cart2sph coefficients under the multi-GPU environment.
+  - A maxDynamicSharedMemorySize setting bug in gradient and Hessian calculations under the multi-GPU environment.
+  - Remove the limit of 6000 GTO shells in the DFT numerical integration module.
+
 v1.3.2 (2025-03-10)
 -------------------
 * Improvements
   - Dump xc info and grids into the log file
-  - Optimize 4-center integral evalulation CUDA kernels using warp divergent algorithm
+  - Optimize 4-center integral evaluation CUDA kernels using warp divergent algorithm
   - Support up to I orbitals in DFT
   - Fix out-of-bound issue in DFT hessian for heavy atoms (>=19)
 * Deprecation
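As an illustration of the headline feature in this release (RKS/UKS TDDFT gradients with density fitting), here is a minimal sketch of how such a run might look. It assumes the PySCF-style API that GPU4PySCF mirrors; the entry points `mf.TDDFT()` and `nuc_grad_method()` follow that convention and may differ from the released interface.

```python
# Minimal sketch (assumed PySCF-style API; method names are illustrative).
import pyscf
from gpu4pyscf.dft import rks

mol = pyscf.M(atom='O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587',
              basis='def2-tzvpp')

mf = rks.RKS(mol, xc='b3lyp').density_fit()   # density-fitted RKS on GPU
mf.kernel()

td = mf.TDDFT()                               # spin-conserved TDDFT (assumed entry point)
td.nstates = 3
td.kernel()

grad = td.nuc_grad_method()                   # excited-state gradient, new in v1.4.0
grad.kernel()
```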

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions
@@ -45,8 +45,8 @@ adding features to the library for PySCF functions running on GPU devices.
 * While examples or documentation are not mandatory, it is highly recommended to
 include examples of how to invoke the new module.
 
-* CUDA compute capability 60 (sm_60) is required. Please avoid using features
-that are only available on CUDA compute capability 70 or newer. The CUDA code
+* CUDA compute capability 70 (sm_70) is required. Please avoid using features
+that are only available on CUDA compute capability 80 or newer. The CUDA code
 should be compiled and run using CUDA 11 and CUDA 12 toolkits.
 
 Thank you for your contributions!
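Since the minimum requirement moves from sm_60 to sm_70, contributors may want a quick runtime guard against older devices. A minimal sketch using only the standard CUDA runtime API (`cudaGetDeviceProperties`); the threshold 70 simply encodes the requirement stated above and is not a GPU4PySCF helper.

```c
/* Sketch: reject devices below the sm_70 requirement stated above. */
#include <stdio.h>
#include <cuda_runtime.h>

int check_compute_capability(int device_id) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device_id) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed\n");
        return 1;
    }
    int cc = prop.major * 10 + prop.minor;       /* e.g. 70 for Volta (V100) */
    if (cc < 70) {
        fprintf(stderr, "Compute capability %d.%d is below the required sm_70\n",
                prop.major, prop.minor);
        return 1;
    }
    return 0;
}
```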

README.md

Lines changed: 15 additions & 10 deletions
@@ -7,7 +7,7 @@ Installation
 --------
 
 > [!NOTE]
-> The compiled binary packages support compute capability 6.0 and later (Pascal and later, such as Tesla P100, RTX 10 series and later).
+> The compiled binary packages support compute capability 7.0 and later (Volta and later, such as Tesla V100, RTX 20 series and later).
 
 Run ```nvcc --version``` in your terminal to check the installed CUDA toolkit version. Then, choose the proper package based on your CUDA toolkit version.

@@ -27,7 +27,7 @@ cmake --build build/temp.gpu4pyscf -j 4
 CURRENT_PATH=`pwd`
 export PYTHONPATH="${PYTHONPATH}:${CURRENT_PATH}"
 ```
-Then install cutensor and cupy for acceleration (please switch the versions according to your nvcc version!)
+Then install cutensor and cupy for acceleration (please switch the versions according to your runtime CUDA environment!)
 ```sh
 pip3 install cutensor-cu12 cupy-cuda12x
 ```
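For a CUDA 11 runtime, the corresponding wheels would presumably be the CUDA 11 variants; a sketch, assuming the usual CuPy/cuTENSOR package naming on PyPI:

```sh
# CUDA 11 runtime (package names follow the usual CuPy/cuTENSOR convention on PyPI)
pip3 install cutensor-cu11 cupy-cuda11x
```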
@@ -45,17 +45,21 @@ Features
 - LDA, GGA, mGGA, hybrid, and range-separated functionals via [libXC](https://gitlab.com/libxc/libxc/-/tree/master/);
 - Spin-conserved and spin-flip TDA and TDDFT for excited states
 - Geometry optimization and transition state search via [geomeTRIC](https://geometric.readthedocs.io/en/latest/);
+- Atomic Simulation Environment ([ASE](https://gitlab.com/ase/ase)) interface;
 - Dispersion corrections via [DFTD3](https://github.com/dftd3/simple-dftd3) and [DFTD4](https://github.com/dftd4/dftd4);
 - Nonlocal functional correction (vv10) for SCF and gradient;
-- ECP is supported and calculated on CPU;
-- PCM models, SMD model, their analytical gradients, and semi-analytical Hessian matrix;
+- ECP is supported and calculated on GPU;
+- PCM models, their analytical gradients, and analytical Hessian matrix;
+- SMD solvent model;
 - Unrestricted Hartree-Fock and unrestricted DFT, gradient, and Hessian;
-- MP2/DF-MP2 and CCSD (experimental);
-- Polarizability, IR, and NMR shielding (experimental);
-- QM/MM with PBC;
 - CHELPG, ESP, and RESP atomic charge;
-- Multi-GPU for both direct SCF and density fitting (experimental)
-- SCF and DFT with periodic boundary condition (experimental)
+
+The following features are still in the experimental stage:
+- MP2/DF-MP2 and CCSD;
+- Polarizability, IR, and NMR shielding;
+- QM/MM with PBC;
+- Multi-GPU for both direct SCF and density fitting
+- SCF and DFT with periodic boundary condition
 
 Limitations
 --------
@@ -65,6 +69,7 @@ Limitations
 - Density fitting scheme up to ~168 atoms with def2-tzvpd basis, bounded by CPU memory;
 - meta-GGA without density laplacian;
 - Double hybrid functionals are not supported;
+- Hessian of TDDFT is not supported;
 
 Examples
 --------
@@ -113,7 +118,7 @@ Find more examples in [gpu4pyscf/examples](https://github.com/pyscf/gpu4pyscf/tr
 
 Benchmarks
 --------
-Speedup with GPU4PySCF v0.6.0 on A100-80G over Q-Chem 6.1 on 32-cores CPU (Desity fitting, SCF, def2-tzvpp, def2-universal-jkfit, B3LYP, (99,590))
+Speedup with GPU4PySCF v0.6.0 on A100-80G over Q-Chem 6.1 on 32-cores CPU (density fitting, SCF, def2-tzvpp, def2-universal-jkfit, B3LYP, (99,590))
 
 | mol | natm | LDA | PBE | B3LYP | M06 | wB97m-v |
 |:------------------|-------:|-------:|-------:|--------:|-------:|----------:|

gpu4pyscf/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -12,6 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-__version__ = '1.3.2'
+__version__ = '1.4.0'
 
 from . import lib, grad, hessian, solvent, scf, dft, tdscf
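A quick way to confirm which release is active in a given environment, based on the `__version__` attribute shown above:

```python
# Check the installed GPU4PySCF release; expected '1.4.0' after this commit.
import gpu4pyscf
print(gpu4pyscf.__version__)
```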

gpu4pyscf/lib/gvhf-rys/rys_jk_driver.cu

Lines changed: 20 additions & 20 deletions
@@ -135,8 +135,8 @@ int RYS_build_j(double *vj, double *dm, int n_dm, int nao,
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -152,8 +152,8 @@ int RYS_build_j(double *vj, double *dm, int n_dm, int nao,
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -168,7 +168,7 @@ int RYS_build_j(double *vj, double *dm, int n_dm, int nao,
 printf("Failed also in cudaGetDevice(), device_id value is not reliable\n"); fflush(stdout);
 }
 printf("CUDA Error in RYS_build_j, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
-fprintf(stderr, "CUDA Error in RYS_build_j, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
+fprintf(stderr, "CUDA Error in RYS_build_j, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stderr);
 return 1;
 }
 return 0;
@@ -234,8 +234,8 @@ int RYS_build_jk(double *vj, double *vk, double *dm, int n_dm, int nao,
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -249,7 +249,7 @@ int RYS_build_jk(double *vj, double *vk, double *dm, int n_dm, int nao,
 printf("Failed also in cudaGetDevice(), device_id value is not reliable\n"); fflush(stdout);
 }
 printf("CUDA Error in RYS_build_jk, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
-fprintf(stderr, "CUDA Error in RYS_build_jk, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
+fprintf(stderr, "CUDA Error in RYS_build_jk, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stderr);
 return 1;
 }
 return 0;
@@ -314,8 +314,8 @@ int RYS_build_jk_ip1(double *vj, double *vk, double *dm, int n_dm, int nao, int
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -329,7 +329,7 @@ int RYS_build_jk_ip1(double *vj, double *vk, double *dm, int n_dm, int nao, int
 printf("Failed also in cudaGetDevice(), device_id value is not reliable\n"); fflush(stdout);
 }
 printf("CUDA Error in RYS_build_jk_ip1, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
-fprintf(stderr, "CUDA Error in RYS_build_jk_ip1, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
+fprintf(stderr, "CUDA Error in RYS_build_jk_ip1, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stderr);
 return 1;
 }
 return 0;
@@ -400,8 +400,8 @@ int RYS_per_atom_jk_ip1(double *ejk, double j_factor, double k_factor,
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -416,7 +416,7 @@ int RYS_per_atom_jk_ip1(double *ejk, double j_factor, double k_factor,
 printf("Failed also in cudaGetDevice(), device_id value is not reliable\n"); fflush(stdout);
 }
 printf("CUDA Error in RYS_per_atom_jk_ip1, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
-fprintf(stderr, "CUDA Error in RYS_per_atom_jk_ip1, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
+fprintf(stderr, "CUDA Error in RYS_per_atom_jk_ip1, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stderr);
 return 1;
 }
 return 0;
@@ -486,8 +486,8 @@ int RYS_per_atom_jk_ip2_type12(double *ejk, double j_factor, double k_factor,
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -502,7 +502,7 @@ int RYS_per_atom_jk_ip2_type12(double *ejk, double j_factor, double k_factor,
 printf("Failed also in cudaGetDevice(), device_id value is not reliable\n"); fflush(stdout);
 }
 printf("CUDA Error in RYS_per_atom_jk_ip2_type12, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
-fprintf(stderr, "CUDA Error in RYS_per_atom_jk_ip2_type12, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
+fprintf(stderr, "CUDA Error in RYS_per_atom_jk_ip2_type12, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stderr);
 return 1;
 }
 return 0;
@@ -573,8 +573,8 @@ int RYS_per_atom_jk_ip2_type3(double *ejk, double j_factor, double k_factor,
 printf("Failed in cudaFuncGetAttributes(), attribute value is not reliable\n"); fflush(stdout);
 }
 if (buflen*sizeof(double) > attributes.maxDynamicSharedSizeBytes) {
-printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
-fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %d > set max value (attributes.maxDynamicSharedSizeBytes) = %d\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+printf("Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stdout);
+fprintf(stderr, "Dynamic shared memory size in used (buflen*sizeof(double)) = %zu > set max value (attributes.maxDynamicSharedSizeBytes) = %zu\n", buflen*sizeof(double), attributes.maxDynamicSharedSizeBytes); fflush(stderr);
 }
 }
 
@@ -589,7 +589,7 @@ int RYS_per_atom_jk_ip2_type3(double *ejk, double j_factor, double k_factor,
 printf("Failed also in cudaGetDevice(), device_id value is not reliable\n"); fflush(stdout);
 }
 printf("CUDA Error in RYS_per_atom_jk_ip2_type3, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
-fprintf(stderr, "CUDA Error in RYS_per_atom_jk_ip2_type3, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stdout);
+fprintf(stderr, "CUDA Error in RYS_per_atom_jk_ip2_type3, li,lj,lk,ll = %d,%d,%d,%d, device_id = %d, error message = %s\n", li,lj,lk,ll, device_id, cudaGetErrorString(err)); fflush(stderr);
 return 1;
 }
 return 0;
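The recurring change in this file pairs the right conversion specifier and flush with each message: `buflen*sizeof(double)` is a `size_t`, so printing it with `%d` is a mismatched-type conversion (undefined behavior), and calling `fflush(stdout)` after an `fprintf(stderr, ...)` flushes the wrong stream. A standalone sketch of the corrected pattern in plain C (the values are made up for illustration and are not taken from the driver code):

```c
/* Sketch: %zu for size_t values, and flush the stream that was written. */
#include <stdio.h>
#include <stddef.h>

int main(void) {
    size_t buflen = 8192;                        /* hypothetical buffer length */
    size_t used = buflen * sizeof(double);       /* size_t: print with %zu */
    size_t max_allowed = 49152;                  /* hypothetical 48 KB shared-memory cap */

    if (used > max_allowed) {
        /* %zu is the specifier defined for size_t; %d would reinterpret the
         * argument as int, which is undefined behavior. */
        fprintf(stderr,
                "Dynamic shared memory size in use = %zu > set max value = %zu\n",
                used, max_allowed);
        fflush(stderr);                          /* flush stderr, not stdout */
    }
    return 0;
}
```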
