TREXIO + MCSCF wavefunction #92

Merged
merged 17 commits into from
Aug 17, 2025

@NastaMauger NastaMauger commented Dec 20, 2024

Hello,

In this PR, you will find modifications that allow users to register an MCSCF wavefunction in the TREXIO format.

I am uncertain if additional data must be stored for TREXIO to function correctly with other software and to return the corresponding wave function produced by PySCF before registering it into TREXIO. However, the corresponding HDF5 file generated using the following code:

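# Assumes mol, rhf (a converged RHF object), norb and nelec are already defined,
# that mcscf and selected_ci_spin0_symm are imported from PySCF,
# and that pyscf_trexio is the TREXIO interface module touched by this PR.
mc = mcscf.casci_symm.CASCI(rhf, norb, nelec)  # "mc" avoids shadowing the mcscf module
mc.fcisolver = selected_ci_spin0_symm.SelectedCI(rhf)

mc.kernel()
trexio_file = 'data_mcscf.hdf5'
pyscf_trexio.to_trexio(mc, trexio_file)

pyscf_trexio.det_to_trexio(mc, norb, nelec, trexio_file)
eri = mol.intor('int2e')
pyscf_trexio.write_eri(eri, trexio_file)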

seems to register the data correctly.

Please let me know if further modifications or improvements are necessary. I am also uncertain whether the one-electron MO group is essential, as mentioned here. I would also appreciate any feedback on whether the write_eri function needs to be modified to handle CASCI/CASSCF, particularly for the corresponding active space.

Best

q-posev commented Jan 2, 2025

Thank you @NastaMauger !

I noticed that you do not register the CI determinants as part of your new mcscf_to_trexio. Is it intended?

q-posev commented Jan 2, 2025

I guess the best test of the validity of the produced multi-configurational TREXIO wavefunction is via an attempt to import and use it in some software that handles such wave functions (e.g. QP2)? @NastaMauger @scemama

@@ -434,10 +464,6 @@ def det_to_trexio(mcscf, norb, nelec, filename, backend='h5', ci_threshold=0., c
    with trexio.File(filename, 'u', back_end=_mode(backend)) as tf:
        if trexio.has_determinant(tf):
            trexio.delete_determinant(tf)
        trexio.write_mo_num(tf, mo_num)
        trexio.write_electron_up_num(tf, len(a))

Why is this removed? We need the number of up and down electrons to perform the CI determinants "correctness" check before writing them in the file.
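
As a rough illustration of such a check (a sketch only, not TREXIO's actual implementation; both function names below are hypothetical): if each spin string is stored as a list of 64-bit words with one bit per orbital, the number of set bits should equal the corresponding electron count.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # view possibly negative (signed) words as unsigned 64-bit


def count_electrons(words):
    """Total number of set bits (occupied orbitals) in a list of 64-bit words."""
    return sum(bin(w & MASK64).count("1") for w in words)


def determinant_is_consistent(alpha_words, beta_words, n_up, n_dn):
    """True if the bit strings hold exactly n_up alpha and n_dn beta electrons."""
    return (count_electrons(alpha_words) == n_up
            and count_electrons(beta_words) == n_dn)


# Three alpha electrons in orbitals 0-2, two beta electrons in orbitals 0-1.
ok = determinant_is_consistent([0b0111], [0b0011], n_up=3, n_dn=2)
```

This is why the up and down electron numbers have to be known before determinants are written: without them there is nothing to compare the bit counts against.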

Contributor Author

This is now handled by the _mcscf_to_trexio function, which calls _mol_to_trexio.

Reshape write_eri function to ensure consistency with other codes
mcscf_function now directly registers the determinant list

NastaMauger commented Feb 17, 2025

@q-posev

  • I noticed that you do not register the CI determinants as part of your new mcscf_to_trexio. Is it intended?
    Done

  • I guess the best test of the validity of the produced multi-configurational TREXIO wavefunction is via an attempt to import and use it in some software that handles such wave functions (e.g. QP2)? @NastaMauger @scemama
    There is still an issue due to the normalization factor, which needs to be resolved.

  • The write-eri function might still lack consistency between what the original function in pyscf-forge returns and what QP2 (or other software) might expect.

q-posev commented Feb 17, 2025

@NastaMauger

For the ERIs - I agree, @kgasperich submitted a draft PR which takes care of the AO/MO inconsistency.

NastaMauger commented Mar 1, 2025

Hello,

I am following this issue as it's a feature I really want to add.

After discussing with @q-posev and @scemama, we observed that although we correctly registered all necessary data into TREXIO, there is still a difference between the CASSCF energy from PySCF and the exported TREXIO file. Upon examining the convergence of the energy with respect to the CAS space, it seems there may be a hidden selected CI procedure within the CASSCF routine.

@sunqm Could you please confirm and provide guidance on how to correctly register the CASCI wavefunction if some selected CI is implicitly performed?

Best

@MatthewRHermes
Collaborator

There is no selected CI procedure in CASSCF that I am aware of. There is a procedure to approximate the CI step during the orbital optimization, but that should have no effect on the evaluation of the CASCI energy, which is entirely determined by the MO coefficients, one- and two-electron integrals, and CI vectors. Could you elaborate on what you are seeing?

sunqm commented Mar 8, 2025

Are you able to reproduce the CASSCF energy for small systems like H2 molecule with (2,2) CAS?

@NastaMauger
Contributor Author

I think the differences I saw stem from the different versions I was using. It should be OK now.

@NastaMauger NastaMauger requested review from sunqm and q-posev July 3, 2025 13:56

@scemama scemama left a comment

I am not an expert of PySCF, but on the TREXIO side everything looks OK.

@q-posev q-posev left a comment

@NastaMauger thank you!
However, test_trexio is failing in the CI. Have you tried to run pytest locally on your branch? Maybe it's because of the modifications in the mol_from_trexio?
In general, did you manage to reproduce the CI energy between QP2 and PySCF via TREXIO using the code in this branch? That could be a good test to add to test_trexio.py here if you or @scemama have a small CIPSI wavefunction in the TREXIO format.

@@ -329,6 +365,12 @@ def write_eri(eri, filename, backend='h5'):
idx[:,:,2:] = idx_pair[None,:,:]
idx = idx[np.tril_indices(npair)]

idx=idx.reshape((num_integrals,4))

Why do you need to reshape the integrals here?

@NastaMauger NastaMauger Jul 4, 2025

This is due to the difference between chemist and physicist notation.
QP2 uses the physicist's notation, whereas PySCF uses the chemist's.

There is a PR where Kevin is modifying this.

@NastaMauger NastaMauger Jul 4, 2025

In fact, I think there should be a possibility with TREXIO to register data in one format or another, and to have an HDF5 field indicating which format the data is stored in.

@q-posev q-posev Jul 4, 2025

How did you test this part?
I am asking because

  1. I see that you swap the indices to go from one notation into another, but you do not swap the eri array accordingly (which remains in the original chemist notation of PySCF).
  2. I see that the read_eri part is untouched. So when reading the ERIs in physicist notation, one may encounter issues, as PySCF may expect chemist notation. So the index swap might be needed in read_eri too.

I guess one of these points leads to tests failing in the CI. But maybe I missed something?
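
To make point 1 concrete, here is a small NumPy sketch under the assumption that the ERIs are written as a sparse list of (index quadruplet, value) rows. In that layout, permuting the index columns while leaving each value attached to its row gives the same result as transposing a dense array. The array names and random data are illustrative and are not taken from write_eri; whether this matches what write_eri does depends on how its eri array is packed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
# Dense ERIs in chemist notation: chem[i, j, k, l] = (ij|kl).
# Random numbers stand in for real integrals.
chem = rng.random((n, n, n, n))

# Sparse representation: one row of four indices per value.
idx_chem = np.argwhere(np.ones_like(chem, dtype=bool))  # shape (n**4, 4)
values = chem[tuple(idx_chem.T)]

# Relabel the index columns (i, j, k, l) -> (i, k, j, l); values are untouched.
idx_phys = idx_chem[:, [0, 2, 1, 3]]

# Rebuild a dense array from the relabelled sparse list.
phys_from_sparse = np.zeros_like(chem)
phys_from_sparse[tuple(idx_phys.T)] = values

# Reference: transpose the dense array, phys[p, q, r, s] = <pq|rs> = (pr|qs).
phys_dense = chem.transpose(0, 2, 1, 3)
same = np.array_equal(phys_from_sparse, phys_dense)
```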

Contributor Author

@q-posev This is something I worked on with Anthony during a Zoom call a while ago because we noticed some weird behavior. I'm not sure if it's still necessary, so I left it here as a comment just in case.

Contributor Author

Regarding your point 2: exactly. That's why I think something should be done in the TREXIO format to ensure that the formats are properly registered and can be used seamlessly across different software.

I never really had the opportunity to finish this, and I know Kevin is working on something similar on the QP2 side.

Contributor Author

I added a small comment in case someone wants to follow this. If it turns out to be useless, I guess it can easily be removed by either you or Anthony in the next PR

@q-posev q-posev Jul 4, 2025

Even if we enforce one format or another in TREXIO, the user can still confuse them and accidentally export the wrong info. But we have trexio-tools package which can be used to validate some TREXIO fields and people can contribute more robust tests to the repo (e.g. computing the HF energy from the one- and two-electron integrals)
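
A sketch of the kind of check suggested here, assuming a closed-shell (RHF) density matrix and dense ERIs in chemist notation. The function name and the toy numbers are illustrative and are not part of trexio-tools.

```python
import numpy as np


def rhf_electronic_energy(h1, eri_chem, dm):
    """Closed-shell electronic energy from integrals and a density matrix.

    E = sum_pq D_pq h_pq + 1/2 sum_pqrs D_pq D_rs [(pq|rs) - 1/2 (pr|qs)]
    with eri_chem[p, q, r, s] = (pq|rs).
    """
    e1 = np.einsum('pq,pq->', dm, h1)
    coul = np.einsum('pq,rs,pqrs->', dm, dm, eri_chem)
    exch = np.einsum('pq,rs,prqs->', dm, dm, eri_chem)
    return e1 + 0.5 * (coul - 0.5 * exch)


# Toy case: a single doubly occupied orbital, for which E = 2*h + J.
h1 = np.array([[-1.25]])          # made-up one-electron integral
eri = np.array([[[[0.67]]]])      # made-up Coulomb integral J = (11|11)
dm = np.array([[2.0]])            # two electrons in orbital 1
energy = rhf_electronic_energy(h1, eri, dm)
```

Comparing such a recomputed energy with the value reported by the producing code would flag integrals stored with the wrong index convention, since the Coulomb and exchange contractions depend on which index pairs belong together.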

In fact, I think there should be a possibility with TREXIO to register data in one format or another, and to have an HDF5 field indicating which format the data is stored in.

The whole point of TREXIO is to have a single and consistent way of doing things so the data is easy to interpret. We chose physicist's notation because chemist's notation can't generalize to more than 2-electron integrals. Reordering indices from 1,2,3,4 to 1,3,2,4 is not really a big issue...

@q-posev This is something I worked on with Anthony during a Zoom call a while ago because we noticed some weird behavior. I'm not sure if it's still necessary, so I left it here as a comment just in case.

In this Zoom call we realized we had to reorder the indices because we checked the energy with Quantum Package. So we fixed the ordering for the export. I don't recall that we checked importing integrals into PySCF. The reordering should also be done in the reading part.
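
A minimal NumPy sketch of why the same reordering serves both directions, assuming dense four-index arrays: swapping the two middle indices is its own inverse, so applying it on export and again on import returns the original chemist-ordered array. The array names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
chem = rng.random((n, n, n, n))       # stand-in for (ij|kl)

# Export: chemist (ij|kl) -> physicist <ik|jl>, i.e. index order 1,2,3,4 -> 1,3,2,4.
phys = chem.transpose(0, 2, 1, 3)

# Import: the same permutation applied again restores chemist order.
back = phys.transpose(0, 2, 1, 3)
roundtrip_ok = np.array_equal(back, chem)
```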

from trexio_tools.group_tools import determinant as trexio_det

mo_num = norb
ncore = mcscf.ncore
int64_num = int((mo_num - 1) / 64) + 1

@q-posev q-posev Jul 4, 2025

We can do int64_num = trexio.get_int64_num(trexio_file) here, the way @scemama did in PR #152

@NastaMauger NastaMauger Jul 4, 2025

If the user registers the determinant in a new/different HDF5 file, this will not work.

It is possible that in previous versions of trexio you could bypass the need of storing the number of MOs in the trexio file, but it was because some internal consistency checking was lacking in TREXIO. In the current version, you can't write determinants if you don't write the number of MOs in the MO group to ensure that the determinants you store are consistent. So it should be totally equivalent.

I would rather use the library function call than compute it, because it will always return the value expected by the library. Here it is possible that user makes a mistake, and that norb is not the same as the value stored in the mo group.
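
For reference, the hand-computed value is the number of 64-bit words needed to hold one bit per MO. The plain-Python sketch below shows the arithmetic and how a mismatched norb would silently change the layout. The helper names are hypothetical, and Python's unbounded integers stand in for TREXIO's 64-bit storage.

```python
def int64_words(mo_num):
    """Number of 64-bit words needed to store one occupation bit per MO."""
    return (mo_num - 1) // 64 + 1


def pack_occupation(occupied, mo_num):
    """Pack 0-based occupied orbital indices into a list of 64-bit words."""
    words = [0] * int64_words(mo_num)
    for orb in occupied:
        if not 0 <= orb < mo_num:
            raise ValueError(f"orbital {orb} is outside 0..{mo_num - 1}")
        words[orb // 64] |= 1 << (orb % 64)
    return words


# 65 MOs need two words; orbital 64 lands in bit 0 of the second word.
words = pack_occupation([0, 1, 64], mo_num=65)
```

If norb passed by the user were 64 while the file holds 65 MOs, the sketch would allocate one word instead of two, which is the kind of inconsistency the library call avoids.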

Contributor Author

So I guess, I need to update TREXIO on my own laptop because int64_num = trexio.get_int64_num(trexio_file) does not work on my end

NastaMauger commented Jul 4, 2025

I have no clue why this CI failed. pytest succeeds on my laptop.

@q-posev q-posev left a comment

@NastaMauger thanks! Indeed, the CI was failing before because of the broken ERI import. Now it's fixed, and the CI fails only for Python 3.12, for reasons unknown to me and unrelated to TREXIO, which is good news :-)

So if you ask me, I think this PR can be merged, even though the ERIs are exported in the chemist notation and not in the physicist one at the moment. I believe this issue can be addressed in a separate PR.

What do you think @sunqm @MatthewRHermes @scemama ?

NastaMauger commented Jul 4, 2025

Ok @scemama, this is exactly what I was thinking. The correct ERIs are missing, and without them, it's useless to have a QP2 (or any other) test export.

I'm a bit confused about this part: "set the importing of ERIs as NotImplementedError".
Are you asking to create a function called import_eri that simply raises an error?

@NastaMauger NastaMauger requested a review from scemama July 4, 2025 21:43
Comment on lines 387 to 388
# x = idx[:,0]*(idx[:,0]+1)//2 + idx[:,1]
# y = idx[:,2]*(idx[:,2]+1)//2 + idx[:,3]

OK, NotImplementedError breaks the tests. So it was not a good idea.
You can reactivate the read_eri as it was before, replacing those two lines with

    x = idx[:,0]*(idx[:,0]+1)//2 + idx[:,2]
    y = idx[:,1]*(idx[:,1]+1)//2 + idx[:,3]
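
The two lines above build packed lower-triangular pair indices from the (0, 2) and (1, 3) index columns, which are the chemist pairs once the stored quadruplet is in physicist order. A small NumPy sketch of the pair formula itself, assuming the first index of each pair is not smaller than the second (the variable names here are illustrative):

```python
import numpy as np

n = 4
# All pairs (i, j) with i >= j, enumerated row by row.
i, j = np.tril_indices(n)

# Packed lower-triangular index: position of (i, j) in that enumeration.
pair = i * (i + 1) // 2 + j

# The formula reproduces the running counter 0, 1, ..., n*(n+1)/2 - 1.
is_sequential = np.array_equal(pair, np.arange(n * (n + 1) // 2))
```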

Contributor Author

Done

@NastaMauger NastaMauger requested a review from scemama July 5, 2025 19:30
@NastaMauger
Contributor Author

@scemama, are the modifications you requested included in my PR? For some reason, I can’t see your comments. :(

q-posev commented Jul 5, 2025

TREXIO CI is green now! I think this PR is ready to be merged

q-posev commented Jul 13, 2025

@sunqm We believe that this PR is finally ready to be merged. Would it be possible to get another round of review from one of the pyscf-forge maintainers?

@sunqm sunqm added this to the v1.0.4 milestone Jul 13, 2025
@NastaMauger
Contributor Author

@sunqm I have no clue why version 3.12 is having issues with the check, since the error arises in a part I haven't modified at all.

@kousuke-nakano
Contributor

@NastaMauger Hi. I would appreciate it if you could let me know the status of this pull request. I am working on a related topic (#153). Thank you!

@NastaMauger
Contributor Author

@kousuke-nakano In fact, this PR is ready to be merged. It is currently blocked for an unknown reason
@sunqm ?

@@ -410,17 +452,18 @@ def get_occsa_and_occsb(mcscf, norb, nelec, ci_threshold=0.):

return occsa_sorted, occsb_sorted, ci_values_sorted, num_determinants

def det_to_trexio(mcscf, norb, nelec, filename, backend='h5', ci_threshold=0., chunk_size=100000):
def det_to_trexio(mcscf, norb, nelec, trexio_file, ci_threshold=0., chunk_size=100000):
from trexio_tools.group_tools import determinant as trexio_det
Contributor

Hi @NastaMauger, in my opinion, we should not import trexio_tools; otherwise, pyscf gains an additional dependency on trexio_tools. In other words, when we make a pull request to the main repository in the future, we would have to ask the pyscf committee members to add not only trexio but also trexio_tools to setup.cfg, which is not a good idea. We should hardcode it instead. What do you think?

Contributor

I want to know your thoughts, @q-posev and @scemama.

Contributor Author

Would it be possible to accept this PR as it is and open a new one with your idea? I agree that your suggestion would help reduce dependencies, but since this has already been validated by everyone and the only thing preventing the final merge is unrelated to the current modification, I would really appreciate if it could be merged

To be honest, I’m not sure how much work this would require, especially since this PR has been open for a while simply because I had forgotten about it

I want to know your thoughts, @q-posev and @scemama.

I agree. We should not import trexio tools.
Also, I recently removed the determinant module from trexio tools because it was redundant with native functions of trexio.

@kousuke-nakano
Contributor

@NastaMauger Yes, I agree with you, and I will create another pull request regarding the dependency issue.

scemama commented Jul 27, 2025

You can fix the dependency like this:
TREX-CoE/trexio_tools@739317f
It is in a pending PR of trexio_tools

@kousuke-nakano
Contributor

You can fix the dependency like this: TREX-CoE/trexio_tools@739317f It is in a pending PR of trexio_tools

Thank you @scemama! Once this pull request is merged, I will create another one to address this issue.

scemama commented Jul 31, 2025

@NastaMauger I made a PR on your fork to fix the dependency issue with trexio_tools. So no other PR to pyscf-forge will be necessary.

@NastaMauger NastaMauger requested a review from sunqm July 31, 2025 13:32
@sunqm sunqm merged commit b4da987 into pyscf:master Aug 17, 2025
5 of 6 checks passed