Saving LPython's intrinsic modules as pyc files #999

Merged
merged 7 commits into from
Aug 24, 2022

Conversation

czgdp1807
Collaborator

@czgdp1807 czgdp1807 commented Aug 20, 2022

#992 (comment)

For now, I just save the ASR of a Python file compiled with the -c option in a .pyc file. Next steps:

  1. Add the possibility to first try loading the module from a .pyc file generated earlier. If it is not found, compile the module and re-save it in a .pyc file.
  2. Update CMake build files to compile intrinsic modules and save their ASR during LPython's build time.
  3. Test the update end to end.
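
Step 1 above can be sketched roughly as follows. This is only an illustration of the load-or-compile fallback, not LPython's implementation: `compile_to_asr` and `load_or_compile` are hypothetical names, and `pickle` stands in for the real ASR serializer.

```python
import os
import pickle
import tempfile


def compile_to_asr(py_path):
    # Hypothetical stand-in for the real frontend: the "ASR" here is
    # just a dict that records the source text.
    with open(py_path) as f:
        return {"source": f.read()}


def load_or_compile(py_path):
    """Return (asr, loaded_from_pyc) for the module at py_path."""
    pyc_path = os.path.splitext(py_path)[0] + ".pyc"
    if os.path.exists(pyc_path):
        # A previously saved file exists: load it instead of compiling.
        with open(pyc_path, "rb") as f:
            return pickle.load(f), True
    # Not found: compile, then save the result for the next run.
    asr = compile_to_asr(py_path)
    with open(pyc_path, "wb") as f:
        pickle.dump(asr, f)
    return asr, False
```

The first call for a given file compiles and writes the .pyc file; later calls load it directly.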

@czgdp1807
Collaborator Author

@certik Is this PR a good approach for implementing #992 (comment)?

b.write_string(serialize(m));

return b.get_str();
}
Contributor

I would merge this with LFortran --- I think all LCompilers can then reuse it, just change the extension. I think there is nothing special about LPython's .pyc compared to LFortran's .mod, I think they can be exactly identical. It's just ASR that is saved, it's independent of the frontend.

Collaborator Author

@czgdp1807 czgdp1807 Aug 20, 2022

There is one difference though. Fortran files have specific module symbols, and only those are saved in the .mod files. In Python, however, a whole file can be a module: we just have to import something from it and it becomes a module. But if you run it with the python command, then it acts as an "executable". So what I have done is this: if any file is compiled with the -c option of LPython, then its ASR (i.e., the full ASR::TranslationUnit_t) gets saved in a .pyc file. Does that make sense?
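
As a small illustration of the module/executable duality described above (the file name `greet.py` and the function `greet` are made up for this example), the same Python file can be imported or run directly:

```python
# greet.py (hypothetical file name)
def greet(name):
    return "Hello, " + name


if __name__ == "__main__":
    # Runs only when the file is executed directly (python greet.py),
    # not when another file does `from greet import greet`.
    print(greet("world"))
```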

Contributor

We can discuss it more. What I had in mind is that a single Python file is exactly 100% equivalent to a single Fortran module file, at the ASR level.

Contributor

(Well, there is a difference that Python modules can be nested, but this is something that people have requested for Fortran also to do, so we should allow that at the ASR level in some clean way.)

Collaborator Author

Cool. Let me try out implementing the rest of the steps I have written in #999 (comment). Let's see if it works till the end.

@czgdp1807 czgdp1807 added the asr ASR related changes label Aug 20, 2022
@czgdp1807 czgdp1807 force-pushed the mod01 branch 2 times, most recently from 343f757 to bcbee61 Compare August 22, 2022 10:17
@czgdp1807
Collaborator Author

@certik Could you please review this? So far I have added support for saving LPython's generated ASR into .pyc files when -c and --disable-options are enabled. If these changes look good to you then I will proceed forward with compiling intrinsic modules at build time of LPython.

Contributor

@certik certik left a comment

I think it's fine.

My only concern is with the duplication, since we rather want to extend / amend ASR and whatever is needed, and then exactly reuse the saving / loading functionality. It will benefit both LPython and LFortran. In fact, I think we could even mix and match ASR modules, that is, take an LPython .pyc module and "use" it from LFortran and vice versa. It's a little hard to wrap the head around it, but in libasr it is "just" ASR, it is not Python and it is not Fortran; or equivalently, it is both Python and Fortran. So rather than saving Python/Fortran, I want to think about it as saving ASR as a module. That's it. So it should be exactly the same code, and if it is not, that would be a concern.

@czgdp1807
Collaborator Author

Yes, I agree. I will try to combine the functions (or as much as is common between them) here. For the rest, I will try to leave scope for handling differences, if any.

@certik
Contributor

certik commented Aug 23, 2022

Regarding the behavior --- should LPython always save .pyc files (as CPython I think does)? Or should we make it configurable? Since LFortran always saves .mod files and if CPython also always saves .pyc files then perhaps we should also always save it. Does CPython decide based on the timestamp of the .py file if the .pyc file needs to be recompiled? LFortran never checks any timestamps, it just expects a .mod file and if it is there it will use it. It is the job of the build system to ensure the .mod file is up-to-date. CPython, on the other hand, does not require a build system.

It seems we almost need two modes:

  • The CPython like mode which does not require a build system; there are two sub-cases:
    • Use .pyc and check timestamps
    • No .pyc and it will always recompile everything (that is the current mode in master)
  • A build system mode, where we do not check timestamps and just require .pyc to be present if we import a module (we do not care about the .py files at all of imported modules) --- that is effectively the LFortran mode.
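
The modes listed above could be expressed as a small decision function. This is a sketch under assumptions: the names `Mode` and `import_action` and the returned action strings are hypothetical, and treating a missing .pyc file as an error in build system mode is one reading of "require .pyc to be present".

```python
from enum import Enum


class Mode(Enum):
    CPYTHON_CACHED = 1    # CPython-like: use .pyc and check timestamps
    CPYTHON_UNCACHED = 2  # CPython-like: no .pyc, always recompile
    BUILD_SYSTEM = 3      # LFortran-like: require .pyc, ignore the .py


def import_action(mode, pyc_exists, pyc_up_to_date):
    """Return "load", "compile", or "error" for one imported module."""
    if mode is Mode.CPYTHON_UNCACHED:
        return "compile"
    if mode is Mode.CPYTHON_CACHED:
        return "load" if pyc_exists and pyc_up_to_date else "compile"
    # Build system mode: freshness is the build system's job, so the
    # timestamp result is deliberately ignored here.
    return "load" if pyc_exists else "error"
```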

@czgdp1807
Collaborator Author

By timestamp you mean two things, right: the time at which the .pyc file was generated and the time at which the .py file was last modified? If the second happened after the first, then recompile, right? Makes sense. Well, if LPython is called by the build system, then LPython's timestamp verification would be redundant. So we need a way to distinguish whether LPython was called by the user or by the build system.
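
A minimal sketch of that timestamp comparison, assuming the generation time of the .pyc file is taken from its filesystem modification time (the function name `pyc_is_stale` is hypothetical):

```python
import os


def pyc_is_stale(py_path, pyc_path):
    """True if the .pyc is missing or older than the .py source."""
    if not os.path.exists(pyc_path):
        return True
    # Recompile when the source was modified after the .pyc was written.
    return os.path.getmtime(py_path) > os.path.getmtime(pyc_path)
```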

@certik
Contributor

certik commented Aug 23, 2022

Yes regarding timestamp.

Yes, we need to distinguish the two (or three) modes above.

For example, lpython a.py is the CPython mode (either with .pyc or without). And lpython -c a.py -o a.o is the build system mode.

@certik
Contributor

certik commented Aug 23, 2022

Regarding mix and matching LPython / LFortran, we will (later) version libasr itself, and just use the libasr version in the .pyc / .mod file. If the version exactly matches, then we can mix and match.
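
The exact-match version check could look roughly like this. It is a sketch only: the version string, the dict layout, and the names `save_module` and `load_module` are assumptions, and `pickle` again stands in for the real ASR serializer.

```python
import pickle

LIBASR_VERSION = "0.0.0-example"  # hypothetical version string


def save_module(asr, version=LIBASR_VERSION):
    # Store the libasr version next to the serialized ASR.
    return pickle.dumps({"libasr_version": version, "asr": asr})


def load_module(data, version=LIBASR_VERSION):
    payload = pickle.loads(data)
    # Only an exact version match allows the module to be reused,
    # whichever frontend (.pyc or .mod) produced it.
    if payload["libasr_version"] != version:
        raise ValueError("libasr version mismatch")
    return payload["asr"]
```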

Contributor

@certik certik left a comment

I think that's fine. Further improvements can be done once this is merged.

@czgdp1807 czgdp1807 enabled auto-merge August 24, 2022 04:56
@czgdp1807 czgdp1807 merged commit ad9c9e9 into lcompilers:main Aug 24, 2022