Auto-saving and loading modules at compile time #1065

Draft: wants to merge 9 commits into base: main
2 changes: 1 addition & 1 deletion src/lpython/pickle.cpp
@@ -95,7 +95,7 @@ class ASRTreeVisitor :
{
public:
bool show_intrinsic_modules;

std::string get_str() {
return s;
}
72 changes: 65 additions & 7 deletions src/lpython/semantics/python_ast_to_asr.cpp
@@ -10,6 +10,8 @@
#include <complex>
#include <sstream>
#include <iterator>
#include <cstdio>
#include <cstdlib>

#include <libasr/asr.h>
#include <libasr/asr_utils.h>
@@ -162,9 +164,30 @@ namespace CastingUtil {
}

int save_pyc_files(const LFortran::ASR::TranslationUnit_t &u,
std::string infile) {
std::string infile) {
LFORTRAN_ASSERT(LFortran::asr_verify(u));
std::string modfile_binary = LFortran::save_pycfile(u);
Allocator al(4*1024);
LFortran::SymbolTable *symtab =
al.make_new<LFortran::SymbolTable>(nullptr);
std::vector<std::pair<ASR::Module_t*, SymbolTable*>> module_parent;
for (auto &item : u.m_global_scope->get_scope()) {
if (LFortran::ASR::is_a<LFortran::ASR::Module_t>(*item.second)) {
LFortran::ASR::Module_t *m = LFortran::ASR::down_cast<LFortran::ASR::Module_t>(item.second);

symtab->add_symbol(std::string(m->m_name), item.second);
module_parent.push_back(std::make_pair(m, m->m_symtab->parent));
m->m_symtab->parent = symtab;
}
}

LFortran::Location loc;
LFortran::ASR::asr_t *asr = LFortran::ASR::make_TranslationUnit_t(al, loc,
symtab, nullptr, 0);
LFortran::ASR::TranslationUnit_t *tu =
LFortran::ASR::down_cast2<LFortran::ASR::TranslationUnit_t>(asr);
LFORTRAN_ASSERT(LFortran::asr_verify(*tu));

std::string modfile_binary = LFortran::save_pycfile(*tu);
Comment on lines +167 to +190
Collaborator Author

This is the same approach as LFortran's (see https://github.com/lfortran/lfortran/blob/f07bc66f885d4bc365be140b23a2b62f7e205d85/src/bin/lfortran.cpp#L609-L649), but the issue still persists. The reason I think is because we keep using the same process to compile the modules as well as the main program. However LFortran launches a different process to compile modules and a different one to compile the main program. If we want to keep the auto-compile, load, and save approach in LPython, then we probably shouldn't require exact symbol table ID matches when comparing the ASR output against the reference tests. @certik Thoughts?

Collaborator Author

Or you can add an internal option that ignores pre-compiled modules (i.e., the pyc files) and recompiles every time while generating reference tests.

Collaborator Author

@czgdp1807 czgdp1807 Aug 30, 2022

> The reason I think is because we keep using the same process to compile the modules as well as the main program.

I made some changes (see 8317729, an experimental commit) that launch a separate lpython process to compile modules and save them in pyc files, and it worked on my Mac: the reference tests matched between consecutive clean builds. So I think the issue is not the one in #992 (comment). The real issue is that modules get compiled together with other programs, which affects the modules' symbol tables. If modules are compiled by an isolated lpython process, I don't see this happening.

Collaborator Author

The tests pass with 8317729 (an experimental commit), so the issue is clearly what I explained in my comment above. That said, system is not a safe way to compile modules in an isolated lpython process. Some better ways include,

I would say we should encapsulate the logic to safely compile a module in an API inside LPython and then call it everywhere. The current approach in main is very non-deterministic, so we need to change it anyway.

Contributor

We should not be using any kind of isolated process. We can simply always compile using a new class for compilation. We just have to handle the symbol table ID correctly, for example by keeping it as a local variable.

Furthermore, we should use exactly the same code with LFortran, not maintain two separate codes to load/save ASR to mod files.
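
As a rough illustration of the "symbol table ID as a local variable" idea, here is a minimal sketch in which the ID counter belongs to a compilation object rather than to the process. `ToyCompilation`, `new_symtab_id` and `compile_module_ids` are hypothetical names invented for this example; they are not part of LPython or libasr, and the real change may look quite different.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical compilation context: the ID counter is a member, not a
// process-wide static, so every compilation numbers its tables from 0.
class ToyCompilation {
public:
    uint64_t new_symtab_id() { return next_id_++; }
    uint64_t tables_created() const { return next_id_; }
private:
    uint64_t next_id_ = 0;
};

// Compiles a toy "module" with `n_tables` scopes in its own context and
// returns the IDs it assigned. The caller's counter is never touched.
std::vector<uint64_t> compile_module_ids(int n_tables) {
    assert(n_tables >= 0);
    ToyCompilation ctx;  // fresh counter, independent of the caller's
    std::vector<uint64_t> ids;
    for (int i = 0; i < n_tables; i++) {
        ids.push_back(ctx.new_symtab_id());
    }
    return ids;
}
```

With this layout, the IDs a module receives do not depend on how many symbol tables the importing program created beforehand, and compiling the module does not shift the program's own numbering.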

Collaborator Author

@czgdp1807 czgdp1807 Aug 30, 2022

> Furthermore, we should use exactly the same code with LFortran, not maintain two separate codes to load/save ASR to mod files.

The code and logic are exactly the same. The difference between LFortran and LPython is explained below:

> The reason I think is because we keep using the same process to compile the modules as well as the main program. However LFortran launches a different process to compile modules and a different one to compile the main program.

Basically, LFortran compiles modules according to the instructions in the build scripts (CMakeLists.txt), whereas LPython compiles modules on the fly. So a module compiled along with one program will give a different ASR than the same module compiled along with another program. This is not an issue in LFortran because there we compile the modules and save them in .mod files before compiling the main program. In fact, I think that is why no such handling is present in LFortran: it doesn't compile modules on the fly.

Collaborator Author

@czgdp1807 czgdp1807 Aug 30, 2022

So let's think about what happens when we compile modules on the fly. Consider three files: a.py, b.py and module.py. Both a.py and b.py import module.py.

Case 1

We execute lpython a.py first and then lpython b.py. Since we compile modules on the fly during the execution of lpython a.py, by the time we reach the following call to compile module.py,

Result<ASR::TranslationUnit_t*> r2 = python_ast_to_asr(al, *ast,
diagnostics, false, true, false, infile);

some symbol tables will already have been created. Say n symbol tables were created. Then the ID of the first symbol table created while compiling module.py will be n (the symbol table counter starts from 0 and is global for a single lpython process). The same IDs will be stored in module.pyc, so the minimum symbol table ID in module.pyc will be n.

Now, once module.pyc is generated, we continue compiling a.py, and its symbol table IDs will start from n + number_of_symtabs_in_module.

Case 2
After cleaning our repository, we execute lpython b.py first and then lpython a.py. Again we reach the following call,

Result<ASR::TranslationUnit_t*> r2 = python_ast_to_asr(al, *ast,
diagnostics, false, true, false, infile);

However, this time (since we are compiling b.py), say m symbol tables were created before the above call. So now the minimum symbol table ID in module.pyc will be m.

Now, once module.pyc is generated, we continue compiling b.py, and its symbol table IDs will start from m + number_of_symtabs_in_module.

So you can see how compiling on the fly produces different ASRs for a.py and b.py: after module.py is compiled on the fly, SymbolTable::counter continues from where it left off when module.py was finished. This affects the program's ASR as well.
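
The two cases above can be reproduced with a small toy model. This is only a sketch under the assumption that IDs come from a single static counter, as described for SymbolTable::counter; `ToySymbolTable`, `create_tables` and `min_module_id_for_run` are hypothetical names made up for illustration and do not exist in LPython.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a symbol table whose ID comes from a
// process-wide counter.
struct ToySymbolTable {
    static uint64_t counter;  // shared by everything in the process
    uint64_t id;
    ToySymbolTable() : id(counter++) {}
};
uint64_t ToySymbolTable::counter = 0;

// Creates `n_tables` symbol tables and returns the first ID handed out
// (or the current counter value if no table is created).
uint64_t create_tables(int n_tables) {
    assert(n_tables >= 0);
    uint64_t first_id = ToySymbolTable::counter;
    for (int i = 0; i < n_tables; i++) {
        ToySymbolTable table;
        (void)table;  // only the side effect on the counter matters here
    }
    return first_id;
}

// Simulates one compiler run: the program creates `tables_before_import`
// symbol tables, then compiles the module on the fly. Returns the minimum
// ID that would be written to the module file.
uint64_t min_module_id_for_run(int tables_before_import, int module_tables) {
    ToySymbolTable::counter = 0;  // a new process starts counting from 0
    create_tables(tables_before_import);
    return create_tables(module_tables);
}
```

For example, `min_module_id_for_run(3, 2)` yields 3 while `min_module_id_for_run(5, 2)` yields 5, which corresponds to n and m in Case 1 and Case 2: the same module gets different IDs depending on which program triggered its compilation.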

Contributor

Yes, the symbol table ID will differ based on who compiled it. This is not a problem, as the deserialization already renumbers the symbol table IDs. LFortran also has to do this (and does).
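
To make the renumbering idea concrete, here is a minimal sketch of remapping stored IDs to fresh ones on load. It assumes the stored data is just a flat list of IDs; `renumber_ids` is a hypothetical helper written for this illustration, and the actual deserializer may work differently.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Maps each stored ID to a fresh ID drawn from the loader's counter
// `next_id`, preserving which entries referred to the same table.
std::vector<uint64_t> renumber_ids(const std::vector<uint64_t> &stored_ids,
                                   uint64_t &next_id) {
    std::unordered_map<uint64_t, uint64_t> remap;  // old ID -> new ID
    std::vector<uint64_t> fresh;
    fresh.reserve(stored_ids.size());
    for (uint64_t old_id : stored_ids) {
        auto it = remap.find(old_id);
        if (it == remap.end()) {
            // First time this stored ID is seen: assign the next free ID.
            it = remap.emplace(old_id, next_id++).first;
        }
        fresh.push_back(it->second);
    }
    assert(fresh.size() == stored_ids.size());
    return fresh;
}
```

Under this scheme, a file that stored IDs {3, 4, 3} and one that stored {7, 8, 7} load to the same result when the loader's counter starts from the same value, so the IDs written by whoever compiled the module do not leak into the loaded ASR.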


while( infile.back() != '.' ) {
infile.pop_back();
@@ -175,6 +198,10 @@ int save_pyc_files(const LFortran::ASR::TranslationUnit_t &u,
out.open(modfile, std::ofstream::out | std::ofstream::binary);
out << modfile_binary;
}

for( auto& mod_par: module_parent ) {
mod_par.first->m_symtab->parent = mod_par.second;
}
return 0;
}

@@ -255,10 +282,7 @@ ASR::TranslationUnit_t* compile_module_till_asr(Allocator& al,
lm.in_filename = infile;
Result<ASR::TranslationUnit_t*> r2 = python_ast_to_asr(al, *ast,
diagnostics, false, true, false, infile, "");
// TODO: Uncomment once a check is added for ensuring
// that module.py file hasn't changed between
// builds.
// save_pyc_files(*r2.result, infile + "c");
save_pyc_files(*r2.result, infile + "c");
std::string input;
read_file(infile, input);
CompilerOptions compiler_options;
@@ -286,6 +310,23 @@ void fill_module_dependencies(SymbolTable* symtab, std::set<std::string>& mod_de
}
}

bool is_compilation_needed(std::string file_path) {
struct stat result;
int64_t pyc_modtime = -1, py_modtime = -1;
if (stat(file_path.c_str(), &result) == 0) {
pyc_modtime = result.st_mtime;
}
file_path.pop_back();

if (stat(file_path.c_str(), &result) == 0) {
py_modtime = result.st_mtime;
}

return (pyc_modtime <= py_modtime) ||
py_modtime == -1 ||
pyc_modtime == -1;
}

ASR::Module_t* load_module(Allocator &al, SymbolTable *symtab,
const std::string &module_name,
const Location &loc, bool intrinsic,
@@ -323,7 +364,17 @@ ASR::Module_t* load_module(Allocator &al, SymbolTable *symtab,
found = set_module_path(infile0, rl_path, infile,
path_used, input, ltypes, enum_py);
} else {
mod1 = load_pycfile(al, input, false);
if( !is_compilation_needed(infile) ) {
mod1 = load_pycfile(al, input, false);
} else {
infile.pop_back();
mod1 = compile_module_till_asr(al, rl_path, infile, loc, err);
// std::string cmd = "lpython -c --disable-main " + infile;
// system(cmd.c_str());
// bool found = set_module_path(infile0c, rl_path, infile,
// path_used, input, ltypes, enum_py);
// mod1 = load_pycfile(al, input, false);
}
fix_external_symbols(*mod1, *ASRUtils::get_tu_symtab(symtab));
LFORTRAN_ASSERT(asr_verify(*mod1));
compile_module = false;
@@ -340,6 +391,13 @@ ASR::Module_t* load_module(Allocator &al, SymbolTable *symtab,

if( compile_module ) {
mod1 = compile_module_till_asr(al, rl_path, infile, loc, err);
// std::string cmd = "lpython -c --disable-main " + infile;
// system(cmd.c_str());
// bool found = set_module_path(infile0c, rl_path, infile,
// path_used, input, ltypes, enum_py);
// mod1 = load_pycfile(al, input, false);
fix_external_symbols(*mod1, *ASRUtils::get_tu_symtab(symtab));
LFORTRAN_ASSERT(asr_verify(*mod1));
}

// insert into `symtab`
10 changes: 10 additions & 0 deletions src/lpython/utils.h
@@ -4,6 +4,16 @@
#include <string>
#include <libasr/utils.h>

#include <sys/types.h>
#include <sys/stat.h>
#ifndef _WIN32
#include <unistd.h>
#endif

#ifdef _WIN32
#define stat _stat
#endif

namespace LFortran {

void get_executable_path(std::string &executable_path, int &dirname_length);
2 changes: 1 addition & 1 deletion tests/reference/asr-array_01_decl-39cf894.json
@@ -6,7 +6,7 @@
"outfile": null,
"outfile_hash": null,
"stdout": "asr-array_01_decl-39cf894.stdout",
"stdout_hash": "5b68c8a68e32424bac605d693074439184e291066af6eeae1b231d19",
"stdout_hash": "5eaa80a717f7ec9ed11d653889b64497251af4ed9b2fe531d9d48a2a",
"stderr": null,
"stderr_hash": null,
"returncode": 0
2 changes: 1 addition & 1 deletion tests/reference/asr-array_01_decl-39cf894.stdout

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/reference/asr-array_02_decl-e8f6874.json
@@ -6,7 +6,7 @@
"outfile": null,
"outfile_hash": null,
"stdout": "asr-array_02_decl-e8f6874.stdout",
"stdout_hash": "2cdc4579cf4108cf1e061b27a5eec251c0da225d1c7964671054fc54",
"stdout_hash": "0aafef017a432335f36dfd651ceef8374193898f9063696f4d46cd41",
"stderr": null,
"stderr_hash": null,
"returncode": 0
2 changes: 1 addition & 1 deletion tests/reference/asr-array_02_decl-e8f6874.stdout

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/reference/asr-cast-435c233.json
@@ -6,7 +6,7 @@
"outfile": null,
"outfile_hash": null,
"stdout": "asr-cast-435c233.stdout",
"stdout_hash": "8fd9b47c25981ee4eee9c480b5321ed393ff3ce3dbb517ac1baa218b",
"stdout_hash": "98328ebb113c5f2d105ae04d03b0bfd6d60a4f3453d2ea4a75b136c0",
"stderr": null,
"stderr_hash": null,
"returncode": 0
2 changes: 1 addition & 1 deletion tests/reference/asr-cast-435c233.stdout
@@ -1 +1 @@
(TranslationUnit (SymbolTable 1 {_lpython_main_program: (Function (SymbolTable 96 {}) _lpython_main_program [f] [] [(SubroutineCall 1 f () [] ())] () Source Public Implementation () .false. .false. .false. .false. .false. [] [] .false.), f: (Function (SymbolTable 2 {list: (ExternalSymbol 2 list 4 list lpython_builtin [] list Private), s: (Variable 2 s Local () () Default (Character 1 -2 () []) Source Public Required .false.), x: (Variable 2 x Local () () Default (List (Character 1 -2 () [])) Source Public Required .false.), y: (Variable 2 y Local () () Default (List (Character 1 -2 () [])) Source Public Required .false.)}) f [list list list] [] [(= (Var 2 s) (StringConstant "lpython" (Character 1 7 () [])) ()) (= (Var 2 x) (FunctionCall 2 list () [((Var 2 s))] (List (Character 1 -2 () [])) () ()) ()) (= (Var 2 y) (ListConstant [(StringConstant "a" (Character 1 1 () [])) (StringConstant "b" (Character 1 1 () [])) (StringConstant "c" (Character 1 1 () []))] (List (Character 1 1 () []))) ()) (= (Var 2 x) (FunctionCall 2 list () [((Var 2 y))] (List (Character 1 -2 () [])) () ()) ()) (= (Var 2 x) (FunctionCall 2 list () [((StringConstant "lpython" (Character 1 7 () [])))] (List (Character 1 -2 () [])) (ListConstant [(StringConstant "l" (Character 1 1 () [])) (StringConstant "p" (Character 1 1 () [])) (StringConstant "y" (Character 1 1 () [])) (StringConstant "t" (Character 1 1 () [])) (StringConstant "h" (Character 1 1 () [])) (StringConstant "o" (Character 1 1 () [])) (StringConstant "n" (Character 1 1 () []))] (List (Character 1 1 () []))) ()) ())] () Source Public Implementation () .false. .false. .false. .false. .false. [] [] .false.), lpython_builtin: (IntrinsicModule lpython_builtin), main_program: (Program (SymbolTable 95 {}) main_program [] [(SubroutineCall 1 _lpython_main_program () [] ())])}) [])
(TranslationUnit (SymbolTable 1 {_lpython_main_program: (Function (SymbolTable 91 {}) _lpython_main_program [f] [] [(SubroutineCall 1 f () [] ())] () Source Public Implementation () .false. .false. .false. .false. .false. [] [] .false.), f: (Function (SymbolTable 2 {list: (ExternalSymbol 2 list 4 list lpython_builtin [] list Private), s: (Variable 2 s Local () () Default (Character 1 -2 () []) Source Public Required .false.), x: (Variable 2 x Local () () Default (List (Character 1 -2 () [])) Source Public Required .false.), y: (Variable 2 y Local () () Default (List (Character 1 -2 () [])) Source Public Required .false.)}) f [list list list] [] [(= (Var 2 s) (StringConstant "lpython" (Character 1 7 () [])) ()) (= (Var 2 x) (FunctionCall 2 list () [((Var 2 s))] (List (Character 1 -2 () [])) () ()) ()) (= (Var 2 y) (ListConstant [(StringConstant "a" (Character 1 1 () [])) (StringConstant "b" (Character 1 1 () [])) (StringConstant "c" (Character 1 1 () []))] (List (Character 1 1 () []))) ()) (= (Var 2 x) (FunctionCall 2 list () [((Var 2 y))] (List (Character 1 -2 () [])) () ()) ()) (= (Var 2 x) (FunctionCall 2 list () [((StringConstant "lpython" (Character 1 7 () [])))] (List (Character 1 -2 () [])) (ListConstant [(StringConstant "l" (Character 1 1 () [])) (StringConstant "p" (Character 1 1 () [])) (StringConstant "y" (Character 1 1 () [])) (StringConstant "t" (Character 1 1 () [])) (StringConstant "h" (Character 1 1 () [])) (StringConstant "o" (Character 1 1 () [])) (StringConstant "n" (Character 1 1 () []))] (List (Character 1 1 () []))) ()) ())] () Source Public Implementation () .false. .false. .false. .false. .false. [] [] .false.), lpython_builtin: (IntrinsicModule lpython_builtin), main_program: (Program (SymbolTable 90 {}) main_program [] [(SubroutineCall 1 _lpython_main_program () [] ())])}) [])
2 changes: 1 addition & 1 deletion tests/reference/asr-complex1-f26c460.json
@@ -6,7 +6,7 @@
"outfile": null,
"outfile_hash": null,
"stdout": "asr-complex1-f26c460.stdout",
"stdout_hash": "642269d23c09ee2a6d59c471f28fd6f04ea9ed7c75f00fc8c0de6373",
"stdout_hash": "e1c665b190d6a124346f037fde8bb0fce04c8a6a1e43a2949e7e56eb",
"stderr": null,
"stderr_hash": null,
"returncode": 0