Usage of generators instead of list/tuples in executemany #200
That's an interesting request, and it seems perfectly reasonable, too. :-) I'll consider this for a future release.
Hi Anthony,
@anthony-tuininga I guess you are thinking of the internal batching & inserting of data that was once discussed? Otherwise cx_Oracle would still need to instantiate the full data set in memory before calling OCI. @meisn, you would have seen this in Anthony's blog post:
To work with the current functionality, try calling executemany() in batches. @anthony-tuininga I wonder whether direct path loading would be more interesting for large data sets?
@cjbj, yes, I was thinking of batching the inserts and giving the caller the chance to specify how many rows would be processed in each batch, with a reasonable default. I do hope to take a look at direct path loading as well, but in both cases batching will be needed for very large data sets.
@cjbj Thanks for the link and the hint. In fact, you recommended there that I open this issue here. :-)
@meisn, what I think @cjbj is suggesting is that you would create your own version of executemany() in Python code which would accept the iterator. That method would then consume the iterator up to a certain number of rows (100? 1000? 10000?), call the real executemany(), and continue doing this until the iterator was exhausted. This would be fairly easy to do in Python code, I believe.
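A minimal sketch of such a wrapper, assuming a cx_Oracle cursor; the helper name and default batch size are illustrative, not from the thread:

```python
import itertools

def executemany_batched(cursor, statement, rows, batch_size=1000):
    # Consume the iterator in fixed-size batches so the full data set
    # never needs to exist in memory at once; each batch is handed to
    # the real executemany().
    rows = iter(rows)
    while True:
        batch = list(itertools.islice(rows, batch_size))
        if not batch:
            break
        cursor.executemany(statement, batch)
```

Called as `executemany_batched(cursor, "insert into t (a, b) values (:1, :2)", my_generator())`, this keeps at most `batch_size` parameter rows in memory at any time.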
@anthony-tuininga I can try that. I just realized that the majority of this issue is caused by my own code. I create a list of dictionaries (to have named parameters and to accommodate the different types of XML tags I can have in the files). This works fine up to a certain file size but is terrible for the larger ones.
No problem. And I will certainly consider this enhancement request further when I get some time!
@anthony-tuininga I have worked around my issue for now by splitting the data into smaller input files and hence smaller parameter sets. I use dictionaries as parameters, which, it turned out, was probably not my best choice. But with smaller parameter sets and more executemany() calls it works fine.
Answer the following questions:
What is your version of Python? Is it 32-bit or 64-bit?
Python 3.6.4/3.7, 64-bit
What is your version of cx_Oracle?
6.4
What is your version of the Oracle client (e.g. Instant Client)? How was it installed? Where is it installed?
11.2, packaged installation in my company
What is your version of the Oracle Database?
11gR2
What is your OS and version?
Windows 7 Enterprise, 64-bit
What compiler version did you use? For example, with GCC, run gcc --version.
none
What environment variables did you set? How exactly did you set them?
What exact command caused the problem (e.g. what command did you try to install with)? Who were you logged in as?
Nothing failed outright; I intended to use executemany() to bulk-insert a large data set that first has to be read from a file (i.e. the parameter list is generated beforehand). I was logged in as a regular user, not SYS or any other privileged account; my personal user has write privileges on the database tables, of course.
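For context, a minimal sketch of the intended bulk insert, assuming a cx_Oracle connection; the connect string, table, and column names are illustrative:

```python
import cx_Oracle

connection = cx_Oracle.connect("user/password@host/service_name")
cursor = connection.cursor()

# The full list of dictionaries must currently be built up front,
# since executemany() does not accept a generator.
params = [
    {"id": 1, "payload": "first row"},
    {"id": 2, "payload": "second row"},
]
cursor.executemany(
    "insert into my_table (id, payload) values (:id, :payload)", params)
connection.commit()
```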
What error(s) are you seeing?
I have to insert many records from large XML files. I use a function for this, and I also have a generator version of it. On small files (Python process below 300 MB) I can generate the full list of parameters and call executemany(stmnt, params) without any issue. However, I also have some larger files of the same sort, around 90 MB each, which results in memory consumption way above that (other modules are involved in producing the parameter set). If I could use my generator instead of the function, i.e. if the whole list did not need to be produced upfront, memory consumption would be much lower and my process would not fail on these files.

I experimented with the bundled sqlite3 module, whose executemany() accepts a generator (https://docs.python.org/3/library/sqlite3.html), and compared the two approaches: with the list-building function my Python process used close to 1 GB of memory (but did not fail), while the generator version consumed only some 250 MB. It ran a bit longer, of course, but that is acceptable when the insert itself is what matters.
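For comparison, a minimal sketch of the sqlite3 behaviour described above; the table and the generated data are illustrative:

```python
import sqlite3

def generate_params():
    # Yield parameter tuples one at a time instead of building
    # the whole list in memory first.
    for i in range(1_000_000):
        yield (i, "payload %d" % i)

connection = sqlite3.connect(":memory:")
connection.execute("create table t (id integer, payload text)")
# sqlite3's executemany() accepts any iterable, including a generator:
connection.executemany(
    "insert into t (id, payload) values (?, ?)", generate_params())
```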
Could an enhancement like this be added in a future version?