Cannot insert large data sets in Oracle table using executemany() #548

Closed
megurikun08 opened this issue Mar 23, 2021 · 10 comments

@megurikun08

megurikun08 commented Mar 23, 2021

Hi All,

This problem has been haunting me for quite some time, and I need to resolve it ASAP. When I try to batch insert data from a DataFrame, I get an error saying the array size of ... is too large. This is my current code for the executemany() call:

x = [tuple(i) for i in df.values]   # convert each DataFrame row to a tuple for binding
cursor.executemany(insert_sql, x)

I have tried several resolutions found on the internet, but nothing works. I am aware that I could split the DataFrame into chunks, but I have not found any working code that does this. From what I have read in the cx_Oracle documentation, the data size limit for executemany() is 2 GB. Here are my questions:

  1. Can the 2 GB data size limit be raised to a larger one?
  2. If not, how can I properly slice one large dataset into chunks and insert them chunk by chunk using executemany()?

--EDIT--
Let me rephrase the question:

How should I chunk my large datasets so I can insert them into my Oracle database table using executemany()?

Win10 · Python 3.7.9 · cx_Oracle 8.1 · Oracle Client 19.10
--END--

Thanks everyone!

@cjbj
Member

cjbj commented Mar 23, 2021

Can you update your question with some details?

Start by answering the GitHub issues template questions.

What's the error? Don't make us guess. What's the table description? Give some sample data.

Review the cx_Oracle documentation Batch Statement Execution and Bulk Loading. The final example (about CSV files) shows one way of looping over executemany() calls that may give you some ideas.
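For context, here is a minimal sketch of the kind of executemany() loop that doc example illustrates, reading a CSV in batches; the connection details, file name, table, and columns are placeholders, not anything from this issue:

import csv
import cx_Oracle

# Placeholder credentials and DSN; replace with your own.
connection = cx_Oracle.connect(user="scott", password="tiger", dsn="localhost/orclpdb1")
cursor = connection.cursor()

batch_size = 10000
sql = "insert into test_table (id, name) values (:1, :2)"   # hypothetical table and columns

with open("data.csv") as f:
    reader = csv.reader(f)
    data = []
    for row in reader:
        data.append((row[0], row[1]))
        if len(data) == batch_size:        # batch is full, send it
            cursor.executemany(sql, data)
            data = []
    if data:                               # send the remaining partial batch
        cursor.executemany(sql, data)
    connection.commit()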

@cjbj cjbj added the question label Mar 23, 2021
@megurikun08
Author

Hi cjbj! I have updated the question. Unfortunately I cannot provide any sample data due to privacy concerns, but my dataset consists of several columns with hundreds of thousands of rows (some even in the millions).

The exact error is this: DPI-1015: array size of is too large

The data is loaded as a DataFrame using pandas, and then it should be loaded into an existing table. My target method is executemany() to significantly decrease insert time, but executemany() just won't let me. Using execute() works perfectly fine; it just takes too much time.

@rajendrapasupuleti

Could someone help with this? I am facing the same issue. In my current code we use the pandas, numpy, and cx_Oracle libraries to insert and update data in an Oracle table. We process about 30,000 to 40,000 records each day. Unfortunately, I can't share my source code.

@anthony-tuininga
Member

The error you mentioned (DPI-1015: array size of is too large) indicates that you need to reduce the number of rows you are binding at one time. So instead of trying to send all hundreds of thousands (or millions) of rows at the same time, break them up into batches. In other words, something like this:

start_pos = 0
batch_size = 15000
all_data = get_all_data_to_process()          # e.g. a list of tuples, one per row
while start_pos < len(all_data):
    data = all_data[start_pos:start_pos + batch_size]   # next batch of rows
    start_pos += batch_size
    cursor.executemany(your_sql, data)

Choose a batch size that works well for you, and decide whether you want each batch committed as it reaches the database successfully, or whether you want to wait until the entire set of data has been sent before committing.

This sort of behavior has been requested within cx_Oracle itself as #200, which may be of interest.
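To tie this back to the pandas DataFrame in the original question, here is a minimal sketch of the same batching idea with a per-batch commit; it assumes the df, insert_sql, cursor, and connection objects from the question already exist:

batch_size = 15000
# Flatten the DataFrame into a list of plain tuples, one per row.
rows = list(df.itertuples(index=False, name=None))

for start in range(0, len(rows), batch_size):
    batch = rows[start:start + batch_size]
    cursor.executemany(insert_sql, batch)
    connection.commit()   # commit per batch; move this after the loop to commit only once at the end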

@megurikun08
Author

Thank you @anthony-tuininga for the wonderful code you shared. I can now insert millions of rows using executemany(). Cheers!

@cjbj
Member

cjbj commented Apr 13, 2021

@megurikun08 thanks for the feedback.

If you haven't already optimized the insert, don't forget to look at whether using things like an APPEND hint will help. And check the constraint and index usage. It would be interesting to know whether writing to a disk file and then using SQL*Loader with the Direct Path load also gives a performance boost - it would be data & network dependent. These are all out of scope for the cx_Oracle repo, but there are plenty of resources online that can help.
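As one hedged illustration of the hint suggestion: for an INSERT ... VALUES statement run through executemany(), the array-insert variant of the APPEND hint is APPEND_VALUES. The table and columns below are hypothetical, and whether the hint actually helps depends on your data, constraints, and indexes:

insert_sql = """
    insert /*+ APPEND_VALUES */ into test_table (id, name)
    values (:1, :2)"""
cursor.executemany(insert_sql, rows)
connection.commit()   # a direct-path insert must be committed before the same session queries the table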

@vsanjairaj

> thank you @anthony-tuininga for the wonderful code you shared. I can now insert millions of rows using executemany. Cheers!

Can you please share example code for the same?

@cjbj
Member

cjbj commented May 12, 2022

@vsanjairaj see the comments above for the code that was shared. Also read the doc Batch Statement Execution and Bulk Loading.

@avmcohen

Question: Is there a pre-defined limit on the array size? Is it on the record count or on the actual size in bytes?

@anthony-tuininga
Member

The limit is on actual size: the maximum size in bytes of one row of data multiplied by the number of rows must be less than 2 ** 31 -- for each bind variable.
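As a back-of-the-envelope illustration of that formula (the 4000-byte bind size is only an assumed example, not anything from this issue):

bytes_per_row = 4000                      # assume the largest bind is e.g. a VARCHAR2(4000) column
per_bind_limit = 2 ** 31                  # limit in bytes, per bind variable
max_rows_per_batch = per_bind_limit // bytes_per_row
print(max_rows_per_batch)                 # 536870 -> keep each executemany() batch below this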
