Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Currently, when one attempts to use `DataFrame.to_sql` to write a DataFrame with more than 2 GB of data into an Oracle database, the load fails with the error `sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) DPI-1015: array size of <n> is too large`. That is because the Oracle database driver does not support inserting more than 2 GB of data at a time (see the documentation for cx_Oracle and for python-oracledb).
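For reference, a minimal sketch of the call pattern that triggers the error; the connection string, table name, and data are placeholders, and any cx_Oracle or python-oracledb engine behaves the same way:

```python
import pandas as pd
import sqlalchemy as sa

# Placeholder DSN; the failure is driver-side, not specific to this host.
engine = sa.create_engine(
    "oracle+oracledb://user:password@host:1521/?service_name=SERVICE"
)

# A DataFrame whose total payload exceeds the driver's 2 GB limit.
big = pd.DataFrame({"blob": ["x" * 1_000_000] * 3_000})  # roughly 3 GB of text

# pandas issues a single multi-row execute() under the hood, which the
# Oracle driver rejects with DPI-1015 (array size too large).
big.to_sql("big_table", engine, if_exists="replace", index=False)
```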
Feature Description
Currently we make one call to SQLAlchemy's `execute` method (in `_execute_insert`, in `pandas/io/sql.py`). Instead we want to do something like the following, at least when using an Oracle driver (a sketch appears after the list):
1. Find the largest data size of any row.
2. Set the batch size to floor(2GB / (result from (1))).
3. Call `execute` repeatedly with the number of rows from (2).
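A rough sketch of that logic, written here as a `to_sql` `method=` callable rather than an actual change to `_execute_insert` (the byte-size estimate via string lengths is a crude stand-in for whatever the driver actually binds, and `MAX_BATCH_BYTES` is an assumed constant):

```python
MAX_BATCH_BYTES = 2 * 1024**3  # the 2 GB limit behind DPI-1015

def batched_insert(table, conn, keys, data_iter):
    """Insert in batches sized so no single execute() exceeds ~2 GB.

    table is pandas' SQLTable wrapper; its underlying SQLAlchemy
    Table is available as table.table.
    """
    rows = [dict(zip(keys, row)) for row in data_iter]
    if not rows:
        return
    # (1) largest data size of any row (string-length proxy)
    largest = max(sum(len(str(v)) for v in r.values()) for r in rows)
    # (2) batch size = floor(2GB / largest row size), at least 1
    batch_size = max(1, MAX_BATCH_BYTES // max(1, largest))
    # (3) call execute() once per batch of that many rows
    insert_stmt = table.table.insert()
    for start in range(0, len(rows), batch_size):
        conn.execute(insert_stmt, rows[start : start + batch_size])

# Hypothetical usage; pandas would call batched_insert instead of its
# default single-execute path:
# df.to_sql("big_table", engine, if_exists="append", index=False,
#           method=batched_insert)
```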
Alternative Solutions
This could be addressed within SQLAlchemy, but its developers have decided the issue is out of scope.
It could also be addressed within the Oracle database driver, but the bug report there has been open for several years with no action.
Additional Context
A Google search for DPI-1015 shows many ways of dealing with this issue, including this Stack Overflow post about working around it using pandas.
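One such workaround, sketched here under the same string-length size estimate as above (untested; `engine` is an Oracle SQLAlchemy engine as in the earlier snippet), is to compute a safe `chunksize` up front so pandas splits the insert into several smaller `execute()` calls:

```python
import pandas as pd

MAX_BATCH_BYTES = 2 * 1024**3  # the 2 GB driver limit

def oracle_safe_chunksize(df: pd.DataFrame) -> int:
    if df.empty:
        return 1
    # Largest row size, approximated by summing string lengths per row.
    row_bytes = df.astype(str).apply(lambda r: sum(len(v) for v in r), axis=1)
    return max(1, MAX_BATCH_BYTES // max(1, int(row_bytes.max())))

# df.to_sql("big_table", engine, if_exists="append", index=False,
#           chunksize=oracle_safe_chunksize(df))
```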