Feature Type

- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas

Problem Description

Currently, when one attempts to use DataFrame.to_sql to write a dataframe containing more than 2GB of data into an Oracle database, the load fails with the error sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) DPI-1015: array size of <n> is too large. That is because the Oracle database drivers do not support inserting more than 2GB of data at a time (see the documentation for cx_Oracle and for python-oracledb).
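For reference, a minimal sketch of the failing pattern; the connection string, table name, and data here are hypothetical stand-ins:

```python
import pandas as pd
import sqlalchemy

# Hypothetical connection string and table name.
engine = sqlalchemy.create_engine(
    "oracle+cx_oracle://user:password@host:1521/?service_name=orcl"
)

# A dataframe whose bound payload exceeds 2GB: 600,000 rows of a
# 4,000-character string is roughly 2.4GB of data.
df = pd.DataFrame({"payload": ["x" * 4000] * 600_000})

# Fails with DPI-1015: array size of <n> is too large.
df.to_sql("big_table", engine, if_exists="replace", index=False)
```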
Feature Description
Currently pandas makes a single call to SQLAlchemy's execute method (in _execute_insert, in `pandas/io/sql.py`). Instead, at least when using an Oracle driver, we want to do something like the following (a rough sketch appears after the list):
1. Find the largest data size of any row.
2. Set the batch size to floor(2GB / (result from (1))).
3. Call execute repeatedly with the number of rows from (2).
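A rough sketch of what that could look like, assuming a hypothetical execute_in_batches helper called from _execute_insert; the byte-size estimate is a crude approximation of the driver's buffer accounting, not its exact rule:

```python
TWO_GB = 2 * 1024 ** 3  # approximate cx_Oracle / python-oracledb buffer limit

def execute_in_batches(conn, stmt, rows):
    """Hypothetical replacement for the single conn.execute(stmt, rows) call.

    rows is a list of parameter dicts, as _execute_insert builds today.
    """
    # (1) Estimate the largest row's size in bytes. Encoding every value
    # as UTF-8 text is a stand-in for the driver's own accounting.
    max_row_size = max(
        sum(len(str(v).encode("utf-8")) for v in row.values()) for row in rows
    )
    # (2) Batch size = floor(2GB / largest row size), at least one row.
    batch_size = max(1, TWO_GB // max_row_size)
    # (3) Call execute repeatedly, batch_size rows at a time.
    for start in range(0, len(rows), batch_size):
        conn.execute(stmt, rows[start : start + batch_size])
```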
Alternative Solutions

This could be addressed within SQLAlchemy, but the developers have decided the issue is out of scope.

This could be addressed within the Oracle database driver, but the bug report there has been open for several years now with no action.
Additional Context
A Google search for DPI-1015 shows many ways of dealing with this issue, including this Stack Overflow post about working around it using pandas.
Thanks for the request, but pandas does not have any db-flavor-specific logic and needs to avoid any kind of data introspection to be as general-purpose as possible. Given that this is out of scope for SQLAlchemy, it is out of scope for pandas as well.
Additionally, there is the chunksize argument to help load large amounts of data.
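For example, something like the following keeps each INSERT below the driver's limit (the table name and chunk size are illustrative):

```python
# Let pandas issue many smaller executemany calls instead of one giant one.
# Pick chunksize so that (rows per chunk) x (max row size) stays under 2GB.
df.to_sql("big_table", engine, if_exists="replace", index=False, chunksize=10_000)
```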