
ENH: Support loading more than 2 GB of data to Oracle database. #55548

Closed
@vectro

Description


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Currently, when one attempts to use DataFrame.to_sql to write a DataFrame containing more than 2 GB of data into an Oracle database, the load fails with the error sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) DPI-1015: array size of <n> is too large. This happens because the Oracle database driver does not support inserting more than 2 GB of data in a single call (see the documentation for cx_Oracle and for python-oracledb).
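For illustration, a call along the following lines fails once the frame's payload exceeds roughly 2 GB; the connection string, table name, and sizes below are placeholders, not values from an actual report:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; any Oracle DSN reachable via cx_Oracle will do.
engine = create_engine("oracle+cx_oracle://user:password@host:1521/?service_name=orclpdb")

# Roughly 3 GB of string data: 3 million rows of 1,000 characters each.
df = pd.DataFrame({"payload": ["x" * 1_000] * 3_000_000})

# Raises sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError)
# DPI-1015: array size of <n> is too large
df.to_sql("big_table", engine, if_exists="replace", index=False)
```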

Feature Description

Currently we make a single call to SQLAlchemy's execute method (in _execute_insert, in `pandas/io/sql.py`). Instead, at least when using an Oracle driver, we want to do something like this (a rough sketch follows the list):

  1. Find the largest data size of any row
  2. Set the batch size as floor(2GB / (result from (1)))
  3. Call execute repeatedly with the number of rows from (2).
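A minimal sketch of that batching logic, assuming the rows arrive as the list of dicts that _execute_insert already builds; the helper name `_execute_insert_batched`, the constant `ORACLE_MAX_BYTES`, and the per-row size estimate are illustrative, not existing pandas internals:

```python
import math
import sys

ORACLE_MAX_BYTES = 2 * 1024**3  # 2 GB driver-side limit behind DPI-1015


def _execute_insert_batched(conn, insert_stmt, data_rows):
    """Insert data_rows (a list of dicts) in batches small enough that
    no single executemany call exceeds the 2 GB driver limit."""
    if not data_rows:
        return 0
    # 1. Estimate the largest data size of any row.
    max_row_bytes = max(
        sum(sys.getsizeof(value) for value in row.values()) for row in data_rows
    )
    # 2. Batch size = floor(2 GB / largest row size), but never less than one row.
    batch_size = max(1, math.floor(ORACLE_MAX_BYTES / max_row_bytes))
    # 3. Call execute repeatedly with that many rows at a time.
    total = 0
    for start in range(0, len(data_rows), batch_size):
        result = conn.execute(insert_stmt, data_rows[start : start + batch_size])
        total += result.rowcount
    return total
```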

Alternative Solutions

This could be addressed within SQLAlchemy, but the developers have decided the issue is out of scope.

This could be addressed within the Oracle database driver, but the bug report there has been open for several years now with no action.

Additional Context

A Google search for DPI-1015 shows many ways of dealing with this issue, including this Stack Overflow post about working around it using pandas.
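Until something like the above lands, the usual user-level workaround in those threads is to cap the batch size manually through to_sql's existing chunksize argument; continuing the hypothetical setup from the reproduction sketch above:

```python
# Workaround: limit each executemany batch so its total payload stays
# well under 2 GB. The chunk size here is a guess and would need tuning
# for the actual row width.
df.to_sql("big_table", engine, if_exists="replace", index=False, chunksize=100_000)
```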
