
ENH: Support loading more than 2 GB of data to Oracle database. #55548

Closed
1 of 3 tasks
vectro opened this issue Oct 16, 2023 · 2 comments
Labels: Closing Candidate, Enhancement

Comments

@vectro
vectro commented Oct 16, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Currently, when one attempts to use DataFrame.to_sql to write a DataFrame with more than 2 GB of data into an Oracle database, the load fails with the error sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) DPI-1015: array size of <n> is too large. This happens because the Oracle database driver does not support inserting more than 2 GB of data in a single call (see the documentation for cx_Oracle and for python-oracledb).

Feature Description

Currently we make a single call to SQLAlchemy's execute method (in _execute_insert, in `pandas/io/sql.py`). Instead, at least when using an Oracle driver, we want to do something like this:

  1. Find the largest data size of any row
  2. Set the batch size as floor(2GB / (result from (1)))
  3. Call execute repeatedly with the number of rows from (2).

Alternative Solutions

This could be addressed within SQLAlchemy, but the developers have decided the issue is out of scope.

It could also be addressed within the Oracle database driver, but the bug report there has been open for several years with no action.

Additional Context

A Google search for DPI-1015 shows many ways of dealing with this issue, including this Stack Overflow post about working around it using pandas.

@vectro added the Enhancement and Needs Triage labels Oct 16, 2023
@mroeschke
Member

Thanks for the request, but pandas does not have any db flavor specific logic and needs to avoid any type of data introspection to be as general purpose as possible. Given this is out of scope for SQLAlchemy this is out of scope for pandas as well.

Additionally, there is the chunksize argument of to_sql to help load large datasets in smaller batches.

@mroeschke added the Closing Candidate label and removed the Needs Triage label Oct 16, 2023
@mroeschke
Member

Closing, as this appears out of scope for pandas.
