
Description
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
import gc

COLUMNS = list('abcde')

df_list = []
for i in range(100):
    df = pd.DataFrame(np.random.rand(1_000_000, 5), columns=COLUMNS)
    df = df[COLUMNS]         # <-- LINE A, see below
    df = df[df.a > 0.5]      # <-- LINE B, see below
    df_list.append(df)
df_all = pd.concat(df_list, axis=0)

# Drop every reference and force a collection; memory should be released.
del df
del df_list
del df_all
gc.collect()
Problem description
Running the code above leaks memory in pandas 0.22.0.

With LINE A and LINE B included, the Python process memory goes from:
60M --> 8.2G --> 300M

With LINE A and LINE B commented out, the process memory goes from:
60M --> 8.2G --> 60M

That is, roughly 240M is never returned to the OS when the column re-selection (LINE A) and boolean filter (LINE B) are present, even after all references are deleted and gc.collect() is called.
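
For reference, the figures above can be observed with a small instrumented variant of the script. The sketch below is an assumption about how to measure, not part of the original report: it assumes psutil is installed and uses a hypothetical rss_mb helper that reads the process's resident set size.

import gc
import numpy as np
import pandas as pd
import psutil  # assumption: psutil is used to read the process RSS

COLUMNS = list('abcde')

def rss_mb():
    # Resident set size of the current process, in megabytes (hypothetical helper)
    return psutil.Process().memory_info().rss / 1024 ** 2

print(f"start:      {rss_mb():.0f} MB")

df_list = []
for i in range(100):
    df = pd.DataFrame(np.random.rand(1_000_000, 5), columns=COLUMNS)
    df = df[COLUMNS]         # LINE A
    df = df[df.a > 0.5]      # LINE B
    df_list.append(df)
df_all = pd.concat(df_list, axis=0)

print(f"peak:       {rss_mb():.0f} MB")

del df
del df_list
del df_all
gc.collect()

print(f"after free: {rss_mb():.0f} MB")

With LINE A and LINE B commented out, the final print should drop back to roughly the starting figure.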
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 3.5.0
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None