Commit ec28b97

feat: High performance pandas integration. (#24)
1 parent d9fa199 commit ec28b97

46 files changed

+11360
-399
lines changed

.gitignore

+11-1
@@ -1,12 +1,22 @@
1-
src/questdb/ingress.html
21
src/questdb/ingress.c
2+
src/questdb/*.html
33
rustup-init.exe
44

5+
# Linux Perf profiles
6+
perf.data*
7+
perf/*.svg
8+
9+
# Atheris Crash/OOM and other files
10+
fuzz-artifact/
11+
512
# Byte-compiled / optimized / DLL files
613
__pycache__/
714
*.py[cod]
815
*$py.class
916

17+
# Parquet files generated as part of example runs
18+
*.parquet
19+
1020
# C extensions
1121
*.so
1222

.vscode/settings.json

+5-1
@@ -1,3 +1,7 @@
11
{
2-
"esbonio.sphinx.confDir": ""
2+
"esbonio.sphinx.confDir": "",
3+
"cmake.configureOnOpen": false,
4+
"files.associations": {
5+
"ingress_helper.h": "c"
6+
}
37
}

CHANGELOG.rst

+57-3
@@ -2,9 +2,47 @@
22
Changelog
33
=========
44

5+
1.1.0 (2023-01-04)
6+
------------------
7+
8+
Features
9+
~~~~~~~~
10+
11+
* High-performance ingestion of `Pandas <https://pandas.pydata.org/>`_
12+
dataframes into QuestDB via ILP.
13+
We now support most Pandas column types. The logic is implemented in native
14+
code and is orders of magnitude faster than iterating the dataframe
15+
in Python and calling the ``Buffer.row()`` or ``Sender.row()`` methods: The
16+
``Buffer`` can be written from Pandas at hundreds of MiB/s per CPU core.
17+
The new ``dataframe()`` method continues working with the ``auto_flush``
18+
feature.
19+
See API documentation and examples for the new ``dataframe()`` method
20+
available on both the ``Sender`` and ``Buffer`` classes.
21+
22+
* New ``TimestampNanos.now()`` and ``TimestampMicros.now()`` methods.
23+
*These are the new recommended way of getting the current timestamp.*
24+
25+
* The Python GIL is now released during calls to ``Sender.flush()`` and when
26+
``auto_flush`` is triggered. This should improve throughput when using the
27+
``Sender`` from multiple threads.
28+
29+
Errata
30+
~~~~~~
31+
32+
* In previous releases the documentation for the ``from_datetime()`` methods of
33+
the ``TimestampNanos`` and ``TimestampMicros`` types recommended calling
34+
``datetime.datetime.utcnow()`` to get the current timestamp. This is incorrect
35+
as it will (confusingly) return an object with the local timezone instead of UTC.
36+
This documentation has been corrected and now recommends calling
37+
``datetime.datetime.now(tz=datetime.timezone.utc)`` or (more efficiently) the
38+
new ``TimestampNanos.now()`` and ``TimestampMicros.now()`` methods.
39+
540
1.0.2 (2022-10-31)
641
------------------
742

43+
Features
44+
~~~~~~~~
45+
846
* Support for Python 3.11.
947
* Updated to version 2.1.1 of the ``c-questdb-client`` library:
1048

@@ -14,20 +52,30 @@ Changelog
1452
1.0.1 (2022-08-16)
1553
------------------
1654

55+
Features
56+
~~~~~~~~
57+
58+
* As a matter of convenience, the ``Buffer.row`` method can now take ``None`` column
59+
values. This has the same semantics as skipping the column altogether.
60+
Closes `#3 <https://github.com/questdb/py-questdb-client/issues/3>`_.
61+
62+
Bugfixes
63+
~~~~~~~~
64+
1765
* Fixed a major bug where Python ``int`` and ``float`` types were handled with
1866
32-bit instead of 64-bit precision. This caused certain ``int`` values to be
1967
rejected and other ``float`` values to be rounded incorrectly.
2068
Closes `#13 <https://github.com/questdb/py-questdb-client/issues/13>`_.
21-
* As a matter of convenience, the ``Buffer.row`` method can now take ``None`` column
22-
values. This has the same semantics as skipping the column altogether.
23-
Closes `#3 <https://github.com/questdb/py-questdb-client/issues/3>`_.
2469
* Fixed a minor bug where an error auto-flush caused a second clean-up error.
2570
Closes `#4 <https://github.com/questdb/py-questdb-client/issues/4>`_.
2671

2772

2873
1.0.0 (2022-07-15)
2974
------------------
3075

76+
Features
77+
~~~~~~~~
78+
3179
* First stable release.
3280
* Insert data into QuestDB via ILP.
3381
* Sender and Buffer APIs.
@@ -38,6 +86,9 @@ Changelog
3886
0.0.3 (2022-07-14)
3987
------------------
4088

89+
Features
90+
~~~~~~~~
91+
4192
* Initial set of features to connect to the database.
4293
* ``Buffer`` and ``Sender`` classes.
4394
* First release where ``pip install questdb`` should work.
@@ -46,4 +97,7 @@ Changelog
4697
0.0.1 (2022-07-08)
4798
------------------
4899

100+
Features
101+
~~~~~~~~
102+
49103
* First release on PyPI.
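
The 1.1.0 errata entry above can be checked with the standard library alone. The following sketch is not part of the commit. It uses a fixed naive datetime as a stand-in for the kind of value ``datetime.datetime.utcnow()`` returns (UTC wall-clock fields with no ``tzinfo``), and the helper name ``epoch_seconds`` is hypothetical.

```python
import datetime


def epoch_seconds(dt):
    """Hypothetical helper: seconds since the Unix epoch for `dt`.

    For a naive datetime, `timestamp()` assumes the fields are local time.
    """
    return dt.timestamp()


# Stand-in for a utcnow() result: UTC wall-clock fields, but no tzinfo.
naive = datetime.datetime(2023, 1, 4, 12, 0, 0)

# The same wall-clock fields with the UTC timezone attached explicitly.
aware = naive.replace(tzinfo=datetime.timezone.utc)

# The naive value is read as local time, so the two results differ by the
# machine's UTC offset on that date (zero only when the local zone is UTC).
skew = epoch_seconds(aware) - epoch_seconds(naive)

# The form the corrected documentation recommends: an aware "now" in UTC.
now_utc = datetime.datetime.now(tz=datetime.timezone.utc)

print(naive.tzinfo, aware.tzinfo, skew)
```

On a machine whose local timezone is not UTC, ``skew`` is non-zero, which is the silent error the errata describes.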

README.rst

+16
@@ -34,6 +34,22 @@ The latest version of the library is 1.0.2.
3434
columns={'temperature': 20.0, 'humidity': 0.5})
3535
sender.flush()
3636
37+
You can also send Pandas dataframes:
38+
39+
.. code-block:: python
40+
41+
import pandas as pd
42+
from questdb.ingress import Sender
43+
44+
df = pd.DataFrame({
45+
'id': pd.Categorical(['toronto1', 'paris3']),
46+
'temperature': [20.0, 21.0],
47+
'humidity': [0.5, 0.6],
48+
'timestamp': pd.to_datetime(['2021-01-01', '2021-01-02'])})
49+
50+
with Sender('localhost', 9009) as sender:
51+
sender.dataframe(df, table_name='sensors')
52+
3753
3854
Docs
3955
====

TODO.rst

-12
@@ -6,8 +6,6 @@ TODO
66
Build Tooling
77
=============
88

9-
* **[HIGH]** Transition to Azure, move Linux arm to ARM pipeline without QEMU.
10-
119
* **[MEDIUM]** Automate Apple Silicon as part of CI.
1210

1311
* **[LOW]** Release to PyPI from CI.
@@ -19,13 +17,3 @@ Docs
1917
* **[MEDIUM]** Examples should be tested as part of the unit tests (as they
2018
are in the C client). This is to ensure they don't "bit rot" as the code
2119
changes.
22-
23-
* **[MEDIUM]** Document on a per-version basis.
24-
25-
Development
26-
===========
27-
28-
* **[HIGH]** Implement ``tabular()`` API in the buffer.
29-
30-
* **[MEDIUM]** Implement ``pandas()`` API in the buffer.
31-
*This can probably wait for a future release.*

ci/cibuildwheel.yaml

+8-8
@@ -68,7 +68,7 @@ stages:
6868
- bash: |
6969
set -o errexit
7070
python3 -m pip install --upgrade pip
71-
pip3 install cibuildwheel==2.11.1
71+
python3 -m pip install cibuildwheel==2.11.2
7272
displayName: Install dependencies
7373
- bash: cibuildwheel --output-dir wheelhouse .
7474
displayName: Build wheels
@@ -83,7 +83,7 @@ stages:
8383
- bash: |
8484
set -o errexit
8585
python3 -m pip install --upgrade pip
86-
pip3 install cibuildwheel==2.11.1
86+
python3 -m pip install cibuildwheel==2.11.2
8787
displayName: Install dependencies
8888
- bash: cibuildwheel --output-dir wheelhouse .
8989
displayName: Build wheels
@@ -100,7 +100,7 @@ stages:
100100
- bash: |
101101
set -o errexit
102102
python3 -m pip install --upgrade pip
103-
pip3 install cibuildwheel==2.11.1
103+
python3 -m pip install cibuildwheel==2.11.2
104104
displayName: Install dependencies
105105
- bash: cibuildwheel --output-dir wheelhouse .
106106
displayName: Build wheels
@@ -117,7 +117,7 @@ stages:
117117
- bash: |
118118
set -o errexit
119119
python3 -m pip install --upgrade pip
120-
pip3 install cibuildwheel==2.11.1
120+
python3 -m pip install cibuildwheel==2.11.2
121121
displayName: Install dependencies
122122
- bash: cibuildwheel --output-dir wheelhouse .
123123
displayName: Build wheels
@@ -134,7 +134,7 @@ stages:
134134
- bash: |
135135
set -o errexit
136136
python3 -m pip install --upgrade pip
137-
pip3 install cibuildwheel==2.11.1
137+
python3 -m pip install cibuildwheel==2.11.2
138138
displayName: Install dependencies
139139
- bash: cibuildwheel --output-dir wheelhouse .
140140
displayName: Build wheels
@@ -151,7 +151,7 @@ stages:
151151
- bash: |
152152
set -o errexit
153153
python3 -m pip install --upgrade pip
154-
python3 -m pip install cibuildwheel==2.11.1
154+
python3 -m pip install cibuildwheel==2.11.2
155155
displayName: Install dependencies
156156
- bash: cibuildwheel --output-dir wheelhouse .
157157
displayName: Build wheels
@@ -165,8 +165,8 @@ stages:
165165
- task: UsePythonVersion@0
166166
- bash: |
167167
set -o errexit
168-
python -m pip install --upgrade pip
169-
pip install cibuildwheel==2.11.1
168+
python3 -m pip install --upgrade pip
169+
python3 -m pip install cibuildwheel==2.11.2
170170
displayName: Install dependencies
171171
- bash: cibuildwheel --output-dir wheelhouse .
172172
displayName: Build wheels

ci/pip_install_deps.py

+74
@@ -0,0 +1,74 @@
+import sys
+import subprocess
+import shlex
+import textwrap
+import platform
+
+
+class UnsupportedDependency(Exception):
+    pass
+
+
+def pip_install(package):
+    args = [
+        sys.executable,
+        '-m', 'pip', 'install',
+        '--upgrade',
+        '--only-binary', ':all:',
+        package]
+    args_s = ' '.join(shlex.quote(arg) for arg in args)
+    sys.stderr.write(args_s + '\n')
+    res = subprocess.run(
+        args,
+        stderr=subprocess.STDOUT,
+        stdout=subprocess.PIPE)
+    if res.returncode == 0:
+        return
+    output = res.stdout.decode('utf-8')
+    if 'Could not find a version that satisfies the requirement' in output:
+        raise UnsupportedDependency(output)
+    else:
+        sys.stderr.write(output + '\n')
+        sys.exit(res.returncode)
+
+
+def try_pip_install(package):
+    try:
+        pip_install(package)
+    except UnsupportedDependency as e:
+        msg = textwrap.indent(str(e), ' ' * 8)
+        sys.stderr.write(f' Ignored unsatisfiable dependency:\n{msg}\n')
+
+
+def ensure_timezone():
+    try:
+        import zoneinfo
+        if platform.system() == 'Windows':
+            pip_install('tzdata')  # for zoneinfo
+    except ImportError:
+        pip_install('pytz')
+
+
+def main():
+    ensure_timezone()
+    try_pip_install('fastparquet>=2022.12.0')
+    try_pip_install('pandas')
+    try_pip_install('numpy')
+    try_pip_install('pyarrow')
+
+    on_linux_is_glibc = (
+        (not platform.system() == 'Linux') or
+        (platform.libc_ver()[0] == 'glibc'))
+    is_64bits = sys.maxsize > 2**32
+    is_cpython = platform.python_implementation() == 'CPython'
+    if on_linux_is_glibc and is_64bits and is_cpython:
+        # Ensure that we've managed to install the expected dependencies.
+        import pandas
+        import numpy
+        import pyarrow
+        if sys.version_info >= (3, 8):
+            import fastparquet
+
+
+if __name__ == "__main__":
+    main()
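
The error handling in pip_install() above comes down to a three-way decision on pip's exit code and its combined output. The sketch below isolates that decision as a pure function so it can be exercised without running pip. The names classify_pip_result and UNSATISFIABLE_MARKER and the returned labels are hypothetical and are not part of the commit.

```python
# Marker text that the script above looks for in pip's output.
UNSATISFIABLE_MARKER = 'Could not find a version that satisfies the requirement'


def classify_pip_result(returncode, output):
    """Hypothetical helper mirroring the branches in pip_install().

    'ok'          -> pip exited with 0, nothing more to do.
    'unsupported' -> no matching binary wheel; the caller may skip the package.
    'fatal'       -> any other failure; the script exits with pip's return code.
    """
    if returncode == 0:
        return 'ok'
    if UNSATISFIABLE_MARKER in output:
        return 'unsupported'
    return 'fatal'


print(classify_pip_result(0, ''))
print(classify_pip_result(1, 'ERROR: ' + UNSATISFIABLE_MARKER + ' pyarrow'))
print(classify_pip_result(2, 'ERROR: network unreachable'))
```

Under this reading, try_pip_install() tolerates only the 'unsupported' case, so a platform with no prebuilt wheel skips the optional dependency while any other pip failure still fails the CI step.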

ci/run_tests_pipeline.yaml

+3-1
@@ -28,7 +28,9 @@ stages:
2828
submodules: true
2929
- task: UsePythonVersion@0
3030
- script: python3 --version
31-
- script: python3 -m pip install cython
31+
- script: |
32+
python3 -m pip install cython
33+
python3 ci/pip_install_deps.py
3234
displayName: Installing Python dependencies
3335
- script: python3 proj.py build
3436
displayName: "Build"

dev_requirements.txt

+5-1
@@ -1,8 +1,12 @@
11
setuptools>=45.2.0
22
Cython>=0.29.32
33
wheel>=0.34.2
4-
cibuildwheel>=2.11.1
4+
cibuildwheel>=2.11.2
55
Sphinx>=5.0.2
66
sphinx-rtd-theme>=1.0.0
77
twine>=4.0.1
88
bump2version>=1.0.1
9+
pandas>=1.3.5
10+
numpy>=1.21.6
11+
pyarrow>=10.0.1
12+
fastparquet>=2022.12.0
