Skip to content

Microsoft Dev Blog: "mssql-python vs pyodbc: Benchmarking SQL Server Performance" #554

@benmatwil

Description

@benmatwil

Discovered the mssql-python library and the microsoft dev blog comparing the performance of the library vs pyodbc here: https://devblogs.microsoft.com/python/mssql-python-vs-pyodbc-benchmarking-sql-server-performance/

So I started to look into and test replacing our current pyodbc-based implementation with mssql-python by switching from pyodbc/sqlalchemy with pandas.read_sql to the .arrow().to_pandas() functionality of the mssql-python library.
Using the arrow integration, I found some significant speed ups in our data getters. This is significant for some of our work flows which have been slow in the past using our current sql to python workflow (e.g. 15s vs 42s).

In testing, I have looked into all the various workflows to compare the two libraries:

  1. I have used our existing DBs and queries to test timings for
  • .fetchall(): pyodbc vs mssql_python
  • pyodbc vs mssql_python via sqlalchemy/pandas (the latest beta of sqlalchemy for mssql-python compatibility)
  • mssql-python via .arrow()
  1. Used the performance test script from the original blog article here: https://github.com/microsoft/mssql-python/blob/main/benchmarks/bench_mssql.py

For these tests I initially used our existing infrastructure with SQL Server 2019 and Python 3.10. However to test whether they played any factor, I also tried Python 3.13 and SQL Server 2025 (via a fresh docker container)

Going via pyarrow is always the fastest which seems to be no surprise and provides generally the best features like fully typed columns and future compatibility with other libraries which is perfect.

However in testing the other scenarios which are more of a fair test between pyodbc and mssql-python and comparing their equivalent functionality. I keep finding that pyodbc is either approximately as performant or quicker than mssql-pyodbc.

Here's a list of results using the tests from the article:

Operation                   pyodbc min  mssql min  Ratio(min)   pyodbc med  mssql med  Ratio(med)
----------------------------------------------------------------------------------------------------
SELECT                         57.83ms    58.22ms      0.99x (pyodbc)      65.94ms    67.01ms      0.98x (pyodbc)
INSERT                         14.43ms    16.22ms      0.89x (pyodbc)      17.05ms    18.86ms      0.90x (pyodbc)
UPDATE                         14.24ms    17.37ms      0.82x (pyodbc)      16.66ms    20.57ms      0.81x (pyodbc)
DELETE                          6.04ms     7.98ms      0.76x (pyodbc)       7.72ms    10.64ms      0.73x (pyodbc)
Complex query                   5.79ms     7.80ms      0.74x (pyodbc)       7.05ms     9.98ms      0.71x (pyodbc)
Multiple connections           67.30ms    86.74ms      0.78x (pyodbc)      74.46ms    98.73ms      0.75x (pyodbc)
Fetch one                       5.86ms     7.55ms      0.78x (pyodbc)       8.07ms     9.75ms      0.83x (pyodbc)
Fetch many                      5.78ms     7.81ms      0.74x (pyodbc)       6.78ms     9.62ms      0.70x (pyodbc)
Stored procedure                5.80ms     7.53ms      0.77x (pyodbc)       6.79ms     8.75ms      0.78x (pyodbc)
Nested query                    5.93ms     7.74ms      0.77x (pyodbc)       7.30ms     9.76ms      0.75x (pyodbc)
Large data set                  5.76ms     8.21ms      0.70x (pyodbc)       6.78ms     9.70ms      0.70x (pyodbc)
Join query                      6.13ms     8.50ms      0.72x (pyodbc)       7.61ms    10.19ms      0.75x (pyodbc)
Execute many                   22.56ms    21.60ms      1.04x (mssql)      27.91ms    26.57ms      1.05x (mssql)
100 inserts                   187.36ms    23.13ms      8.10x (mssql)     204.97ms    25.67ms      7.98x (mssql)
Transaction                    22.62ms    24.63ms      0.92x (pyodbc)      28.83ms    30.79ms      0.94x (pyodbc)
Update with join                5.83ms     7.52ms      0.77x (pyodbc)       7.18ms     9.15ms      0.78x (pyodbc)
Delete with join                5.45ms     7.80ms      0.70x (pyodbc)       6.54ms     9.10ms      0.72x (pyodbc)

And an example benchmark for one of our internal queries of 24 columns by:

  • 1709738 rows
    pyodbc fetchall        23.5540s
    mssql-python fetchall  37.0886s
    SQLAlchemy+pyodbc      41.9620s
    SQLAlchemy+mssql-py    53.1163s
    
  • 77712 rows
    pyodbc fetchall        1.0679s
    mssql-python fetchall  1.2697s
    SQLAlchemy+pyodbc      1.7786s
    SQLAlchemy+mssql-py    2.0846s
    

I am by no means a SQL expert so maybe there's something else which is going on which means mssql-python is not reaching it's peak speeds.

Questions:

  • Any ideas why I cannot not replicate something similar to the original benchmarks whereby there's at least a 2x speed up in most situations?
  • If there is some sort of performance regression here, does this mean the arrow integration could be even faster than it is now in this initial testing?

Spec Details:
Python version: 3.10.11 and 3.13.12
SQL Server Version: 2019 (existing) and 2025 (test docker container)
OS: Windows 10
python-mssql: 1.6.0
pyodbc: 5.3.0
sqlalchemy: 2.1.0b1

Metadata

Metadata

Assignees

Labels

enhancementNew feature or requestinADOtriage doneIssues that are triaged by dev team and are in investigation.

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions