gh-128657: fix _hashopenssl ref/data race #128886

Merged Feb 8, 2025 (11 commits)

Contributor

@tom-pytel tom-pytel commented Jan 15, 2025

Fix a possible data and PY_EVP_MD refcount race in _hashopenssl.c in py_digest_by_name() under free-threading.

@bedevere-app

bedevere-app bot commented Jan 15, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@ZeroIntensity
Member

The implementation LGTM. You can reuse the reproducer in the issue as a test. See the devguide for writing tests.

@tom-pytel
Contributor Author

> The implementation LGTM. You can reuse the reproducer in the issue as a test. See the devguide for writing tests.

Are TSan tests possible? That is the only way the issue shows up, other than as an extra refcount at a specific byte offset of the opaque evp_md_st struct if two threads do this simultaneously (very non-deterministic).

@ZeroIntensity
Member

Yeah, tests are run with TSan as part of CI. For example, on this PR: https://github.com/python/cpython/actions/runs/12809851053/job/35715616858?pr=128886

@tom-pytel
Contributor Author

> Yeah, tests are run with TSan as part of CI. For example, on this PR: https://github.com/python/cpython/actions/runs/12809851053/job/35715616858?pr=128886

Still not sure how you want me to implement this test. python -m test --tsan appears to exist with the intent to test the thread sanitizer itself, and I can't find any actual test that interacts with TSan to get positive or negative results. Can you point me to one? I have a test written but no way to know whether it succeeds or fails according to TSan.

@ZeroIntensity
Member

You don't need to do anything special--just write the test. Python will be built with TSan integration, and races on that test will show up in CI.

@tom-pytel
Contributor Author

tom-pytel commented Jan 16, 2025

> You don't need to do anything special--just write the test. Python will be built with TSan integration, and races on that test will show up in CI.

Done. I had to add test_hashlib to the list of TSan tests so that it is executed in that context. Keep in mind that it is not 100% deterministic at finding the problem when the fix is absent; the test would have to be even slower for that.
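
For readers following along, here is a minimal sketch of the kind of threaded stress test being discussed. It is not the test from this PR: the helper name `hash_concurrently` and its parameters are hypothetical, and it assumes that `hashlib.new()` reaches `py_digest_by_name()` on an OpenSSL-backed build. As noted above, a test like this is expected to pass with or without the fix; only a TSan build would report the race, and not deterministically.

```python
import hashlib
import threading


def hash_concurrently(name="sha256", data=b"payload", num_threads=8, iterations=200):
    """Call hashlib.new() from several threads at once and collect the digests.

    Hypothetical helper for illustration only; it is not part of CPython.
    """
    barrier = threading.Barrier(num_threads)
    lock = threading.Lock()
    results = []
    errors = []

    def worker():
        try:
            # Release all threads together to maximise overlap of the
            # first digest lookups.
            barrier.wait()
            local = [hashlib.new(name, data).hexdigest() for _ in range(iterations)]
            with lock:
                results.extend(local)
        except Exception as exc:
            with lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, errors
```

The sketch only checks functional correctness (every thread gets the same digest). Detecting the data race itself still depends on running it under a TSan-enabled free-threading build.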

@picnixz
Member

picnixz commented Jan 17, 2025

I don't have much time today and tomorrow for an in-depth review and there is some other stuff to do before reviewing this one, but I'll do it on Sunday or Monday. If I forget, just ping me back (or you can DM me on Discord @ZeroIntensity)

Member

@picnixz picnixz left a comment

First round of review I forgot to send. I'll have more time starting on Monday to do the in-depth review.

@tom-pytel
Contributor Author

Pushed the requested changes, but I want to be clear: with or without the tsan fix, this test doesn't actually do anything useful (I get no tsan output or test failure when the fix is removed).

$ ./python -m test --tsan test_hashlib -v
== CPython 3.14.0a4+ experimental free-threading build (heads/array-10-g40a4d88a14-dirty:40a4d88a14, Jan 16 2025, 11:58:58) [GCC 11.4.0]
== Linux-6.8.0-51-generic-x86_64-with-glibc2.35 little-endian
== Python build: free_threading debug TSAN
...
test_py_digest_by_name_data_race (test.test_hashlib.KDFTests.test_py_digest_by_name_data_race) ... ok
...
Result: SUCCESS

Contributor

@colesbury colesbury left a comment

Thanks @tom-pytel - this change looks good to me.

If the test doesn't do anything useful, would you please remove it? We can catch this race by running the existing hashlib tests with --parallel-threads=4. We'll need to make some minor adjustments to test_hashlib.py, but we can do that after this fix is merged.

colesbury added a commit to colesbury/cpython that referenced this pull request Feb 7, 2025
This catches the race in `py_digest_by_name` that is fixed separately
in pythongh-128886.
Contributor

@colesbury colesbury left a comment

LGTM!

@colesbury
Contributor

I'll merge this next week if nobody else has further feedback or merges it before me.

Member

@gpshead gpshead left a comment

a little tricky but seems solid.

@gpshead gpshead merged commit 6c67904 into python:main Feb 8, 2025
46 checks passed
@miss-islington-app

Thanks @tom-pytel for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app

Sorry, @tom-pytel and @gpshead, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 6c67904e793828d84716a8c83436c9495235f3a1 3.13

@gpshead
Member

gpshead commented Feb 8, 2025

@colesbury can you handle the backport?

@ZeroIntensity
Member

I think it would also be great to give @tom-pytel a chance to backport if he's up for it.

tom-pytel added a commit to tom-pytel/cpython that referenced this pull request Feb 8, 2025
@bedevere-app

bedevere-app bot commented Feb 8, 2025

GH-129852 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Feb 8, 2025
@tom-pytel
Contributor Author

tom-pytel commented Feb 8, 2025

> GH-129852 is a backport of this pull request to the 3.13 branch.

cherry_picker was giving me an error from git so I did this manually from the dry run; feel free to beat me with the stupid stick if I messed something up. Otherwise let me know and I will do the same for 3.12.

@bedevere-app

bedevere-app bot commented Feb 8, 2025

GH-129853 is a backport of this pull request to the 3.13 branch.

@ZeroIntensity
Member

Yeah, you're supposed to get an error for git. That error is telling you to fix the merge conflicts 😅

No need for 3.12, free-threading doesn't exist there.

@tom-pytel
Contributor Author

> Yeah, you're supposed to get an error for git. That error is telling you to fix the merge conflicts 😅
>
> No need for 3.12, free-threading doesn't exist there.

No, it wasn't that; it was something weird and git-related. I'm going to post this as an issue unless you have an idea:

(venv) tom@tom-VirtualBox:~/work/cpython/128657/cp {main} $ cherry_picker 6c67904e793828d84716a8c83436c9495235f3a1 3.13
🐍 🍒 ⛏
Now backporting '6c67904e793828d84716a8c83436c9495235f3a1' into '3.13'
Error cherry-pick 6c67904e793828d84716a8c83436c9495235f3a1.
Auto-merging Modules/_hashopenssl.c
CONFLICT (content): Merge conflict in Modules/_hashopenssl.c
error: could not apply 6c67904e793... gh-128657: fix _hashopenssl ref/data race (GH-128886)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".


Failed to cherry-pick 6c67904e793828d84716a8c83436c9495235f3a1 into 3.13 ☹
... Stopping here.

To continue and resolve the conflict:
    $ cherry_picker --status  # to find out which files need attention
    # Fix the conflict
    $ cherry_picker --status  # should now say 'all conflict fixed'
    $ cherry_picker --continue

To abort the cherry-pick and cleanup:
    $ cherry_picker --abort


Fix...


(venv) tom@tom-VirtualBox:~/work/cpython/128657/cp {backport-6c67904-3.13} $ cherry_picker --status
🐍 🍒 ⛏
On branch backport-6c67904-3.13
You are currently cherry-picking commit 6c67904e793.
  (all conflicts fixed: run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   Misc/NEWS.d/next/Library/2025-01-15-15-45-21.gh-issue-128657.P5LNQA.rst
	modified:   Modules/_hashopenssl.c


(venv) tom@tom-VirtualBox:~/work/cpython/128657/cp {backport-6c67904-3.13} $ cherry_picker --continue
🐍 🍒 ⛏
Traceback (most recent call last):
  File "/home/tom/work/cpython/128657/local/venv/bin/cherry_picker", line 8, in <module>
    sys.exit(cherry_pick_cli())
             ~~~~~~~~~~~~~~~^^
  File "/home/tom/work/cpython/128657/local/venv/lib/python3.14/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/tom/work/cpython/128657/local/venv/lib/python3.14/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/home/tom/work/cpython/128657/local/venv/lib/python3.14/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tom/work/cpython/128657/local/venv/lib/python3.14/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/home/tom/work/cpython/128657/local/venv/lib/python3.14/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/tom/work/cpython/128657/local/venv/lib/python3.14/site-packages/cherry_picker/cherry_picker.py", line 855, in cherry_pick_cli
    cherry_picker.continue_cherry_pick()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/tom/work/cpython/128657/local/venv/lib/python3.14/site-packages/cherry_picker/cherry_picker.py", line 639, in continue_cherry_pick
    commits = get_commits_from_backport_branch(base)
  File "/home/tom/work/cpython/128657/local/venv/lib/python3.14/site-packages/cherry_picker/cherry_picker.py", line 965, in get_commits_from_backport_branch
    output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/home/tom/work/cpython/128657/local/lib/python3.14/subprocess.py", line 474, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               **kwargs).stdout
               ^^^^^^^^^
  File "/home/tom/work/cpython/128657/local/lib/python3.14/subprocess.py", line 579, in run
    raise CalledProcessError(retcode, process.args,
                             output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['git', 'log', '--format=%H', '3.13..']' returned non-zero exit status 128.

(venv) tom@tom-VirtualBox:~/work/cpython/128657/cp {backport-6c67904-3.13} $ git log --format=%H 3.13..
fatal: ambiguous argument '3.13..': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

(venv) tom@tom-VirtualBox:~/work/cpython/128657/cp {backport-6c67904-3.13} $ git --version
git version 2.34.1

(venv) tom@tom-VirtualBox:~/work/cpython/128657/cp {backport-6c67904-3.13} $ cherry_picker status
🐍 🍒 ⛏
Run state cherry-picker.state=CONTINUATION_STARTED in Git config is not known.
Perhaps it has been set by a newer version of cherry-picker. Try upgrading.
Valid states are: BACKPORT_PAUSED, UNSET. If this looks suspicious, raise an issue at https://github.com/python/cherry-picker/issues/new.
As the last resort you can reset the runtime state stored in Git config using the following command: `git config --local --remove-section cherry-picker`

@ZeroIntensity
Member

IIRC, that shows up if you don't have a local 3.13 branch from upstream.
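
To make that explanation concrete: the traceback above shows cherry_picker running `git log --format=%H 3.13..`, and git can only resolve the bare name `3.13` if a local ref of that name exists; a remote-tracking `upstream/3.13` alone is not enough. The following is a minimal sketch of how one might check for the local branch before continuing. The helper names are hypothetical and are not part of cherry_picker; the git invocation assumes `git` is on `PATH`.

```python
import subprocess


def local_branch_ref(branch):
    """Return the fully qualified ref name of a local branch."""
    return f"refs/heads/{branch}"


def commits_since_range(base):
    """Return the revision range 'base..', i.e. commits on HEAD not on base."""
    return f"{base}.."


def has_local_branch(branch, repo="."):
    """Return True if `branch` exists as a local branch in `repo`.

    Uses `git rev-parse --verify --quiet`, which exits non-zero when the
    ref cannot be resolved. Raises FileNotFoundError if git is missing.
    """
    proc = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--verify", "--quiet",
         local_branch_ref(branch)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode == 0
```

If `has_local_branch("3.13")` returns False, creating the branch from the remote-tracking ref (for example with `git branch 3.13 upstream/3.13`, assuming the remote is named `upstream`) should let `commits_since_range("3.13")` resolve.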

@tom-pytel
Contributor Author

tom-pytel commented Feb 8, 2025

> IIRC, that shows up if you don't have a local 3.13 branch from upstream.

But cherry picker created the branch, and:

(venv) tom@tom-VirtualBox:~/work/cpython/128657/cp {backport-6c67904-3.13} $ git remote -v
origin	git@github.com:tom-pytel/cpython.git (fetch)
origin	git@github.com:tom-pytel/cpython.git (push)
upstream	git@github.com:python/cpython.git (fetch)
upstream	git@github.com:python/cpython.git (push)

Or did I have to specify upstream on the command line?

cherry_picker 6c67904e793828d84716a8c83436c9495235f3a1 3.13

It was a branch from upstream/3.13:

(venv) tom@tom-VirtualBox:~/work/cpython/128657/cp {backport-6c67904-3.13} $ git log --all --decorate --oneline --graph
* 610aa38e9bf (HEAD -> backport-6c67904-3.13) [3.13] gh-128657: fix _hashopenssl ref/data race (GH-128886) (cherry picked from commit 6c67904e793828d84716a8c83436c9495235f3a1)
* 8a7146c5eb3 (upstream/3.13) [3.13] gh-117657: Fix data race in `dict_dict_merge` (gh-129755) (gh-129808)
* aae0a1f9044 [3.13] Add multinomial to the itertools recipes docs (gh-129760) (gh-129762)
* f7af8bc58aa  [3.13] gh-129533: Update PyGC_Enable/Disable/IsEnabled to use atomic operat… (gh-129756)
* f7cc8623457 [3.13] gh-129732: Fix race on `shared->array` in qsbr code under free-threading (gh-129738) (gh-129747)

Never mind, I understand what you mean...

@tom-pytel
Contributor Author

The correct backport is here: #129853

colesbury pushed a commit that referenced this pull request Feb 8, 2025
gpshead pushed a commit that referenced this pull request Feb 8, 2025
* gh-128657: Run test_hashlib with `--parallel-threads`

This catches the race in `py_digest_by_name` that is fixed separately
in gh-128886.

* Adjust assertion order
@tom-pytel tom-pytel deleted the fix-issue-128657 branch March 4, 2025 12:02
5 participants