

mweinelt commented Jun 8, 2025

Work to bring wpull to Python 3.9 and adopt the PEP 517 build process.

Updated some dependencies, because they were broken by changes in newer Python versions or would not lock otherwise.

Note

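Supporting Python 3.10 would additionally require updating to tornado>=6.0, because the Tornado version currently in use still subclasses `collections.MutableMapping`, which was removed in 3.10: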

.venv/lib/python3.10/site-packages/tornado/httputil.py:106: in <module>
class HTTPHeaders(collections.MutableMapping):
E   AttributeError: module 'collections' has no attribute 'MutableMapping'
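The portable spelling imports the abstract base classes from `collections.abc`, where they have lived since Python 3.3. A minimal sketch, assuming a hypothetical case-insensitive header mapping in the spirit of Tornado's `HTTPHeaders` (not Tornado's actual implementation):

```python
import collections
from collections.abc import MutableMapping


class CaseInsensitiveHeaders(MutableMapping):
    """Hypothetical header mapping; keys compare case-insensitively."""

    def __init__(self):
        # lower-cased key -> (original key, value)
        self._store = {}

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __setitem__(self, key, value):
        self._store[key.lower()] = (key, value)

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        # Yield keys in the spelling they were first stored with.
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)


headers = CaseInsensitiveHeaders()
headers["Content-Type"] = "text/html"
print(headers["content-type"])                 # text/html
print(hasattr(collections, "MutableMapping"))  # False on Python 3.10 and later
```

Subclassing the ABC means `get`, `pop`, `update` and friends come for free from the mixin methods.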

I opted to introduce uv as a modern development environment and packaging tool.

# create virtualenv
uv venv

# attach to virtualenv
source .venv/bin/activate

# install dependencies per uv.lock
uv sync

# run commands in the environment
wpull
pytest

# detach virtualenv
deactivate

Tests

Currently, 575 tests are collected; 18 of them fail and might need investigation.

18 failed, 543 passed, 11 skipped, 3 xfailed, 1252 warnings in 28.22s
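The 575 figure is the sum of all outcomes in the pytest summary line, excluding warnings. A small sketch that recomputes it, assuming the summary line keeps the `<count> <outcome>` layout shown here:

```python
import re

summary = "18 failed, 543 passed, 11 skipped, 3 xfailed, 1252 warnings in 28.22s"

# Collect every "<count> <outcome>" pair; "28.22s" has no space, so it is not matched.
counts = {outcome: int(n) for n, outcome in re.findall(r"(\d+) (\w+)", summary)}

# Warnings are not test outcomes, so leave them out of the total.
tests = sum(n for outcome, n in counts.items() if outcome != "warnings")
print(tests)  # 575
```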

Test summary
FAILED wpull/document/html_test.py::TestHTML5LibHTML::test_html_encoding - IndexError: tuple index out of range
FAILED wpull/document/sitemap_test.py::TestHTML5LibSitemap::test_sitemap_encoding - IndexError: tuple index out of range
FAILED wpull/driver/phantomjs_test.py::TestPhantomJS::test_driver - FileNotFoundError: [Errno 2] No such file or directory: 'phantomjs'
FAILED wpull/proxy/proxy_test.py::TestProxySSL::test_basic_requests - wpull.errors.NetworkError: Proxy does not support CONNECT: 501 CONNECT is intentionally not supported
FAILED wpull/scraper/html_test.py::TestLxmlHTMLScraper::test_html_wrong_charset - AssertionError: {'http://example.com/utm/__utm.js', 'http[269 chars]gif'} != frozenset()
FAILED wpull/scraper/html_test.py::TestHTML5LibHTMLScraper::test_html_mojibake - AssertionError: {'http://example.com/文字化け'} != frozenset({'http://example.com/•¶Žš‰»‚¯'})
FAILED wpull/scraper/util_test.py::TestUtil::test_identifiy_link_type - AssertionError:  != None
FAILED wpull/testing/integration/http_app_test.py::TestHTTPGoodApp::test_app_input_file_arg_stdin - AttributeError: '_io.StringIO' object has no attribute 'buffer'
FAILED wpull/testing/integration/http_app_test.py::TestHTTPGoodApp::test_app_args - AssertionError: False is not true
FAILED wpull/testing/integration/http_app_test.py::TestHTTPGoodApp::test_sitemaps - AssertionError: False is not true
FAILED wpull/testing/integration/http_app_test.py::TestHTTPBadApp::test_bad_cookie - AssertionError: 4 != 3
FAILED wpull/testing/integration/phantomjs_test.py::TestPhantomJS::test_app_phantomjs - AssertionError: False is not true
FAILED wpull/testing/integration/phantomjs_test.py::TestPhantomJS::test_app_phantomjs_scroll - FileNotFoundError: [Errno 2] No such file or directory: 'DEUUEAUGH.html.snapshot.html'
FAILED wpull/url_test.py::TestURL::test_ip_address_normalization - AssertionError: 'http://[::ffff:c000:280]/' != 'http://[::ffff:192.0.2.128]/'
FAILED wpull/testing/integration/http_app_test.py::TestHTTPGoodApp::test_session_cookie - AssertionError: 0 != 8
FAILED wpull/testing/integration/phantomjs_test.py::TestPhantomJSHTTPS::test_app_phantomjs - AssertionError: False is not true
FAILED wpull/testing/integration/phantomjs_test.py::TestPhantomJSHTTPS::test_app_phantomjs_scroll - FileNotFoundError: [Errno 2] No such file or directory: 'DEUUEAUGH.html.snapshot.html'
FAILED wpull/testing/integration/script_test.py::TestScriptGoodApp::test_app_python_plugin_script - AssertionError: 42 != 1

This pull request fixes (at least) #332 and #404, and obsoletes #325, #402, #413 and #426.

I remembered to:

  • Update or add unit tests if needed
  • Update or add documentation/comments if needed
  • Make sure stray files or whitespace didn't get committed
  • If significant changes, branch from develop and set to merge into develop
  • Read the guidelines for contributing

Changes: A bunch, really. I'd say 80% deprecation fixes, 15% test fixes and 5% modern packaging.

Important

This pull request is best reviewed by looking at the individual commits.

mweinelt and others added 18 commits June 9, 2025 02:08
Renames wpull.testing.async to wpull.testing._async, because `async` became
a reserved keyword in Python 3.7.
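Why the rename is unavoidable can be checked without importing anything: a module path containing `async` no longer parses. A minimal sketch; `parses` is a hypothetical helper that only compiles the source and never executes it:

```python
import keyword


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (it is not executed)."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(keyword.iskeyword("async"))            # True on Python 3.7+
print(parses("import wpull.testing.async"))  # False: `async` is a keyword
print(parses("import wpull.testing._async")) # True: only parsed, never imported
```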

The @asyncio.coroutine decorator was deprecated in 3.8 and removed in
3.11. Coroutines now use `async def fn` syntax.

Inside coroutines, the `yield from` expression is replaced by `await`,
which was introduced in 3.5.
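The conversion looks roughly like this. A minimal sketch with a hypothetical `fetch_length` coroutine; the old form is kept only as a comment, since it no longer runs on Python 3.11+:

```python
import asyncio

# Before (generator-based coroutine, removed in Python 3.11):
#
#     @asyncio.coroutine
#     def fetch_length(data):
#         yield from asyncio.sleep(0)
#         return len(data)


async def fetch_length(data: bytes) -> int:
    # `await` takes the place of `yield from` inside a native coroutine.
    await asyncio.sleep(0)
    return len(data)


print(asyncio.run(fetch_length(b"hello")))  # prints 5
```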

`asyncio.async` was an alias of `ensure_future`, deprecated in 3.4.4 and
later removed; since 3.7 it cannot even be spelled, because `async` is a keyword.
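Scheduling a coroutine as a task therefore goes through `asyncio.ensure_future` directly. A minimal sketch; `work` and `main` are hypothetical names:

```python
import asyncio


async def work(n: int) -> int:
    await asyncio.sleep(0)
    return n * 2


async def main() -> int:
    # Formerly written as asyncio.async(work(21)), which is a SyntaxError since 3.7.
    task = asyncio.ensure_future(work(21))
    return await task


result = asyncio.run(main())
print(result)  # 42
```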

Locks now use the `async with lock` construction instead of
`with (yield from lock)`.
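A minimal sketch of the new lock idiom, assuming a hypothetical shared counter updated by several tasks; the lock keeps the read-modify-write atomic even though each task yields while holding it:

```python
import asyncio


async def increment(lock: asyncio.Lock, counter: dict) -> None:
    # Old style: with (yield from lock): ...
    async with lock:
        current = counter["value"]
        await asyncio.sleep(0)  # yield to other tasks while holding the lock
        counter["value"] = current + 1


async def main() -> int:
    lock = asyncio.Lock()
    counter = {"value": 0}
    await asyncio.gather(*(increment(lock, counter) for _ in range(10)))
    return counter["value"]


total = asyncio.run(main())
print(total)  # 10
```

Without the lock, every task would read 0 before any of them wrote, and the final count would be 1.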
These imports have been deprecated since Python 3.3, but were only removed in 3.10.
This is going to be removed in newer Tornado versions.
and target Python 3.9 for now.
This is the last version from 2020; in some earlier version, Tokenizer was
made into a private module.
Otherwise it cannot be added to a set.
This unbreaks importing `Template` from the stdlib `string` module.
It gets put into a set in a test, but a named tuple is not hashable when
it contains a stdlib dict.

We therefore install the frozendict dependency to provide a hashable mapping.
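The failure and the fix can be reproduced in isolation. A minimal sketch, assuming a hypothetical `Link` named tuple; it uses the `frozendict` package when installed and otherwise falls back to a tiny hypothetical stand-in that is hashable but, unlike the real package, does not enforce immutability:

```python
from collections import namedtuple

try:
    from frozendict import frozendict  # third-party package added by this commit
except ImportError:
    class frozendict(dict):
        """Hypothetical stand-in: hashable, but mutation is not prevented."""

        def __hash__(self):
            return hash(frozenset(self.items()))

Link = namedtuple("Link", ["url", "attrs"])  # hypothetical record type


def is_hashable(value) -> bool:
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable(Link("http://example.com/", {"rel": "nofollow"})))              # False
print(is_hashable(Link("http://example.com/", frozendict({"rel": "nofollow"}))))  # True

# Equal links now collapse into a single set entry.
links = {
    Link("http://example.com/", frozendict(rel="nofollow")),
    Link("http://example.com/", frozendict(rel="nofollow")),
}
print(len(links))  # 1
```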
Not a random private address that might be routed or blackholed
because it is in use by a project like DN42 or Freifunk.
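A sketch of the distinction, assuming the replacement is an RFC 5737 documentation address such as 192.0.2.128 (the address that appears in the `test_ip_address_normalization` failure). `is_documentation_address` is a hypothetical helper; the three networks are the ones RFC 5737 reserves for examples:

```python
import ipaddress

# RFC 5737 reserves these IPv4 blocks for documentation and examples only.
DOCUMENTATION_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),     # TEST-NET-1
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3
]


def is_documentation_address(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    # Membership is False across IP versions, so IPv6 input simply returns False.
    return any(ip in net for net in DOCUMENTATION_NETS)


print(is_documentation_address("192.0.2.128"))  # True
print(is_documentation_address("10.0.0.1"))     # False: RFC 1918 private space
print(is_documentation_address("172.20.0.1"))   # False: RFC 1918 private space
```

Addresses from RFC 1918 space can be live on community networks, whereas the documentation ranges are never meant to be routed.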
Locking is super important if you want to reproduce a certain state of
the package, as has been the case for wpull these last few years.

And provide a direnv integration to attach to the virtualenv.
The nosetests framework is effectively dead and earlier work has made the
tests run with pytest, which is today's de facto test runner in Python.