Commit ccad61e

barneygale and picnixz authored
GH-125866: Support complete "file:" URLs in urllib (#132378)
Add optional *add_scheme* argument to `urllib.request.pathname2url()`; when set to true, a complete URL is returned. Likewise add optional *require_scheme* argument to `url2pathname()`; when set to true, a complete URL is accepted.

Co-authored-by: Bénédikt Tran <[email protected]>
1 parent 4d3ad04 commit ccad61e

9 files changed: +121 -33 lines changed

Doc/library/urllib.request.rst

Lines changed: 24 additions & 10 deletions
@@ -146,16 +146,19 @@ The :mod:`urllib.request` module defines the following functions:
    attribute to modify its position in the handlers list.
 
 
-.. function:: pathname2url(path)
+.. function:: pathname2url(path, *, add_scheme=False)
 
    Convert the given local path to a ``file:`` URL. This function uses
-   :func:`~urllib.parse.quote` function to encode the path. For historical
-   reasons, the return value omits the ``file:`` scheme prefix. This example
-   shows the function being used on Windows::
+   :func:`~urllib.parse.quote` function to encode the path.
+
+   If *add_scheme* is false (the default), the return value omits the
+   ``file:`` scheme prefix. Set *add_scheme* to true to return a complete URL.
+
+   This example shows the function being used on Windows::
 
       >>> from urllib.request import pathname2url
       >>> path = 'C:\\Program Files'
-      >>> 'file:' + pathname2url(path)
+      >>> pathname2url(path, add_scheme=True)
       'file:///C:/Program%20Files'
 
    .. versionchanged:: 3.14
@@ -168,17 +171,25 @@ The :mod:`urllib.request` module defines the following functions:
       sections. For example, the path ``/etc/hosts`` is converted to
       the URL ``///etc/hosts``.
 
+   .. versionchanged:: next
+      The *add_scheme* argument was added.
+
 
-.. function:: url2pathname(url)
+.. function:: url2pathname(url, *, require_scheme=False)
 
    Convert the given ``file:`` URL to a local path. This function uses
-   :func:`~urllib.parse.unquote` to decode the URL. For historical reasons,
-   the given value *must* omit the ``file:`` scheme prefix. This example shows
-   the function being used on Windows::
+   :func:`~urllib.parse.unquote` to decode the URL.
+
+   If *require_scheme* is false (the default), the given value should omit a
+   ``file:`` scheme prefix. If *require_scheme* is set to true, the given
+   value should include the prefix; a :exc:`~urllib.error.URLError` is raised
+   if it doesn't.
+
+   This example shows the function being used on Windows::
 
       >>> from urllib.request import url2pathname
       >>> url = 'file:///C:/Program%20Files'
-      >>> url2pathname(url.removeprefix('file:'))
+      >>> url2pathname(url, require_scheme=True)
       'C:\\Program Files'
 
    .. versionchanged:: 3.14
@@ -193,6 +204,9 @@ The :mod:`urllib.request` module defines the following functions:
       returned (as before), and on other platforms a
       :exc:`~urllib.error.URLError` is raised.
 
+   .. versionchanged:: next
+      The *require_scheme* argument was added.
+
 
 .. function:: getproxies()
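
The two keyword arguments documented above only exist on interpreters that include this change, so code that must also run on older versions needs a fallback. The following is a minimal sketch and not part of the commit: the helper names `path_to_file_url` and `file_url_to_path` are hypothetical, and the fallback branches reuse the idioms from the removed doc examples (prepending `'file:'` and calling `str.removeprefix`), so unlike the new code path they match the scheme case-sensitively.

```python
import inspect
from urllib.error import URLError
from urllib.request import pathname2url, url2pathname

# Detect the new keyword-only parameters instead of comparing version numbers.
_HAS_ADD_SCHEME = 'add_scheme' in inspect.signature(pathname2url).parameters
_HAS_REQUIRE_SCHEME = 'require_scheme' in inspect.signature(url2pathname).parameters


def path_to_file_url(path):
    """Return a complete 'file:' URL for *path* (hypothetical helper)."""
    if _HAS_ADD_SCHEME:
        return pathname2url(path, add_scheme=True)
    # Older interpreters: pathname2url() omits the scheme, so prepend it.
    return 'file:' + pathname2url(path)


def file_url_to_path(url):
    """Return the local path for a complete 'file:' URL (hypothetical helper)."""
    if _HAS_REQUIRE_SCHEME:
        return url2pathname(url, require_scheme=True)
    # Older interpreters: url2pathname() expects the scheme to be stripped.
    # This check is case-sensitive, which is an assumption of the sketch.
    if not url.startswith('file:'):
        raise URLError("URL is missing a 'file:' scheme")
    return url2pathname(url.removeprefix('file:'))
```

Checking the signature rather than catching `TypeError` keeps unrelated type errors from being silently rerouted into the fallback branch.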

Doc/whatsnew/3.14.rst

Lines changed: 6 additions & 2 deletions
@@ -1218,16 +1218,20 @@ urllib
   supporting SHA-256 digest authentication as specified in :rfc:`7616`.
   (Contributed by Calvin Bui in :gh:`128193`.)
 
-* Improve standards compliance when parsing and emitting ``file:`` URLs.
+* Improve ergonomics and standards compliance when parsing and emitting
+  ``file:`` URLs.
 
   In :func:`urllib.request.url2pathname`:
 
+  - Accept a complete URL when the new *require_scheme* argument is set to
+    true.
   - Discard URL authorities that resolve to a local IP address.
   - Raise :exc:`~urllib.error.URLError` if a URL authority doesn't resolve
-    to ``localhost``, except on Windows where we return a UNC path.
+    to a local IP address, except on Windows where we return a UNC path.
 
   In :func:`urllib.request.pathname2url`:
 
+  - Return a complete URL when the new *add_scheme* argument is set to true.
   - Include an empty URL authority when a path begins with a slash. For
     example, the path ``/etc/hosts`` is converted to the URL ``///etc/hosts``.
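
The last bullet is easier to follow with a parser in hand. The sketch below uses only `urllib.parse.urlsplit` to show how the number of slashes after `file:` decides what counts as the authority and what counts as the path; the helper name `authority_and_path` is hypothetical and exists only for illustration.

```python
from urllib.parse import urlsplit


def authority_and_path(url):
    """Return the (authority, path) pair that urlsplit() sees in *url*."""
    parts = urlsplit(url)
    return parts.netloc, parts.path


# Three slashes: empty authority, and the path keeps its leading slash.
print(authority_and_path('file:///etc/hosts'))
# Two slashes: the first path segment is read as the authority instead.
print(authority_and_path('file://etc/hosts'))
# Four slashes: empty authority, and the path begins with two slashes.
print(authority_and_path('file:////foo/bar'))
```

With only two slashes, `etc` would be taken for a host name, which is why an explicitly empty authority is emitted for paths that begin with a slash.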

Lib/pathlib/__init__.py

Lines changed: 2 additions & 4 deletions
@@ -1271,17 +1271,15 @@ def as_uri(self):
         if not self.is_absolute():
             raise ValueError("relative paths can't be expressed as file URIs")
         from urllib.request import pathname2url
-        return f'file:{pathname2url(str(self))}'
+        return pathname2url(str(self), add_scheme=True)
 
     @classmethod
     def from_uri(cls, uri):
         """Return a new path from the given 'file' URI."""
-        if not uri.startswith('file:'):
-            raise ValueError(f"URI does not start with 'file:': {uri!r}")
         from urllib.error import URLError
         from urllib.request import url2pathname
         try:
-            path = cls(url2pathname(uri.removeprefix('file:')))
+            path = cls(url2pathname(uri, require_scheme=True))
         except URLError as exc:
             raise ValueError(exc.reason) from None
         if not path.is_absolute():
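
From the caller's side, the two methods touched here form a round trip. The sketch below is illustrative only: `roundtrip` is a hypothetical helper, and it guards the `Path.from_uri()` call with `hasattr` because that classmethod is only available on Python 3.13 and later.

```python
from pathlib import Path


def roundtrip(path):
    """Convert *path* to a file URI and, where supported, back to a Path."""
    uri = Path(path).as_uri()  # raises ValueError for relative paths
    if hasattr(Path, 'from_uri'):
        # from_uri() reports a malformed URI as ValueError, not URLError.
        return uri, Path.from_uri(uri)
    return uri, None


uri, restored = roundtrip(Path.cwd())
print(uri, restored)
```

Callers therefore only ever need to handle `ValueError`, regardless of which urllib function rejected the input.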

Lib/test/test_pathlib/test_pathlib.py

Lines changed: 2 additions & 2 deletions
@@ -3302,8 +3302,8 @@ def test_from_uri_posix(self):
     @needs_posix
     def test_from_uri_pathname2url_posix(self):
         P = self.cls
-        self.assertEqual(P.from_uri('file:' + pathname2url('/foo/bar')), P('/foo/bar'))
-        self.assertEqual(P.from_uri('file:' + pathname2url('//foo/bar')), P('//foo/bar'))
+        self.assertEqual(P.from_uri(pathname2url('/foo/bar', add_scheme=True)), P('/foo/bar'))
+        self.assertEqual(P.from_uri(pathname2url('//foo/bar', add_scheme=True)), P('//foo/bar'))
 
     @needs_windows
     def test_absolute_windows(self):

Lib/test/test_urllib.py

Lines changed: 60 additions & 2 deletions
@@ -476,7 +476,7 @@ def test_missing_localfile(self):
 
     def test_file_notexists(self):
         fd, tmp_file = tempfile.mkstemp()
-        tmp_file_canon_url = 'file:' + urllib.request.pathname2url(tmp_file)
+        tmp_file_canon_url = urllib.request.pathname2url(tmp_file, add_scheme=True)
         parsed = urllib.parse.urlsplit(tmp_file_canon_url)
         tmp_fileurl = parsed._replace(netloc='localhost').geturl()
         try:
@@ -620,7 +620,7 @@ def tearDown(self):
 
     def constructLocalFileUrl(self, filePath):
         filePath = os.path.abspath(filePath)
-        return "file:" + urllib.request.pathname2url(filePath)
+        return urllib.request.pathname2url(filePath, add_scheme=True)
 
     def createNewTempFile(self, data=b""):
         """Creates a new temporary file containing the specified data,
@@ -1436,6 +1436,21 @@ def test_pathname2url(self):
         self.assertEqual(fn(f'{sep}a{sep}b.c'), '///a/b.c')
         self.assertEqual(fn(f'{sep}a{sep}b%#c'), '///a/b%25%23c')
 
+    def test_pathname2url_add_scheme(self):
+        sep = os.path.sep
+        subtests = [
+            ('', 'file:'),
+            (sep, 'file:///'),
+            ('a', 'file:a'),
+            (f'a{sep}b.c', 'file:a/b.c'),
+            (f'{sep}a{sep}b.c', 'file:///a/b.c'),
+            (f'{sep}a{sep}b%#c', 'file:///a/b%25%23c'),
+        ]
+        for path, expected_url in subtests:
+            with self.subTest(path=path):
+                self.assertEqual(
+                    urllib.request.pathname2url(path, add_scheme=True), expected_url)
+
     @unittest.skipUnless(sys.platform == 'win32',
                          'test specific to Windows pathnames.')
     def test_pathname2url_win(self):
@@ -1503,6 +1518,49 @@ def test_url2pathname(self):
         self.assertEqual(fn('//localhost/foo/bar'), f'{sep}foo{sep}bar')
         self.assertEqual(fn('///foo/bar'), f'{sep}foo{sep}bar')
         self.assertEqual(fn('////foo/bar'), f'{sep}{sep}foo{sep}bar')
+        self.assertEqual(fn('data:blah'), 'data:blah')
+        self.assertEqual(fn('data://blah'), f'data:{sep}{sep}blah')
+
+    def test_url2pathname_require_scheme(self):
+        sep = os.path.sep
+        subtests = [
+            ('file:', ''),
+            ('FILE:', ''),
+            ('FiLe:', ''),
+            ('file:/', f'{sep}'),
+            ('file:///', f'{sep}'),
+            ('file:////', f'{sep}{sep}'),
+            ('file:foo', 'foo'),
+            ('file:foo/bar', f'foo{sep}bar'),
+            ('file:/foo/bar', f'{sep}foo{sep}bar'),
+            ('file://localhost/foo/bar', f'{sep}foo{sep}bar'),
+            ('file:///foo/bar', f'{sep}foo{sep}bar'),
+            ('file:////foo/bar', f'{sep}{sep}foo{sep}bar'),
+            ('file:data:blah', 'data:blah'),
+            ('file:data://blah', f'data:{sep}{sep}blah'),
+        ]
+        for url, expected_path in subtests:
+            with self.subTest(url=url):
+                self.assertEqual(
+                    urllib.request.url2pathname(url, require_scheme=True),
+                    expected_path)
+
+        error_subtests = [
+            '',
+            ':',
+            'foo',
+            'http:foo',
+            'localfile:foo',
+            'data:foo',
+            'data:file:foo',
+            'data:file://foo',
+        ]
+        for url in error_subtests:
+            with self.subTest(url=url):
+                self.assertRaises(
+                    urllib.error.URLError,
+                    urllib.request.url2pathname,
+                    url, require_scheme=True)
 
     @unittest.skipUnless(sys.platform == 'win32',
                          'test specific to Windows pathnames.')
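
The new tests above use a table of `(input, expected)` pairs driven through `subTest`, so a single failing row is reported without hiding the others. A minimal, version-independent sketch of the same pattern follows; it exercises `urllib.parse.quote` and `unquote` rather than the new arguments, and the names `QuoteTableTest` and `run_table_test` are hypothetical.

```python
import unittest
from urllib.parse import quote, unquote


class QuoteTableTest(unittest.TestCase):
    def test_quote_roundtrip(self):
        # Each row pairs a raw path fragment with its percent-encoded form.
        subtests = [
            ('a/b.c', 'a/b.c'),
            ('/a/b%#c', '/a/b%25%23c'),
            ('Program Files', 'Program%20Files'),
        ]
        for raw, expected in subtests:
            # subTest labels each row so failures identify the offending input.
            with self.subTest(raw=raw):
                self.assertEqual(quote(raw), expected)
                self.assertEqual(unquote(expected), raw)


def run_table_test():
    """Run the table-driven test and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(QuoteTableTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```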

Lib/test/test_urllib2.py

Lines changed: 1 addition & 1 deletion
@@ -809,7 +809,7 @@ def test_file(self):
 
         TESTFN = os_helper.TESTFN
         towrite = b"hello, world\n"
-        canonurl = 'file:' + urllib.request.pathname2url(os.path.abspath(TESTFN))
+        canonurl = urllib.request.pathname2url(os.path.abspath(TESTFN), add_scheme=True)
         parsed = urlsplit(canonurl)
         if parsed.netloc:
             raise unittest.SkipTest("non-local working directory")
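
Two small URL manipulations recur in these tests: checking whether the canonical URL carries an authority (the tests skip when it does) and swapping in a `localhost` authority. The sketch below shows both using `urllib.parse.urlsplit` only; the helper names and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def has_authority(file_url):
    """Return True if *file_url* names a host, e.g. one built from a UNC path."""
    return bool(urlsplit(file_url).netloc)


def with_localhost_authority(file_url):
    """Return *file_url* with its authority replaced by 'localhost'."""
    return urlsplit(file_url)._replace(netloc='localhost').geturl()


print(has_authority('file://server/share/data.txt'))
print(with_localhost_authority('file:///tmp/data.txt'))
```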

Lib/test/test_urllib2net.py

Lines changed: 1 addition & 1 deletion
@@ -150,7 +150,7 @@ def test_file(self):
             f.write('hi there\n')
             f.close()
             urls = [
-                'file:' + urllib.request.pathname2url(os.path.abspath(TESTFN)),
+                urllib.request.pathname2url(os.path.abspath(TESTFN), add_scheme=True),
                 ('file:///nonsensename/etc/passwd', None,
                  urllib.error.URLError),
                 ]

Lib/urllib/request.py

Lines changed: 21 additions & 11 deletions
@@ -1466,17 +1466,16 @@ def get_names(self):
     def open_local_file(self, req):
         import email.utils
         import mimetypes
-        filename = _splittype(req.full_url)[1]
-        localfile = url2pathname(filename)
+        localfile = url2pathname(req.full_url, require_scheme=True)
         try:
             stats = os.stat(localfile)
             size = stats.st_size
             modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
-            mtype = mimetypes.guess_type(filename)[0]
+            mtype = mimetypes.guess_file_type(localfile)[0]
             headers = email.message_from_string(
                 'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %
                 (mtype or 'text/plain', size, modified))
-            origurl = f'file:{pathname2url(localfile)}'
+            origurl = pathname2url(localfile, add_scheme=True)
             return addinfourl(open(localfile, 'rb'), headers, origurl)
         except OSError as exp:
             raise URLError(exp, exp.filename)
@@ -1635,9 +1634,16 @@ def data_open(self, req):
 
 # Code move from the old urllib module
 
-def url2pathname(url):
-    """OS-specific conversion from a relative URL of the 'file' scheme
-    to a file system path; not recommended for general use."""
+def url2pathname(url, *, require_scheme=False):
+    """Convert the given file URL to a local file system path.
+
+    The 'file:' scheme prefix must be omitted unless *require_scheme*
+    is set to true.
+    """
+    if require_scheme:
+        scheme, url = _splittype(url)
+        if scheme != 'file':
+            raise URLError("URL is missing a 'file:' scheme")
     authority, url = _splithost(url)
     if os.name == 'nt':
         if not _is_local_authority(authority):
@@ -1661,13 +1667,17 @@ def url2pathname(url):
     return unquote(url, encoding=encoding, errors=errors)
 
 
-def pathname2url(pathname):
-    """OS-specific conversion from a file system path to a relative URL
-    of the 'file' scheme; not recommended for general use."""
+def pathname2url(pathname, *, add_scheme=False):
+    """Convert the given local file system path to a file URL.
+
+    The 'file:' scheme prefix is omitted unless *add_scheme*
+    is set to true.
+    """
     if os.name == 'nt':
         pathname = pathname.replace('\\', '/')
     encoding = sys.getfilesystemencoding()
     errors = sys.getfilesystemencodeerrors()
+    scheme = 'file:' if add_scheme else ''
     drive, root, tail = os.path.splitroot(pathname)
     if drive:
         # First, clean up some special forms. We are going to sacrifice the
1680+
scheme = 'file:' if add_scheme else ''
16711681
drive, root, tail = os.path.splitroot(pathname)
16721682
if drive:
16731683
# First, clean up some special forms. We are going to sacrifice the
@@ -1689,7 +1699,7 @@ def pathname2url(pathname):
16891699
# avoids interpreting the path as a URL authority.
16901700
root = '//' + root
16911701
tail = quote(tail, encoding=encoding, errors=errors)
1692-
return drive + root + tail
1702+
return scheme + drive + root + tail
16931703

16941704

16951705
# Utility functions
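
The scheme check added to `url2pathname()` relies on the private helper `_splittype`, which is not shown in this diff. As a rough, standalone approximation of the same check, the sketch below uses `str.partition` and a case-insensitive comparison; `strip_file_scheme` is a hypothetical name, and the inputs are taken from the test tables in this commit.

```python
from urllib.error import URLError


def strip_file_scheme(url):
    """Return *url* without its leading 'file:' scheme, or raise URLError."""
    # Only the text before the first colon is treated as the scheme, so
    # 'file:data:blah' is accepted while 'data:file:foo' is rejected.
    scheme, colon, rest = url.partition(':')
    if not colon or scheme.lower() != 'file':
        raise URLError("URL is missing a 'file:' scheme")
    return rest


print(strip_file_scheme('FiLe:foo/bar'))
print(strip_file_scheme('file:data:blah'))
```

This is an assumption-laden stand-in for illustration, not the library's implementation; it only mirrors the accept and reject cases listed in the tests.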
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+Add optional *add_scheme* argument to :func:`urllib.request.pathname2url`; when
+set to true, a complete URL is returned. Likewise add optional *require_scheme*
+argument to :func:`~urllib.request.url2pathname`; when set to true, a complete
+URL is accepted.
