Conversation
…der for any reserved characters, and used it in .child() and .sibling(), will add it in further methods shortly
…obably have the issue with not-yet-normalized query parameter names (mixed decoded and encoded query parameter names that overlap).
… overlapping due to not being normalized
…ation (.append, etc.). Also add and test __eq__, __ne__, and __hash__
…RLs of some complexity. Had to add userinfo to URL.normalize() to help with equality checks.
… and other Twisted codebases
Codecov Report
@@ Coverage Diff @@
## master #54 +/- ##
==========================================
+ Coverage 97.8% 97.94% +0.13%
==========================================
Files 6 8 +2
Lines 1137 1408 +271
Branches 137 164 +27
==========================================
+ Hits 1112 1379 +267
- Misses 13 14 +1
- Partials 12 15 +3
Continue to review full report at Codecov.
|
hyperlink/_url.py
Outdated
|
|
||
| def _encode_reserved(text, maximal=True): | ||
| """A very comprehensive percent encoding for encoding all | ||
| delimeters. Used for arguments to DecodedURL, where a % means a |
There was a problem hiding this comment.
Oooh, this kind of typo is so very unlike me, I assure you! ;) Thanks!
| return not self.__eq__(other) | ||
|
|
||
| def __hash__(self): | ||
| return hash((self.__class__, self.scheme, self.userinfo, self.host, |
There was a problem hiding this comment.
minor: Since equality is delegated upwards to URL, would it be better to do the same for hashing?
e.g.
def __hash__(self):
return hash(self.normalize().to_uri())There was a problem hiding this comment.
That's a good point. I may have subtly changed my mind midway through development, and now I lean more toward not delegating up to URL for equality. I'll get that fixed, thanks!
There was a problem hiding this comment.
Whenever you're delegating hash like this, remember you should also include a tweak, so that the DecodedURL and the URL representing the same data don't hash the same.
There was a problem hiding this comment.
@glyph why is that? (correctness shouldn't be effected since dictionaries compare by equality, and I'm not entirely sure what the performance problem you're trying to forestall)
There was a problem hiding this comment.
I share @moshez's curiosity.
@mahmoud - will you follow @alexwlchan's advice about delegating __hash__?
|
|
||
| __all__ = [ | ||
| "URL", | ||
| "DecodedURL", |
There was a problem hiding this comment.
I want to say that we should just avoid exposing this entirely, but it probably needs to be exposed for type annotations.
However, as per #44, could we have a "decoded" property on URL that provides an interface to this, and an "encoded" property on DecodedURL that maps back to a URL?
(At this point I think I'm in favor of adding an EncodedURL alias for URL, then maybe adding a top-level entry point like hyperlink.parse() which takes a decoded kwarg flag which defaults to true, to make it easier to get started with DecodedURL which is what I think we all want most of the time. That can definitely be deferred to a separate ticket though.)
There was a problem hiding this comment.
Yes, I am pretty much in favor of all those conveniences :) And I also agree that this probably needs to be exposed.
…vel API. fix a bug with userinfo where double-escapes were possible because % wasn't marked as safe. add lazy option to DecodedURL, also exposed in parse and URL's new get_decoded_url() method. add a few more notes and comment tweaks
…verage-oriented tests. Remove the password attributes of DecodedURL pending future discussion.
markrwilliams
left a comment
There was a problem hiding this comment.
The approach seems good and it addresses my question:
>>> from hyperlink import DecodedURL, URL
>>> d = DecodedURL(URL())
>>> d.add(u'parameter', u'#value').to_text()
u'?parameter=%23value'
I think this means we should use DecodedURL. in twisted.python.url. That's great!
I've left comments about documentation improvements and lingering TODOs. I'd also like to see clarity around __hash__. Please address these issues with changes or PR comments before merging.
|
|
||
| def normalize(self, scheme=True, host=True, path=True, query=True, | ||
| fragment=True): | ||
| fragment=True, userinfo=True): |
There was a problem hiding this comment.
The docstring should mention userinfo.
There's a TODO for userinfo. Should that go?
hyperlink/_url.py
Outdated
| handled automatically. | ||
|
|
||
| Where applicable, a UTF-8 encoding is presumed. Be advised that | ||
| some interactions, can raise UnicodeEncodeErrors and |
There was a problem hiding this comment.
comma splice:
some interactions , can raise UnicodeEncodeErrors...
Also, maybe you want backticks around UnicodeEncodeErrors and UnicodeDecodeErrors.
hyperlink/_url.py
Outdated
| UnicodeDecodeErrors, just like when working with | ||
| bytestrings. | ||
|
|
||
| Examples of such interactions include handling query strings |
There was a problem hiding this comment.
Should this be part of the previous paragraph?
| encoding binary data, and paths containing segments with special | ||
| characters encoded with codecs other than UTF-8. | ||
| """ | ||
| def __init__(self, url, lazy=False): |
There was a problem hiding this comment.
If this is a public class, then its initializer should be documented. Please include an explanation of url and lazy in the docstring.
hyperlink/_url.py
Outdated
|
|
||
| def click(self, href=u''): | ||
| "Return a new DecodedURL wrapping the result of :meth:`~hyperlink.URL.click()`" | ||
| return type(self)(self._url.click(href=href)) |
| durl = durl.set(' ', 'spa%ed') | ||
| assert durl.get(' ') == ['spa%ed'] | ||
|
|
||
| durl = DecodedURL(url=durl.to_uri()) |
There was a problem hiding this comment.
What's the point of this? Round tripping?
hyperlink/test/test_decoded_url.py
Outdated
|
|
||
| assert durl.set('arg', 'd').get('arg') == ['d'] | ||
|
|
||
| def test_equivalences(self): |
There was a problem hiding this comment.
This really tests __eq__ and __hash__, so maybe test_equality_and_hashability?
|
|
||
| durl_map = {} | ||
| durl_map[durl] = durl | ||
| durl_map[durl2] = durl2 |
There was a problem hiding this comment.
This should also test that a URL and a DecodedURL that represent the same underlying URL don't overlap in a dict (or set.)
|
|
||
| assert len(durl_map) == 1 | ||
|
|
||
| def test_replace_roundtrip(self): |
There was a problem hiding this comment.
This is good, but there should also be a roundtrip test between TOTAL_URL and DecodedURL.
|
|
||
| BASIC_URL = 'http://example.com/#' | ||
| TOTAL_URL = "https://%75%73%65%72:%00%00%00%00@xn--bcher-kva.ch:8080/a/nice%20nice/./path/?zot=23%25&zut#frég" | ||
| UNDECODABLE_FRAG_URL = TOTAL_URL + '%C3' |
There was a problem hiding this comment.
It'd be good to note that this is undecodable because %C3 makes it invalid UTF-8.
This still needs some documentation, but to address #44 and the handful of issues surrounding the central problem, I present
DecodedURL. It takes care of handling reserved characters so you don't have to.From the docstring:
It's tested, works, and seems practical, though, so take a look!