Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checking that parsed OTR protocol version is a digit is broken with python3 #56

Open
koolfy opened this issue Jan 15, 2015 · 5 comments
Assignees
Milestone

Comments

@koolfy
Copy link
Contributor

koolfy commented Jan 15, 2015

This line is problematic: https://github.com/python-otr/pure-python-otr/blob/staging/src/potr/proto.py#L134

Its purpose is to check that the parsed OTR tag advertises a correct protocol version by checking that the value is a digit.

First, since python3, isdigit() (and posssibly isdecimal and isnumeric) goes by a quite wider definition of "digit". For example, UTF-8 characters other than 0-9, but representing symbols containing a digit in them will match, as well as digits from other alphabets.

Example:

'9' == '⑨'
True

First, this caused panic and anger, but then, reading the OTRv2 spec:

"If she is willing to use OTR versions other than 1, a "v" followed by the byte identifiers for the versions in question, followed by "?". The byte identifier for OTR version 2 is "2". The order of the identifiers between the "v" and the "?" does not matter, but none should be listed more than once."

( https://otr.cypherpunks.ca/Protocol-v2-3.1.0.html )

we actually realised any byte might at some point designate a version. It just happens that the ascii value "2" was chosen.

In that context, a better option might be to remove this isdigit() test altogether, as it does not reflect a specification constraint.

@koolfy
Copy link
Contributor Author

koolfy commented Jan 15, 2015

It has also been suggested on irc that it might be wise to drop the conversion altogeher and just deal with bytes as-is when checking for different protocol versions?

@dequis
Copy link

dequis commented Jan 16, 2015

'9' == '⑨'
True

What? No, it doesn't do that. And int('⑨') is an error.

# python 3.4.2
>>> '9' == '⑨'
False
>>> '⑨'.isdigit()
True
>>> int('⑨')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '⑨'
>>> [ int(c) for c in '9⑨' if c.isdigit() ]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
ValueError: invalid literal for int() with base 10: '⑨'

EDIT: python2 has the same behavior when using unicode strings:

# python 2.7.8
>>> u'9' == u'⑨'
False
>>> u'⑨'.isdigit()
True
>>> int(u'⑨')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'decimal' codec can't encode character u'\u2468' in position 0: invalid decimal Unicode string

@koolfy
Copy link
Contributor Author

koolfy commented Jan 16, 2015

You are correct, I mixed up my copy/pastes :)
I meant '⑨'.isdigit() returns True, and yes in does not allow to be int()'ed.

My point was that isdigit() does not do what this line was used for in our code back when we used python2.

@afflux
Copy link
Contributor

afflux commented Jan 16, 2015

I agree, this comparison should deal in bytes and not care about isdigit().

On January 16, 2015 12:50:29 AM CET, koolfy [email protected] wrote:

It has also been suggested on irc that it might be wise to drop the
conversion altogeher and just deal with bytes as-is when checking for
different protocol versions?


Reply to this email directly or view it on GitHub:
#56 (comment)

@koolfy
Copy link
Contributor Author

koolfy commented Jun 8, 2015

Wow this is old now.
I'll give it a try shortly.

@koolfy koolfy modified the milestone: 1.0.2 Jun 8, 2015
@koolfy koolfy modified the milestones: 1.0.2, 1.0.3 Apr 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants