Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not allow utf-8 in domains to avoid lookalike character attacks #63

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

grosser
Copy link
Contributor

@grosser grosser commented Oct 4, 2016

http://www.irongeek.com/homoglyph-attack-generator.php

trying to port over zendesk#5

but does not work ... missing something :/

@jcheatham
/fyi @vmg sounds like a good feature ?

-"http://examplе.com"
+"<a href=\"http://examplе.com\">http://examplе.com</a>"

@jcheatham
Copy link

Looking at zendesk#5 I think there's still an edge case here that I failed to capture in the tests. Note that I also changed:

    if (!isalnum(data[0])) return 0;
    for (i = 1; i < size - 1; ++i) {
      ...

to

    if (data[0] == '.' || data[0] == '-') return 0;
    for (i = 0; i < size - 1; i++) {

I think the first character still has a chance to be exploited here.

@grosser
Copy link
Contributor Author

grosser commented Oct 4, 2016

that change seems weird ... a strange e would not be . or - ... so I remove
that change :D

On Tue, Oct 4, 2016 at 2:46 PM, Jonathan Cheatham [email protected]
wrote:

Looking at zendesk#5 zendesk#5 I think
there's still an edge case here that I failed to capture in the tests. Note
that I also changed:

if (!isalnum(data[0])) return 0;
for (i = 1; i < size - 1; ++i) {
  ...

to

if (data[0] == '.' || data[0] == '-') return 0;
for (i = 0; i < size - 1; i++) {

I think the first character still has a chance to be exploited here.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#63 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAAsZ30x8of6cak6I8Bfs9jgGxIW-Dc-ks5qwskmgaJpZM4KOLyc
.

@jcheatham
Copy link

right, but notice that it also bumps the for loop forward by one, so if you had a strange 'e', isalnum('e') would be true, you wouldn't return 0/false, and you'd also skip past evaluating that character in the loop.

@kivikakk
Copy link
Contributor

kivikakk commented Oct 7, 2016

Thanks for the PR! This would be a lovely thing to get going, and I think it's important, but there's some bits missing to apply this to our fork.

rinku_isalnum doesn't consider non-ASCII UTF-8 text as valid — note its implementation — meaning the code change to autolink.c in this PR is actually a non-op (!), or rather, it does the same thing as rinku_isalnum.

It's notable that the tests are broken, because of complicating factors:

  1. check_domain will return true in three of the four assertions in the tests supplied — for the "www.exampl\u0435.com" cases, there is always at least one dot before we hit the non-ASCII-alphanumeric character.

    Likewise, for both of the Rinku::AUTOLINK_SHORT_DOMAINS cases (which overlaps with one of the above cases), allowing short domains means check_domain always returns true as long as the initial if (!rinku_isalnum(data[link->start])) doesn't cause a return false.

    Hence 3 of 4 assertions are currently failing.

  2. This would be okay if it at least caused the non-ASCII-alphanumeric character to be excluded from the resulting autolink. But link->end is manually set to utf8proc_find_space(data, link->end, size); after check_domain returns, in both autolink__url and autolink_www. So the character gets re-included.

    This issue probably arose because of the difference between your fork and this one.

  3. The above note about the initial character by @jcheatham is important in the source PR at Prevent autolinking unicode domains zendesk/rinku#5, but it's avoided here because we check the first character with the above-mentioned if (!rinku_isalnum(data[link->start])) test, which doesn't allow for UTF-8.

@jcheatham
Copy link

Ah, I had thought rinku_isalnum was a portability wrapper for the standard library check, not that it was some beefy custom lookup table. And excellent info on everything else (I'm still operating with obsolete knowledge of the old version), so I'll just shut up now :)

@grosser
Copy link
Contributor Author

grosser commented Oct 11, 2016

... a little bit better now ... only thing failing is utf8-space ... does the direction look ok / any good idea how to fix that ?

-"This is just a test. <a href=\"http://www.pokemon.com\">http://www.pokemon.com</a> \u2028 "
+"This is just a test. http://www.pokemon.com \u2028 "

@ACPK
Copy link

ACPK commented Mar 22, 2017

@grosser @vmg - Any suggestions on implementing this or is this not an issue? :)

@grosser
Copy link
Contributor Author

grosser commented Mar 22, 2017

stil and issue ... we use a fork to get around it :(
https://rubygems.org/gems/zendesk-rinku

@northeastprince
Copy link

@vmg @kivikakk is this gem still being maintained?

@kivikakk
Copy link
Contributor

Not particularly, at least by me! rinku is from an earlier era of Rails development, and there are most likely better options today.

@northeastprince
Copy link

northeastprince commented Dec 28, 2023

Wow, thanks for the quick response! 😆 From what I see, there's this, Zendesk's fork, and the original. Maybe the latter is the best at this point - I assume a 30x increase in speed from the already miniscule timing won't really make much of a difference in the long run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants