do not allow utf-8 in domains to avoid lookalike character attacks #63

grosser · 2016-10-04T21:34:08Z

http://www.irongeek.com/homoglyph-attack-generator.php

trying to port over zendesk#5

but does not work ... missing something :/

@jcheatham
/fyi @vmg sounds like a good feature ?

-"http://examplе.com"
+"<a href=\"http://examplе.com\">http://examplе.com</a>"

jcheatham · 2016-10-04T21:46:14Z

Looking at zendesk#5 I think there's still an edge case here that I failed to capture in the tests. Note that I also changed:

    if (!isalnum(data[0])) return 0;
    for (i = 1; i < size - 1; ++i) {
      ...

to

    if (data[0] == '.' || data[0] == '-') return 0;
    for (i = 0; i < size - 1; i++) {

I think the first character still has a chance to be exploited here.

grosser · 2016-10-04T21:48:59Z

that change seems weird ... a strange e would not be . or - ... so I remove
that change :D

On Tue, Oct 4, 2016 at 2:46 PM, Jonathan Cheatham [email protected]
wrote:

Looking at zendesk#5 zendesk#5 I think
there's still an edge case here that I failed to capture in the tests. Note
that I also changed:
if (!isalnum(data[0])) return 0;
for (i = 1; i < size - 1; ++i) {
  ...
to
if (data[0] == '.' || data[0] == '-') return 0;
for (i = 0; i < size - 1; i++) {
I think the first character still has a chance to be exploited here.

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#63 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAAsZ30x8of6cak6I8Bfs9jgGxIW-Dc-ks5qwskmgaJpZM4KOLyc
.

jcheatham · 2016-10-04T23:21:18Z

right, but notice that it also bumps the for loop forward by one, so if you had a strange 'e', isalnum('e') would be true, you wouldn't return 0/false, and you'd also skip past evaluating that character in the loop.

kivikakk · 2016-10-07T05:06:46Z

Thanks for the PR! This would be a lovely thing to get going, and I think it's important, but there's some bits missing to apply this to our fork.

rinku_isalnum doesn't consider non-ASCII UTF-8 text as valid — note its implementation — meaning the code change to autolink.c in this PR is actually a non-op (!), or rather, it does the same thing as rinku_isalnum.

It's notable that the tests are broken, because of complicating factors:

check_domain will return true in three of the four assertions in the tests supplied — for the "www.exampl\u0435.com" cases, there is always at least one dot before we hit the non-ASCII-alphanumeric character.

Likewise, for both of the Rinku::AUTOLINK_SHORT_DOMAINS cases (which overlaps with one of the above cases), allowing short domains means check_domain always returns true as long as the initial if (!rinku_isalnum(data[link->start])) doesn't cause a return false.

Hence 3 of 4 assertions are currently failing.
This would be okay if it at least caused the non-ASCII-alphanumeric character to be excluded from the resulting autolink. But link->end is manually set to utf8proc_find_space(data, link->end, size); after check_domain returns, in both autolink__url and autolink_www. So the character gets re-included.

This issue probably arose because of the difference between your fork and this one.
The above note about the initial character by @jcheatham is important in the source PR at Prevent autolinking unicode domains zendesk/rinku#5, but it's avoided here because we check the first character with the above-mentioned if (!rinku_isalnum(data[link->start])) test, which doesn't allow for UTF-8.

jcheatham · 2016-10-07T15:11:59Z

Ah, I had thought rinku_isalnum was a portability wrapper for the standard library check, not that it was some beefy custom lookup table. And excellent info on everything else (I'm still operating with obsolete knowledge of the old version), so I'll just shut up now :)

http://www.irongeek.com/homoglyph-attack-generator.php

grosser · 2016-10-11T23:36:00Z

... a little bit better now ... only thing failing is utf8-space ... does the direction look ok / any good idea how to fix that ?

-"This is just a test. <a href=\"http://www.pokemon.com\">http://www.pokemon.com</a> \u2028 "
+"This is just a test. http://www.pokemon.com \u2028 "

ACPK · 2017-03-22T02:33:25Z

@grosser @vmg - Any suggestions on implementing this or is this not an issue? :)

grosser · 2017-03-22T03:38:27Z

stil and issue ... we use a fork to get around it :(
https://rubygems.org/gems/zendesk-rinku

northeastprince · 2023-12-28T01:53:28Z

@vmg @kivikakk is this gem still being maintained?

kivikakk · 2023-12-28T01:56:46Z

Not particularly, at least by me! rinku is from an earlier era of Rails development, and there are most likely better options today.

northeastprince · 2023-12-28T02:00:27Z

Wow, thanks for the quick response! 😆 From what I see, there's this, Zendesk's fork, and the original. Maybe the latter is the best at this point - I assume a 30x increase in speed from the already miniscule timing won't really make much of a difference in the long run.

grosser force-pushed the grosser/lookalike branch from 41bd324 to dab47aa Compare October 11, 2016 22:34

do not allow utf-8 in domains to avoid lookalike character attacks

b5659b2

http://www.irongeek.com/homoglyph-attack-generator.php

grosser force-pushed the grosser/lookalike branch from dab47aa to b5659b2 Compare October 11, 2016 23:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

do not allow utf-8 in domains to avoid lookalike character attacks #63

do not allow utf-8 in domains to avoid lookalike character attacks #63

grosser commented Oct 4, 2016

jcheatham commented Oct 4, 2016

grosser commented Oct 4, 2016

jcheatham commented Oct 4, 2016

kivikakk commented Oct 7, 2016

jcheatham commented Oct 7, 2016

grosser commented Oct 11, 2016

ACPK commented Mar 22, 2017 •

edited

Loading

grosser commented Mar 22, 2017

northeastprince commented Dec 28, 2023

kivikakk commented Dec 28, 2023

northeastprince commented Dec 28, 2023 •

edited

Loading

do not allow utf-8 in domains to avoid lookalike character attacks #63

Are you sure you want to change the base?

do not allow utf-8 in domains to avoid lookalike character attacks #63

Conversation

grosser commented Oct 4, 2016

jcheatham commented Oct 4, 2016

grosser commented Oct 4, 2016

jcheatham commented Oct 4, 2016

kivikakk commented Oct 7, 2016

jcheatham commented Oct 7, 2016

grosser commented Oct 11, 2016

ACPK commented Mar 22, 2017 • edited Loading

grosser commented Mar 22, 2017

northeastprince commented Dec 28, 2023

kivikakk commented Dec 28, 2023

northeastprince commented Dec 28, 2023 • edited Loading

ACPK commented Mar 22, 2017 •

edited

Loading

northeastprince commented Dec 28, 2023 •

edited

Loading