RE: space characters: should we require escapes on all of them?

I'm not sure how useful escaping is for preventing "visual spoofing".  Most users wouldn't distinguish between %xx and %yy if they saw them, and I don't think they'd even see them, because the browser or whatever is displaying the IRI would likely show it in a friendly form anyway, with unescaped spaces.

-Shawn

-----Original Message-----
From: public-iri-request@w3.org [mailto:public-iri-request@w3.org] On Behalf Of Chris Weber
Sent: ,  05,  2010 11:39
To: 'Phillips, Addison'; public-iri@w3.org
Subject: RE: space characters: should we require escapes on all of them?

If the goal is to mitigate visual spoofing, then escaping Zs category characters would seem a good start.  But would you stop there?  Special characters such as the BOM (U+FEFF), which I found no direct mention of in draft-07, could be used to exploit zero-width spacing, as could the joiners and other characters you're all probably familiar with. Combining marks could also be stacked in clever ways to make for invisible attacks.
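
To make the categories above concrete, here is a minimal Python sketch that flags characters in an IRI string that are likely to be invisible or easy to miss. The function name `flag_hard_to_see` and the choice of categories (Zs for space separators, Cf for format characters such as U+FEFF and the joiners, M* for combining marks) are assumptions for illustration only; nothing in draft-07 defines this list.

```python
import unicodedata

def flag_hard_to_see(iri):
    """Return (index, code point, category) for characters that may be
    invisible or hard to notice when the IRI is displayed.

    The category set is an illustrative assumption, not a rule from the draft.
    """
    flagged = []
    for i, ch in enumerate(iri):
        cat = unicodedata.category(ch)
        # Zs: space separators; Cf: format characters (BOM, joiners);
        # Mn/Mc/Me: combining marks
        if cat in ("Zs", "Cf") or cat.startswith("M"):
            flagged.append((i, f"U+{ord(ch):04X}", cat))
    return flagged

# Hypothetical sample: an em space, a BOM, and a combining acute accent
sample = "http://example.com/a\u2003b\ufeffc\u0301"
for hit in flag_hard_to_see(sample):
    print(hit)
```

A plain ASCII IRI yields an empty list, so a check like this could feed either an escaping step or a warning, depending on which policy the spec ends up with.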

On this subject, is this a bug in the spec section "7.3.  Characters not allowed in IRIs" where it says:

      Specials (U+FFF0-FFFD): These code points provide functionality
      beyond that useful in an IRI, for example byte order
      identification, annotation, and replacements for unknown
      characters and objects.  Their use and interpretation in an IRI
      would serve no purpose and might lead to confusing display
      variations.

When it refers to "byte order identification" did it mean to include U+FEFF in the range?
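
For reference, a quick Python check of the range as quoted. U+FEFF falls outside U+FFF0-FFFD, so as written the quoted text does not cover the byte order mark. The names `SPECIALS_AS_QUOTED` and `in_quoted_specials` are made up for this sketch.

```python
import unicodedata

# Range exactly as quoted from section 7.3 (inclusive on both ends)
SPECIALS_AS_QUOTED = range(0xFFF0, 0xFFFD + 1)
BOM = 0xFEFF

def in_quoted_specials(code_point):
    """True if the code point lies in the quoted Specials range."""
    return code_point in SPECIALS_AS_QUOTED

print(in_quoted_specials(BOM))          # U+FEFF against the quoted range
print(unicodedata.category(chr(BOM)))   # general category of U+FEFF
print(unicodedata.name(chr(BOM)))       # formal Unicode name of U+FEFF
```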


Chris Weber
Security Research
Casaba Security




-----Original Message-----
From: public-iri-request@w3.org [mailto:public-iri-request@w3.org] On Behalf Of Phillips, Addison
Sent: Monday, January 04, 2010 4:23 PM
To: public-iri@w3.org
Subject: space characters: should we require escapes on all of them?

Allowing (or not) a space character in a web address was mentioned recently in the thread on HTML5, and I got to thinking: Unicode also includes other non-control whitespace characters and these don't appear to be dealt with anywhere, including the security section of draft-07.

I like that IRIs do not have spaces in them. An IRI is an identifier and should not be regarded as a repository for prose. But, since the space character must be escaped, I think perhaps that the other Unicode whitespace characters (category Zs) should be treated similarly and would suggest adding a prohibition on them to section 7.3 in draft-07. This would help defend against visual spoofing such as using an "em space" (U+2003) to make a single IRI look like two adjacent IRIs.

If we don't prohibit these characters, maybe there should at least be a note in the security section mentioning them for exactly that reason.
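
As a rough illustration of what treating all Zs characters like U+0020 could look like, the Python sketch below percent-encodes every character in category Zs as its UTF-8 bytes and leaves everything else alone. The function name `escape_zs` and the example hosts are hypothetical; this is one possible reading of the proposal, not text from draft-07.

```python
import unicodedata

def escape_zs(iri):
    """Percent-encode every space separator (category Zs) in the string.

    Each such character is replaced by the %XX form of its UTF-8 bytes.
    All other characters are passed through unchanged.
    """
    out = []
    for ch in iri:
        if unicodedata.category(ch) == "Zs":
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)

# One IRI that could be displayed so it looks like two adjacent IRIs
spoof = "http://a.example/\u2003http://b.example/"
print(escape_zs(spoof))
```

Once escaped, the em space shows up as `%E2%80%83`, so the string no longer reads as two separate addresses wherever the escaped form is displayed.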

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

Received on Wednesday, 6 January 2010 00:25:07 UTC