space characters: should we require escapes on all of them?

Whether to allow a space character in a web address came up recently in the thread on HTML5, and it got me thinking: Unicode also includes other non-control whitespace characters, and these don't appear to be dealt with anywhere, including the security section of draft-07.

I like that IRIs do not have spaces in them. An IRI is an identifier and should not be regarded as a repository for prose. But since the space character must be escaped, I think the other Unicode whitespace characters (category Zs) should be treated similarly, and I would suggest adding a prohibition on them to section 7.3 of draft-07. This would help defend against visual spoofing, such as using an "em space" (U+2003) to make a single IRI look like two adjacent IRIs.
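
To make the concern concrete, here is a rough sketch in Python of the kind of check I have in mind. It is only an illustration: the function name and the example.com/example.org addresses are made up, and nothing here comes from the draft itself. It simply reports every character whose Unicode general category is Zs.

```python
import unicodedata


def find_space_separators(iri: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for each category-Zs character.

    Hypothetical helper for illustration only; not defined by any draft.
    """
    hits = []
    for index, ch in enumerate(iri):
        # "Zs" is the Unicode general category "Separator, space".
        if unicodedata.category(ch) == "Zs":
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits


# A single string that renders like two adjacent addresses,
# because an EM SPACE (U+2003) sits between them.
spoofed = "http://example.com/a\u2003http://example.org/b"
print(find_space_separators(spoofed))
```

A processor that wanted to enforce the prohibition could reject (or require escaping for) any input where this list is non-empty. Note that category Zs also includes the ordinary space (U+0020) and the no-break space (U+00A0), so the same test covers the case that is already prohibited.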

If we don't prohibit these characters, maybe there should at least be a note in the security section mentioning them for exactly that reason.


Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

Received on Tuesday, 5 January 2010 00:23:51 UTC