
Re: "Web addresses in HTML 5" for review (ISSUE-56 urls-webarch)

From: Anne van Kesteren <annevk@opera.com>
Date: Tue, 24 Mar 2009 00:02:44 +0100
To: "Henry S. Thompson" <ht@inf.ed.ac.uk>, "Dan Connolly" <connolly@w3.org>, "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Cc: "HTML WG" <public-html@w3.org>
Message-ID: <op.uq9oyuqd64w2qv@annevk-t60.oslo.opera.com>
On Mon, 23 Mar 2009 18:31:51 +0100, Henry S. Thompson <ht@inf.ed.ac.uk> wrote:
> So, in particular, if I try to use the
> result of the following Python expression in an HTML5 document as the
> value of an href attribute, what does this document say should happen?
>    "http://www.example.org/zero"+chr(0)+"here/"

U+0000 is replaced by U+FFFD during tokenization. That character is then
"UTF-8 percent-escaped". (If the href DOM attribute is set from script
instead, with the U+0000 written using ECMAScript escape syntax, then as
far as I can tell the U+0000 itself would be "UTF-8 percent-escaped".)
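
As a rough illustration of the two cases above, here is a minimal Python 3 sketch. The helper names tokenizer_replace_nul and utf8_percent_escape are hypothetical, and urllib.parse.quote is used only as a stand-in for "UTF-8 percent-escaping"; it is not the algorithm from the draft, and the set of characters left unescaped is an assumption.

```python
from urllib.parse import quote

def tokenizer_replace_nul(value: str) -> str:
    # Hypothetical stand-in for the tokenizer step: U+0000 becomes U+FFFD.
    return value.replace("\u0000", "\ufffd")

def utf8_percent_escape(value: str) -> str:
    # Percent-escape each remaining character as its UTF-8 bytes.
    # Assumption: URL delimiters and existing "%" are left untouched.
    return quote(value, safe=":/?#[]@!$&'()*+,;=%")

raw = "http://www.example.org/zero" + chr(0) + "here/"

# Markup case: the tokenizer sees the attribute value first.
from_markup = utf8_percent_escape(tokenizer_replace_nul(raw))

# DOM case: the value never passes through the tokenizer.
from_dom = utf8_percent_escape(raw)

print(from_markup)  # http://www.example.org/zero%EF%BF%BDhere/
print(from_dom)     # http://www.example.org/zero%00here/
```

U+FFFD encodes to the three UTF-8 bytes EF BF BD, which is why the markup case ends up with three escapes where the DOM case has a single %00.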

> Or (and this would presumably only be different if you adopt my LEIRI
> request):
>    "http://www.example.org/combiningChar"+unichr(0XD800)+"here"

As far as I can tell the character is "UTF-8 percent-escaped" and not  
treated in any special way. (Though it will generate an error in an HTML5  
validator and is not conforming.)
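
For the surrogate case, a small Python 3 sketch of what "not treated in any special way" might look like. This is only one possible reading: Python's strict UTF-8 encoder refuses a lone surrogate, so the sketch assumes the generic three-byte UTF-8 pattern (the "surrogatepass" error handler). The message above does not say which bytes an implementation would actually emit. Note that chr replaces the Python 2 unichr from the quoted expression.

```python
from urllib.parse import quote

url = "http://www.example.org/combiningChar" + chr(0xD800) + "here"

# A strict UTF-8 encoder rejects the lone surrogate outright.
try:
    quote(url, safe=":/")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Assumption: encode the surrogate with the ordinary three-byte pattern,
# with no special handling, then percent-escape those bytes.
escaped = quote(url, safe=":/", errors="surrogatepass")

print(strict_ok)  # False
print(escaped)    # http://www.example.org/combiningChar%ED%A0%80here
```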

Anne van Kesteren
Received on Monday, 23 March 2009 23:03:39 UTC
