Re: Percent-encoded fragment identifiers and id matching - (was: Fragment Identifiers for HTML5)

On Sun, 22 Jun 2008, Michael A. Puls II wrote:
> 
> Consider the following: [...]
> 
>         <div id="&#8730; 45">Destination</div>
>         <p><a href="#&#8730; 45">Test 1</a></p>
>         <p><a href="#%E2%88%9A%2045">Test 2</a></p>
>         <p><a href="#%E2%88%9A 45">Test 3</a></p>
>         <p><a href="#&#8730; 45">Test 4</a></p>
> 
> In Firefox and Safari, the fragids for Test 1, Test 2, Test 3 and Test 4 
> all match the id for the Destination div. In Opera and IE, they do not.

To be precise, according to my tests, what's going on is:

 * IE treats fragment identifiers literally, with full Unicode support, 
   and the "%" character has no special meaning.

 * Opera converts %-escapes to octets, and silently fails if any of the 
   characters (including those expanded octets) are non-ASCII.

 * Firefox converts %-escapes to octets and interprets them as UTF-8, and 
   then searches for the result, with full Unicode support.

 * Safari searches for the literal string, and if that fails, converts 
   %-escapes to octets and interprets them as UTF-8, then searches for 
   that; all with full Unicode support.

 * The URI spec RFC3986 isn't overly explicit about this but it seems from 
   the grammar that it is intended that user agents support %-escapes in 
   the fragment identifier.
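
As a rough illustration only, the four browser behaviours listed above can 
be modelled as below. This is a simplified sketch based on the descriptions 
in this message, not on any browser's actual code; the function names are 
hypothetical, and "ids" stands for the set of id values in the document.

```python
from urllib.parse import unquote_to_bytes


def match_literal(fragment, ids):
    # IE as described above: compare the raw fragment; "%" is not special.
    return fragment in ids


def match_ascii_only(fragment, ids):
    # Opera as described above: expand %-escapes to octets and fail if any
    # resulting octet (escaped or literal) is non-ASCII.
    octets = unquote_to_bytes(fragment)
    if any(b > 0x7F for b in octets):
        return False
    return octets.decode("ascii") in ids


def match_utf8(fragment, ids):
    # Firefox as described above: expand %-escapes to octets, read them as
    # UTF-8, then search for the result.
    try:
        decoded = unquote_to_bytes(fragment).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return decoded in ids


def match_literal_then_utf8(fragment, ids):
    # Safari as described above: try the literal string first, then fall
    # back to the decoded form.
    return match_literal(fragment, ids) or match_utf8(fragment, ids)


ids = {"\u221a 45"}  # the id from the quoted test case
fragments = ["\u221a 45", "%E2%88%9A%2045", "%E2%88%9A 45"]
for name, fn in [("literal", match_literal),
                 ("ascii-only", match_ascii_only),
                 ("utf-8", match_utf8),
                 ("literal-then-utf-8", match_literal_then_utf8)]:
    print(name, [fn(f, ids) for f in fragments])
```

The last two models only differ when an id itself contains a "%": with an 
id of "100%25" and a fragment of "#100%25", the literal-first model matches 
and the decode-only model does not.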

It seems like the definition in RFC3986 is enough, and that Firefox is 
thus the correct implementation.

I have therefore not done anything special in HTML5 for this.


> I have the same type of concern for usemap="#percent-encoded_value" 
> matching.

Those are explicitly not URLs in HTML5.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Sunday, 29 June 2008 09:11:41 UTC