Re: fragid navigation and pct-encoded

On Feb 17, 2009, at 7:24 PM, Ian Hickson wrote:
> On Thu, 4 Sep 2008, Anne van Kesteren wrote:
>>
>> Apparently there are some differences between browsers in the handling
>> of percent escaped characters in fragment identifiers. I made a few
>> tests to figure out the different behavior:
>>
>>   http://tc.labs.opera.com/html/navigation/fragids/
>>
>> I was able to test in Opera 9.5, Firefox 3.0, and Internet Explorer 6.0.
>> Results:
>>
>> IE does not handle pct-encoded in fragment which is in violation of RFC
>> 3986. It does nothing special with either the name or id attributes;
>> simple literal matching.
>>
>> Firefox does handle pct-encoded in fragment. It also handles pct-encoded
>> in the name attribute. It effectively performs pct-encoded handling in
>> fragment and name attributes and after that performs literal matching.
>> Thus a fragment of ? and a name attribute of %3FC match and vice versa.
>> Likewise, a fragment of %253F does not match a name attribute of %3FC.
>> The id attribute is not affected by pct-encoded handling. So a fragment
>> of ? does not match an id attribute of %3F.
>>
>> Opera does handle pct-encoded in fragment. It does not have special
>> handling of attributes. This is the behavior prescribed by HTML5 but
>> breaks sites. Eg,
>>
>>   http://www.readynas.com/forum/faq.php
>>
>> The test suite assumes Firefox is correct as that seems the most
>> "sensible" behavior if you want to be compliant with RFC 3986 and
>> compatible with the Web. I suggest we change HTML5 to perform
>> pct-encoded handling for name attributes. I have not checked whether
>> this affects the usemap attribute.
>
> I have updated the spec, but I have not used the recommendation above.
>
> Since Firefox was the only browser I could find that did the complicated
> and potentially slow unescaping of all name="" attributes, I was reluctant
> to introduce this new, somewhat surprising, behaviour. (New in terms of
> not being justified by any existing spec.)

I don't see how you could come to that conclusion.

RFC 3986 section 2 specifically defines URI components as being in
encoded form.  That form has to be decoded in order to extract the
data for identifying a resource.  A fragment therefore has to be
decoded in order to find the name/id value.  There is no other way
of reading it.
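
As a rough illustration of that decoding step, here is a minimal sketch
in Python. It assumes the standard urllib.parse module and UTF-8 as the
encoding behind the escapes; the helper name fragment_data and the
example.org URIs are hypothetical and not taken from any spec.

```python
from urllib.parse import unquote, urlsplit


def fragment_data(uri: str) -> str:
    """Return the data carried by a URI's fragment component.

    Hypothetical helper: the fragment is extracted in its encoded
    form and then percent-decoded exactly once (assuming UTF-8).
    """
    raw = urlsplit(uri).fragment  # still pct-encoded at this point
    return unquote(raw)           # one decoding pass, no more


# A fragment of %3F carries the data "?", while %253F carries the
# three characters "%3F" (the escaped percent sign is decoded once).
question_mark = fragment_data("http://example.org/doc#%3F")
escaped_text = fragment_data("http://example.org/doc#%253F")
```

Only the decoded value is then a candidate for comparison with a name
or id value.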

The id attribute in HTML5 is defined to be an opaque string,
presumably in the document character encoding.  Therefore, either
the data in the fragment has to be converted to the document character
encoding, or the data in the id has to be converted to the URI encoding,
before the two can be compared as opaque strings.

The name attribute in HTML4 is defined to be cdata in the document
character encoding.  Therefore, either the data in the fragment has
to be converted to the document character encoding, or the data in
the name attribute has to be converted to the URI encoding, before
the two can be compared as opaque strings.

Firefox is doing what was recommended by HTML4:

   http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars

     Note. The same conversion based on UTF-8 should be applied to
     values of the name attribute for the A element.

In other words, HTML4 recommends converting the name attribute
to URI encoding prior to comparison.
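
A minimal sketch of that conversion in Python, assuming it means
"represent each non-ASCII character as UTF-8 bytes and escape each
byte as %HH" while leaving ASCII characters untouched. The function
names are hypothetical, and folding the hex digits to upper case
before comparing is my own assumption, not something HTML4 states.

```python
import re


def name_to_uri_form(name: str) -> str:
    """Escape non-ASCII characters of a name value as UTF-8 %HH bytes."""
    out = []
    for ch in name:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII is left as written
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def fragment_matches_name(fragment: str, name: str) -> bool:
    """Compare an encoded fragment with a name converted to URI form."""
    # Assumption: treat %c3 and %C3 as the same escape.
    canonical = re.sub(r"%[0-9A-Fa-f]{2}",
                       lambda m: m.group(0).upper(), fragment)
    return canonical == name_to_uri_form(name)
```

Under these assumptions a name of "résumé" is converted to
"r%C3%A9sum%C3%A9" and matches a fragment written that way.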

....Roy

Received on Wednesday, 18 February 2009 04:43:05 UTC