Re: fragid navigation and pct-encoded from Boris Zbarsky on 2009-02-18 (public-html@w3.org from February 2009)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Tue, 17 Feb 2009 22:47:48 -0500
To: Ian Hickson <ian@hixie.ch>
CC: HTML WG <public-html@w3.org>
Message-ID: <499B84E4.6000305@mit.edu>

Ian Hickson wrote:
> Instead, I have made HTML5 require id="" attributes to be matched after 
> decoding the fragment identifier, and name="" attributes to be matched 
> before decoding the fragment identifier.

How does that work when the fragment identifier contains non-ASCII 
characters, or spaces, which end up canonicalized as escaped UTF-8 in URIs?

Since no such canonicalization happens for name attribute values, that 
would effectively mean that they never match...  In fact, the space 
issue is why Gecko unescapes the fragment identifier of the URI; see 
<https://bugzilla.mozilla.org/show_bug.cgi?id=46190>.  What's general UA 
behavior here?
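
To make the concern concrete, here is a minimal Python sketch of the matching rule as quoted above. The function find_target and the set-based "document" are hypothetical, and urllib.parse only stands in for whatever canonicalization and unescaping a UA actually performs.

```python
from urllib.parse import quote, unquote


def find_target(fragment, ids, names):
    """Hypothetical matcher: ids are compared after percent-decoding
    the fragment, names are compared against the raw fragment."""
    decoded = unquote(fragment)  # decodes %HH escapes, bytes read as UTF-8
    if decoded in ids:
        return ("id", decoded)
    if fragment in names:
        return ("name", fragment)
    return None


# Assumption: a link to "#my section" is canonicalized so that the
# space becomes %20 before any matching happens.
fragment = quote("my section")

# The name attribute value is never canonicalized, so the raw fragment
# cannot match it.
name_result = find_target(fragment, ids=set(), names={"my section"})

# The same value as an id does match, because the fragment is decoded first.
id_result = find_target(fragment, ids={"my section"}, names=set())
```

Under these assumptions name_result is None while id_result finds the element, which is the "never match" case described above.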

That said, HTML4 appendix B section B.2.1 does suggest URL-escaping name 
attributes of <a> that contain non-ASCII characters.  Do some UAs do 
that?  It seems like it would lead to odd effects when getElementsByName 
is used, but maybe that's ok.  I would be fine with matching name="" 
without unescaping anything if @name got escaped as described in this 
section, I think.  There are issues with double-escaping, unfortunately. 
  :(  I think we have existing code that tries to handle those, though.
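
A rough Python sketch of the escaping idea and of the double-escaping problem, assuming UTF-8 percent-encoding is what gets applied to the name value; the variable names are illustrative only and this is not a description of any UA's code.

```python
from urllib.parse import quote, unquote

# Assumption: the name attribute value is escaped as UTF-8 percent-encoding,
# along the lines HTML4 B.2.1 suggests.
name = "café"
escaped_name = quote(name)

# A canonicalized fragment for the same text is escaped the same way,
# so comparing without unescaping would work.
fragment = quote(name)
matches_without_unescaping = (fragment == escaped_name)

# But a lookup by the original, unescaped value would no longer find it.
lookup_by_original_matches = (name == escaped_name)

# Double-escaping: escaping an already escaped value turns each "%" into
# "%25", and a single unescape only gets back to the once-escaped form.
double_escaped = quote(escaped_name)
unescaped_once = unquote(double_escaped)
```

Here unescaped_once equals escaped_name rather than name, so code that unescapes exactly once would still not match an unescaped attribute value.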

A related question: when unescaping, what encoding is used to convert 
the resulting bytes to Unicode?
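
The question matters because unescaping only yields bytes. A small Python sketch, with UTF-8 and Latin-1 chosen purely as example encodings:

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding gives bytes, not characters.
raw = unquote_to_bytes("caf%C3%A9")

# The same bytes give different strings depending on the encoding assumed.
as_utf8 = raw.decode("utf-8")
as_latin1 = raw.decode("latin-1")

# A fragment escaped from Latin-1 bytes is not valid UTF-8 at all.
legacy_raw = unquote_to_bytes("caf%E9")
try:
    legacy_raw.decode("utf-8")
    legacy_is_valid_utf8 = True
except UnicodeDecodeError:
    legacy_is_valid_utf8 = False
```

Decoding as UTF-8 gives "café" and decoding as Latin-1 gives "cafÃ©", so whether an id matches depends on which encoding the UA picks.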

-Boris

Received on Wednesday, 18 February 2009 03:48:36 UTC