- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Tue, 17 Feb 2009 22:47:48 -0500
- To: Ian Hickson <ian@hixie.ch>
- CC: HTML WG <public-html@w3.org>
Ian Hickson wrote:
> Instead, I have made HTML5 require id="" attributes to be matched after
> decoding the fragment identifier, and name="" attributes to be matched
> before decoding the fragment identifier.

How does that work when the fragment identifier contains non-ASCII characters, or spaces, which end up canonicalized as escaped UTF-8 in URIs? Since no such canonicalization happens for name attribute values, that would effectively mean that they never match...

In fact, the space issue is why Gecko unescapes the fragment identifier of the URI; see <https://bugzilla.mozilla.org/show_bug.cgi?id=46190>. What's general UA behavior here?

That said, HTML4 appendix B section B.2.1 does suggest URL-escaping name attributes of <a> that contain non-ASCII characters. Do some UAs do that? It seems like it would lead to odd effects when getElementsByName is used, but maybe that's ok.

I would be fine with matching name="" without unescaping anything if @name got escaped as described in this section, I think. There are issues with double-escaping, unfortunately. :( I think we have existing code that tries to handle those, though.

A related question: when unescaping, what encoding is used to convert the resulting bytes to Unicode?

-Boris
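To make the mismatch described above concrete, here is a rough Python sketch. It is not any UA's actual code: the helper names (canonical_fragment, matches_id, matches_name) are made up for illustration, and it assumes the fragment is canonicalized by percent-encoding its UTF-8 bytes, with Python's urllib.parse standing in for the UA's escaping and unescaping.

```python
from urllib.parse import quote, unquote

def canonical_fragment(raw: str) -> str:
    # Assumption: the UA canonicalizes spaces and non-ASCII characters
    # in the fragment as percent-escaped UTF-8.
    return quote(raw, safe="")

def matches_id(fragment: str, id_value: str) -> bool:
    # Proposed id="" rule: compare AFTER decoding the fragment.
    return unquote(fragment) == id_value

def matches_name(fragment: str, name_value: str) -> bool:
    # Proposed name="" rule: compare BEFORE decoding the fragment.
    return fragment == name_value

target = "résumé notes"
frag = canonical_fragment(target)

print(frag)                                         # escaped form of the fragment
print(matches_id(frag, target))                     # id match after decoding
print(matches_name(frag, target))                   # raw name never matches
print(matches_name(frag, quote(target, safe="")))   # name escaped per HTML4 B.2.1

# Double-escaping: escaping an already-escaped value escapes the "%" itself.
print(quote("a%20b", safe=""))

# Encoding question: the same escaped byte decodes differently per encoding.
print(unquote("%E9", encoding="latin-1"), unquote("%E9", encoding="utf-8"))
```

Under these assumptions the name comparison only succeeds when the attribute value itself is stored escaped, which is the B.2.1 case; the last two lines show why double-escaping and the choice of decoding charset both need an answer.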
Received on Wednesday, 18 February 2009 03:48:36 UTC