- From: L. David Baron <dbaron@dbaron.org>
- Date: Wed, 14 Mar 2007 11:57:47 -0700
On Wednesday 2007-03-14 15:20 +0100, Peter Karlsson wrote:
> L. David Baron on 2007-03-13:
>
> >I tend to think it would be good that new uses of URIs/IRIs document that
> >they are really IRIs and therefore this reverse-encoding behavior should
> >not be used, but instead encoding should be done as UTF-8.
>
> You cannot have UTF-8 encoding just for the URIs/IRIs, and another encoding
> for the rest of the source text. To properly parse a URI/IRI in the source
> document, you must first convert the bytes in the resource into a stream of
> Unicode characters.

No, it's the *encoding* (conversion from characters to bytes) that
should be done as UTF-8, not the *decoding* (conversion from bytes to
characters). The URIs within the document are decoded along with the
rest of the document. But to send them back to the server they need
to be encoded (converted from characters back to bytes) and then
percent-escaped.

If we say they're IRIs then the encoding step is always encoding to
UTF-8. But the traditional behavior for URIs has been to encode based
on the encoding of the document, which requires tracking, for every
URI, what the encoding of the document, style sheet, or script that
contained it was. (I don't think Mozilla does this for scripts, but
we do for style sheets and documents.)

-David

-- 
L. David Baron                                 <URL: http://dbaron.org/ >
Technical Lead, Layout & CSS, Mozilla Corporation
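A minimal Python sketch of the two behaviours described in the message, assuming a hypothetical ISO-8859-1 page that contains a link with a non-ASCII query value. The helper names legacy_escape and iri_escape, the example URL, and the charset are illustrative only. The standard library's urllib.parse.quote does the character-to-byte encoding and percent-escaping; leaving only "/?=&" unescaped is a simplification that suits this one URL, not a general URL serializer.

```python
from urllib.parse import quote

def legacy_escape(url: str, document_charset: str) -> str:
    # Traditional URI behaviour (hypothetical helper): re-encode the
    # characters using the charset of the containing document, then
    # percent-escape. The charset has to be remembered for every URL.
    return quote(url, safe="/?=&", encoding=document_charset)

def iri_escape(url: str) -> str:
    # IRI behaviour (hypothetical helper): always encode to UTF-8,
    # then percent-escape, regardless of the document's charset.
    return quote(url, safe="/?=&", encoding="utf-8")

# Decoding step: the whole document, URL included, is converted from
# bytes to characters using the document's charset (assumed Latin-1).
document_charset = "iso-8859-1"
document_bytes = b'<a href="/search?q=caf\xe9">'
text = document_bytes.decode(document_charset)
url = text.split('"')[1]  # naive attribute extraction, illustration only

# Encoding step: same characters, different bytes on the wire.
print(legacy_escape(url, document_charset))  # /search?q=caf%E9
print(iri_escape(url))                       # /search?q=caf%C3%A9
```

Both calls start from the same decoded characters; only the final character-to-byte step differs, which is why the legacy behaviour needs the document's charset to travel with each URL while the IRI behaviour does not.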
Received on Wednesday, 14 March 2007 11:57:47 UTC