- From: Peter Karlsson <peter@opera.com>
- Date: Wed, 14 Mar 2007 15:20:44 +0100 (CET)
L. David Baron on 2007-03-13:

> I tend to think it would be good that new uses of URIs/IRIs document that
> they are really IRIs and therefore this reverse-encoding behavior should
> not be used, but instead encoding should be done as UTF-8.

You cannot have UTF-8 encoding just for the URIs/IRIs and another encoding for the rest of the source text. To parse a URI/IRI in the source document properly, you must first convert the bytes in the resource into a stream of Unicode characters.

> (In Mozilla's codebase such distinctions are easy to implement since
> we have to pass along the encoding of the document every time we
> create a URI in order to get this backwards-compatible behavior.

Of course, you will need to take special care with query data that is stored as plain non-ASCII bytes in the source document, so you would still need to pass that document encoding around.

> It would probably be good if the spec documented how the encoding
> issues in URIs are actually handled.

Indeed. Considering the number of partly contradictory bug reports we have here at Opera on the issue, it would be nice to have it clearly spelled out, so that everyone does the same thing and we all do what the user expects.

-- 
\\// Peter, software engineer, Opera Software
The opinions expressed are my own, and not those of my employer.
Please reply only by follow-ups on the mailing list.
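A minimal Python sketch of the two points above, assuming the common legacy split where a link's path is percent-encoded as UTF-8 while its query keeps the document encoding. The helper `encode_href` and the sample markup are hypothetical and for illustration only. The document bytes are decoded with the document encoding first, and only then is the href parsed and re-encoded, which is why the encoding still has to be passed along.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_href(raw_href: str, document_encoding: str) -> str:
    """Hypothetical helper: percent-encode an already-decoded href.

    Assumption: path characters are encoded as UTF-8 (IRI-style), while
    query characters are encoded with the document's own encoding
    (the backwards-compatible behavior discussed in the thread).
    """
    parts = urlsplit(raw_href)
    # Path: non-ASCII characters become UTF-8 percent-escapes; '%' is left
    # alone so existing escapes are not double-encoded.
    path = quote(parts.path, safe="/%")
    # Query: non-ASCII characters are encoded with the document encoding.
    query = quote(parts.query, safe="=&%", encoding=document_encoding)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Step 1: the raw bytes must be decoded to Unicode before any URI parsing.
document_bytes = b'<a href="/caf\xe9?q=\xe9t\xe9">'
document_encoding = "iso-8859-1"
text = document_bytes.decode(document_encoding)
raw_href = text.split('"')[1]  # crude extraction, for illustration only

# Step 2: re-encode the href; the query still needs the document encoding.
print(encode_href(raw_href, document_encoding))
```

With an ISO-8859-1 document, the same character `é` ends up as `%C3%A9` in the path but as `%E9` in the query, so a UTF-8-only rule would change what the server receives for existing pages.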
Received on Wednesday, 14 March 2007 07:20:44 UTC