- From: Anne van Kesteren <annevk@opera.com>
- Date: Wed, 03 Jun 2009 15:47:44 +0200
- To: Martin J. Dürst <duerst@it.aoyama.ac.jp>, "Dan Connolly" <connolly@w3.org>
- Cc: "Larry Masinter" <masinter@adobe.com>, "HTML WG" <public-html@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
On Wed, 03 Jun 2009 13:29:49 +0200, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:
> I was also thinking that it's possible to describe it that way.
> Probably Larry has to try out which one works better, based on your
> text. One problem with the above definition is that there may be some
> edge cases where the two ways of defining things, when followed to the
> letter by implementations, would not give exactly the same result.
>
> The cases to look for here would be cases where the initial conversion
> from bytes to characters (implemented customarily these days as a
> conversion to Unicode) and the later back-conversion from characters to
> bytes for the query part would not roundtrip cleanly.
>
> I can't just now come up off the top of my head with a case that it
> would be worth to try on some browsers, but that doesn't mean there
> couldn't be one. Another way of checking this might be to look at some
> implementations, or check with the browser people.
>
> Also, I'm not sure it's relevant, as the abstract description of bytes
> and characters may be done without explicitly specifying where encoding
> conversions have to take place.

I think that would be a bug in the definition of the relevant character encoding. For each encoding it should be unambiguous how it maps to Unicode and how Unicode maps to the encoding, IMO.

Also, defining it starting from bytes will not resolve this issue since the implementations affected here always have a character stream at this point, not a byte stream. They do keep the original encoding around.

-- 
Anne van Kesteren
http://annevankesteren.nl/
Received on Wednesday, 3 June 2009 13:48:35 UTC
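The round-trip question Martin raises can be tested mechanically. This is a minimal Python sketch, not from the thread. It assumes Python's built-in codecs as a stand-in for a browser's converters, so it only approximates browser behaviour. The names `roundtrips` and `encode_query` are hypothetical. The first function decodes bytes to Unicode and encodes them back with the same encoding. The second percent-encodes a query string using the document's encoding, which is the "back-conversion from characters to bytes for the query part".

```python
from urllib.parse import quote


def roundtrips(raw: bytes, encoding: str) -> bool:
    """Hypothetical helper: decode raw bytes to Unicode, re-encode them with
    the same encoding, and report whether the original bytes come back."""
    # Invalid byte sequences become U+FFFD instead of raising
    text = raw.decode(encoding, errors="replace")
    try:
        return text.encode(encoding) == raw
    except UnicodeEncodeError:
        # The decoded text holds a character (e.g. U+FFFD) the encoding lacks
        return False


def encode_query(query: str, encoding: str) -> str:
    """Hypothetical helper: percent-encode a query string using the
    document's encoding, keeping '=' and '&' as delimiters."""
    return quote(query, safe="=&", encoding=encoding)


if __name__ == "__main__":
    print(roundtrips(bytes(range(256)), "latin-1"))  # every byte maps 1:1
    print(roundtrips(b"\xff", "utf-8"))              # invalid byte, lost as U+FFFD
    print(encode_query("q=é", "latin-1"))
    print(encode_query("q=é", "utf-8"))
```

ISO-8859-1 maps every byte to exactly one code point and back, so it always round-trips. An invalid UTF-8 byte does not, because it is replaced during decoding. A legacy multibyte encoding in which several byte sequences decode to the same code point would fail the same check, which is the kind of edge case Anne attributes to the encoding's definition rather than to the URL specification.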