- From: Anne van Kesteren <annevk@opera.com>
- Date: Wed, 03 Jun 2009 21:58:04 +0200
- To: "Bjoern Hoehrmann" <derhoermi@gmx.net>
- Cc: "public-iri@w3.org" <public-iri@w3.org>
On Wed, 03 Jun 2009 18:47:05 +0200, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> * Anne van Kesteren wrote:
>> I think that would be a bug in the definition of the relevant character
>> encoding. For each encoding it should be unambiguous how it maps to
>> Unicode and how Unicode maps to the encoding, IMO.
>
> Many character encodings are not injective, i.e., they permit multiple
> binary representations of the same character sequence, UTF-7 is a well
> known example; the original definition of UTF-8 is another (and it was
> made injective in later versions).

UTF-7 support seems to be getting phased out. For such encodings it would make sense to define which of the various "byte stream serializations" of the encoding implementations have to use when mapping Unicode to it.

> I note that Internet Explorer won't
> use many problematic character encodings when constructing the query
> string for 'http' and 'https' resource identifiers (for other schemes
> this encoding sensitive treatment of the query string is fictional).

What happens when an encoding is on this blacklist? Do you have pointers to tests or other data that back this up? Would certainly be useful.

-- 
Anne van Kesteren
http://annevankesteren.nl/
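P.S. For what it's worth, the non-injectivity is easy to see with
Python's built-in utf-7 codec (a quick illustration of my own, not
something taken from any spec or from IE's behaviour):

  # UTF-7 is not injective: two distinct byte sequences
  # decode to the same Unicode string.
  direct = b"A"       # 'A' written directly
  shifted = b"+AEE-"  # 'A' via the base64 "shifted" form
  assert direct.decode("utf-7") == shifted.decode("utf-7") == "A"

  # Going the other way, the encoder always picks a single
  # serialization (the direct form here); which one to pick is
  # exactly the kind of choice a spec would have to pin down.
  assert "A".encode("utf-7") == b"A"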
Received on Wednesday, 3 June 2009 19:59:00 UTC