Re: Updating the IRI spec to include "web addresses"

On Wed, 03 Jun 2009 18:47:05 +0200, Bjoern Hoehrmann <derhoermi@gmx.net>  
wrote:
> * Anne van Kesteren wrote:
>> I think that would be a bug in the definition of the relevant character
>> encoding. For each encoding it should be unambiguous how it maps to
>> Unicode and how Unicode maps to the encoding, IMO.
>
> Many character encodings are not injective, i.e., they permit multiple
> binary representations of the same character sequence, UTF-7 is a well
> known example; the original definition of UTF-8 is another (and it was
> made injective in later versions).

Support for UTF-7 seems to be on its way out.

For such encodings it would make sense to define which of the various
"byte stream serializations" of that encoding implementations have to
use when mapping Unicode to that encoding.
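
To make the ambiguity concrete, here is a minimal sketch, assuming Python's
built-in "utf-7" and "utf-8" codecs are representative of a typical
implementation (other decoders may be stricter or more lenient). It shows
two UTF-7 byte sequences that decode to the same character, the single
serialization the encoder picks, and an overlong UTF-8 form being rejected.

```python
# Two different UTF-7 byte sequences for the same character "A".
direct = b"A"        # "A" written directly
shifted = b"+AEE-"   # "A" as modified base64 of UTF-16BE 0x0041

decoded_direct = direct.decode("utf-7")
decoded_shifted = shifted.decode("utf-7")
same_text = decoded_direct == decoded_shifted

# Going from Unicode to bytes, the encoder has to pick one serialization.
canonical = "A".encode("utf-7")

# Current UTF-8 forbids overlong forms, such as 0xC0 0xAF for "/",
# so a strict decoder refuses them instead of mapping them to "/".
try:
    b"\xc0\xaf".decode("utf-8")
    overlong_rejected = False
except UnicodeDecodeError:
    overlong_rejected = True

print(same_text, canonical, overlong_rejected)
```

Decoding is many-to-one here, so only the Unicode-to-bytes direction needs
a rule about which serialization to emit.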


> I note that Internet Explorer won't
> use many problematic character encodings when constructing the query
> string for 'http' and 'https' resource identifiers (for other schemes
> this encoding sensitive treatment of the query string is fictional).

What happens when an encoding is on this blacklist? Do you have pointers
to tests or other data that back this up? That would certainly be useful.
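
For reference, this is the kind of encoding-sensitive query string
construction under discussion, as a rough sketch only. It uses Python's
urllib.parse.quote to stand in for what a browser does; the two encodings
chosen are assumptions for illustration and say nothing about what Internet
Explorer actually does with any particular encoding.

```python
from urllib.parse import quote

value = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The same character percent-encodes differently depending on which
# character encoding is used to turn it into bytes first.
as_utf8 = quote(value, encoding="utf-8")
as_cp1252 = quote(value, encoding="windows-1252")

print("?q=" + as_utf8)
print("?q=" + as_cp1252)
```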


-- 
Anne van Kesteren
http://annevankesteren.nl/

Received on Wednesday, 3 June 2009 19:59:00 UTC