
Dangers of non-UTF-8 Re: Details on internal encoding declarations

From: Henri Sivonen <hsivonen@iki.fi>
Date: Fri, 23 May 2008 12:15:21 +0300
Cc: HTML WG <public-html@w3.org>
Message-Id: <FFA1C02B-7105-4DBD-BE69-E93E9F2BA1D6@iki.fi>
To: Ian Hickson <ian@hixie.ch>

On May 23, 2008, at 01:29, Ian Hickson wrote:

>> [...]
>>> Authors are encouraged to use UTF-8. Conformance checkers may advise
>>> against authors using legacy encodings.
>>
>> It might be good to have a note about the badness with form submission
>> and not-quite-IRI processing that non-UTF-8 encodings cause.
>
> I'm not really sure what such a note would consist of. Could you send a
> separate e-mail on this topic?

Note: When the document is not encoded as UTF-8, IRIs are not
converted to URIs properly, and data loss happens in form
submissions when the user enters characters that cannot be mapped to
bytes using the encoding of the document. Thus, it is safe to use
non-UTF-8 encodings only if the document uses only URIs and doesn't have
form fields that take user-entered text.
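
To illustrate both problems, here is a minimal Python sketch. It is not
what any particular browser does: the helper names are made up for this
example, ISO-8859-1 is just one sample legacy encoding, and Python's
"replace" error handler stands in for whatever fallback a user agent
applies to characters it cannot encode.

```python
from urllib.parse import quote


def encode_for_url(text, document_encoding):
    """Percent-encode text via the document's encoding (hypothetical helper).

    Assumption: unencodable characters become "?", mimicking a lossy fallback.
    """
    raw = text.encode(document_encoding, errors="replace")
    return quote(raw, safe="")


def survives_round_trip(text, document_encoding):
    """Return True if text can be recovered after encoding it to bytes."""
    raw = text.encode(document_encoding, errors="replace")
    return raw.decode(document_encoding) == text


# The same character yields different bytes, hence different URIs,
# depending on the encoding of the document it appears in.
print(encode_for_url("é", "utf-8"))
print(encode_for_url("é", "iso-8859-1"))

# The euro sign has no mapping in ISO-8859-1, so the submitted value is lossy.
print(survives_round_trip("€", "utf-8"))
print(survives_round_trip("€", "iso-8859-1"))
```

The first pair of calls shows why a server cannot interpret the
percent-encoded bytes without knowing the encoding of the page that
produced them. The second pair shows the form submission case: once the
character has been replaced, the server has no way to recover what the
user typed.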

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Friday, 23 May 2008 09:16:03 GMT
