Dangers of non-UTF-8 Re: Details on internal encoding declarations

On May 23, 2008, at 01:29, Ian Hickson wrote:

>> [...]
>>> Authors are encouraged to use UTF-8. Conformance checkers may advise
>>> against authors using legacy encodings.
>> It might be good to have a note about the badness with form submission
>> and not-quite-IRI processing that non-UTF-8 encodings cause.
> I'm not really sure what such a note would consist of. Could you send a
> separate e-mail on this topic?

Note: When the document is not encoded as UTF-8, IRIs are not  
converted to URIs properly and data loss happens in form  
submissions when the user enters characters that cannot be mapped to  
bytes using the encoding of the document. Thus, it is safe to use  
non-UTF-8 encodings only if the document uses only URIs (not IRIs)  
and doesn't have form fields that take user-entered text.
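
Both problems can be illustrated with a minimal Python sketch. The helper name percent_encode is hypothetical, windows-1252 is only an example of a legacy document encoding, and the fallback of replacing unencodable characters with numeric character references is an assumption that approximates what browsers commonly do on form submission.

```python
from urllib.parse import quote


def percent_encode(text, encoding):
    """Percent-encode text as a browser might for a document in `encoding`.

    Assumption: characters that the encoding cannot represent are replaced
    with numeric character references (&#NNNN;) before percent-encoding.
    """
    raw = text.encode(encoding, errors="xmlcharrefreplace")
    # safe="" so that every reserved character is escaped as well.
    return quote(raw, safe="")


# The same character yields different bytes depending on the document
# encoding, so the receiving side must know (or guess) which one was used.
utf8_form = percent_encode("é", "utf-8")
legacy_form = percent_encode("é", "windows-1252")

# A character outside windows-1252 is turned into a character reference.
# The result is byte-for-byte identical to a user typing "&#26085;"
# literally, so the original input cannot be recovered reliably.
lossy = percent_encode("日", "windows-1252")
literal = percent_encode("&#26085;", "windows-1252")
```

With a UTF-8 document every character has a byte representation, so neither the ambiguity between utf8_form and legacy_form nor the collision between lossy and literal arises.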

Henri Sivonen

Received on Friday, 23 May 2008 09:16:03 UTC