Re: Dangers of non-UTF-8 Re: Details on internal encoding declarations

From: Henri Sivonen <hsivonen@iki.fi>
Date: Fri, 23 May 2008 14:02:45 +0300
Cc: Ian Hickson <ian@hixie.ch>, HTML WG <public-html@w3.org>
Message-Id: <E2A67ADE-FE1A-4AE8-8DE3-D5EE9A745572@iki.fi>
To: Alexey Proskuryakov <ap@webkit.org>

On May 23, 2008, at 13:49, Alexey Proskuryakov wrote:

> On May 23, 2008, at 1:15 PM, Henri Sivonen wrote:
>
>> Note: When the document is not encoded as UTF-8, IRIs are not  
>> converted to URIs properly and data loss happens in form  
>> submissions when the user enters characters that cannot be mapped  
>> to bytes using the encoding of the document.
>
>
> FWIW, Firefox and Safari (not sure about IE) encode form data using  
> numeric entities in this case, so data loss doesn't happen.


I am aware of this. The server cannot know whether the user typed a  
character or a string that merely looks like a numeric character  
reference (NCR), so I think that is data loss in the strict sense.
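A minimal sketch of the ambiguity, assuming an ISO-8859-1 page and a browser fallback that emits decimal NCRs. Python's `xmlcharrefreplace` error handler stands in for the browser's behaviour; `encode_form_value` is a hypothetical helper, not a browser or server API. A euro sign typed by the user and the seven ASCII characters `&#8364;` typed literally produce identical bytes on the wire:

```python
from urllib.parse import quote_plus


def encode_form_value(value: str, charset: str) -> bytes:
    # Hypothetical helper: characters the charset cannot represent are
    # replaced with decimal NCRs, approximating the fallback described above.
    return value.encode(charset, errors="xmlcharrefreplace")


typed_euro = "\u20ac"      # the user typed the euro sign (U+20AC = 8364)
typed_literal = "&#8364;"  # the user typed an ASCII string that looks like an NCR

euro_bytes = encode_form_value(typed_euro, "iso-8859-1")
literal_bytes = encode_form_value(typed_literal, "iso-8859-1")

# Both submissions percent-encode to the same form field value,
# so the server has no way to tell them apart.
print(quote_plus(euro_bytes))
print(quote_plus(literal_bytes))
print(euro_bytes == literal_bytes)
```

With UTF-8 as the document encoding, the euro sign would instead be sent as the bytes `E2 82 AC` and the two inputs would remain distinguishable.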

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Friday, 23 May 2008 11:03:30 UTC

This archive was generated by hypermail 2.3.1 : Monday, 29 September 2014 09:38:55 UTC