
Re: [FileAPI, common] UTF-16 to UTF-8 conversion

From: Simon Pieters <simonp@opera.com>
Date: Tue, 28 Feb 2012 07:11:06 +0100
To: "Arun Ranganathan" <aranganathan@mozilla.com>, "Glenn Maynard" <glenn@zewt.org>
Cc: "Eric U" <ericu@google.com>, public-webapps@w3.org, "Jonas Sicking" <jonas@sicking.cc>
Message-ID: <op.wadksstwidj3kv@simons-macbook-pro.local>
On Tue, 28 Feb 2012 01:05:44 +0100, Glenn Maynard <glenn@zewt.org> wrote:

> On Mon, Feb 27, 2012 at 5:34 PM, Arun Ranganathan
> <aranganathan@mozilla.com> wrote:
>
>> Simon,
>>
>> Is the relevant part of HTML sufficient to refer to?
>> http://dev.w3.org/html5/spec/Overview.html#utf-8

I was thinking of the WebSocket send() text: "If the data argument has any
unpaired surrogates, then throw a SyntaxError exception."
http://www.whatwg.org/specs/web-apps/current-work/multipage/network.html#dom-websocket-send
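
As a rough illustration of the strict behaviour quoted above, here is a minimal TypeScript sketch. The helper names findUnpairedSurrogate and assertWellFormed are hypothetical and do not come from any spec; the sketch only shows what "has any unpaired surrogates" means for a string of UTF-16 code units.

```typescript
// Hypothetical helper: returns the index of the first unpaired surrogate
// code unit in `s`, or -1 if every surrogate is correctly paired.
function findUnpairedSurrogate(s: string): number {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a valid pair
      } else {
        return i;
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return i;
    }
  }
  return -1;
}

// Hypothetical strict policy: reject the string outright.
function assertWellFormed(s: string): void {
  const at = findUnpairedSurrogate(s);
  if (at !== -1) {
    throw new SyntaxError("unpaired surrogate at index " + at);
  }
}
```

A caller following this policy would run assertWellFormed(data) before encoding to UTF-8, and would have to be prepared to catch the exception.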

>
> That defines decoding UTF-8 to Unicode strings.  You need the reverse.
>
> Using a replacement scheme like UTF-8 decoding, instead of a hard
> exception, seems more consistent with how encodings in general are
> handled.  Otherwise, you'll end up with bugs in code if, for example,
> people paste in unpaired surrogates (Firefox allows this, last I checked),

Maybe unpaired surrogates should be converted to U+FFFD on paste. Are  
there other cases?

> causing unexpected exceptions in code.  Instead, just convert them to
> U+FFFD, which gives much more graceful error handling for such a rare case
> that most people will never handle explicitly.

If we can't convert unpaired surrogates to U+FFFD on paste, I agree it makes
sense to convert them to U+FFFD in APIs. If the only way to get them is a JS
escape, then an exception seems OK.
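
For comparison, the replacement approach discussed above could look roughly like the following TypeScript sketch. The function name replaceUnpairedSurrogates is hypothetical; the sketch assumes the string is sanitised once, before any UTF-16 to UTF-8 conversion, and it never throws.

```typescript
// Hypothetical lenient policy: every unpaired surrogate code unit becomes
// U+FFFD (REPLACEMENT CHARACTER); valid surrogate pairs are kept as-is.
function replaceUnpairedSurrogates(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Valid high/low pair: copy both code units unchanged.
        out += s.charAt(i) + s.charAt(i + 1);
        i++;
        continue;
      }
    }
    // Any surrogate reaching this point is unpaired.
    out += (isHigh || isLow) ? "\ufffd" : s.charAt(i);
  }
  return out;
}
```

With this policy a pasted lone surrogate ends up as U+FFFD in the encoded output, so the calling code needs no exception handling for it.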

> I think WebSocket should do the same, for the same reason.

Have you filed a bug?

-- 
Simon Pieters
Opera Software
Received on Tuesday, 28 February 2012 06:11:43 GMT
