Re: [FileAPI, common] UTF-16 to UTF-8 conversion

On Tue, 28 Feb 2012 01:05:44 +0100, Glenn Maynard <glenn@zewt.org> wrote:

> On Mon, Feb 27, 2012 at 5:34 PM, Arun Ranganathan
> <aranganathan@mozilla.com> wrote:
>
>> Simon,
>>
>> Is the relevant part of HTML sufficient to refer to?
>> http://dev.w3.org/html5/spec/Overview.html#utf-8

I was thinking of "If the data argument has any unpaired surrogates, then  
throw a SyntaxError exception.".  
http://www.whatwg.org/specs/web-apps/current-work/multipage/network.html#dom-websocket-send
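
For illustration, a minimal sketch of what that check amounts to on a
UTF-16 string. The function names are hypothetical and not taken from
any spec; this only shows the "throw" behaviour quoted above.

```typescript
// Hypothetical helper: returns the index of the first unpaired surrogate
// code unit in `s`, or -1 if every surrogate is correctly paired.
function firstUnpairedSurrogate(s: string): number {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair, skip the low half
      } else {
        return i;
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return i;
    }
  }
  return -1;
}

// Hypothetical strict entry point: throws instead of replacing.
function assertWellFormed(s: string): void {
  const at = firstUnpairedSurrogate(s);
  if (at !== -1) {
    throw new SyntaxError("unpaired surrogate at index " + at);
  }
}
```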

>
> That defines decoding UTF-8 to Unicode strings.  You need the reverse.
>
> Using a replacement scheme like UTF-8 decoding, instead of a hard
> exception, seems more consistent with how encodings in general are
> handled.  Otherwise, you'll end up with bugs in code if, for example,
> people paste in unpaired surrogates (Firefox allows this, last I checked),

Maybe unpaired surrogates should be converted to U+FFFD on paste. Are  
there other cases?

> causing unexpected exceptions in code.  Instead, just convert them to
> U+FFFD, which gives much more graceful error handling for such a rare
> case that most people will never handle explicitly.

If we can't replace unpaired surrogates with U+FFFD on paste, I agree it
makes sense to replace them with U+FFFD in APIs. If the only way to get
them is a JS escape, then an exception seems OK.

> I think WebSocket should do the same, for the same reason.

Have you filed a bug?

-- 
Simon Pieters
Opera Software

Received on Tuesday, 28 February 2012 06:11:43 UTC