Re: [FileAPI, common] UTF-16 to UTF-8 conversion

From: Glenn Maynard <glenn@zewt.org>
Date: Mon, 27 Feb 2012 18:05:44 -0600
Message-ID: <CABirCh9-XYOTY=Pof_5sjKHDBnXkSV4CuV1HmapM6Y38PmtQSA@mail.gmail.com>
To: Arun Ranganathan <aranganathan@mozilla.com>
Cc: Eric U <ericu@google.com>, Simon Pieters <simonp@opera.com>, public-webapps@w3.org, Jonas Sicking <jonas@sicking.cc>
On Mon, Feb 27, 2012 at 5:34 PM, Arun Ranganathan wrote:

> Simon,
> Is the relevant part of HTML sufficient to refer to?
> http://dev.w3.org/html5/spec/Overview.html#utf-8

That defines decoding UTF-8 to Unicode strings.  You need the reverse.

Using a replacement scheme, as UTF-8 decoding does, instead of a hard
exception seems more consistent with how encodings are handled in
general.  Otherwise you'll end up with bugs if, for example, people
paste in unpaired surrogates (Firefox allows this, last I checked),
which would cause unexpected exceptions.  Instead, just convert unpaired
surrogates to U+FFFD, which gives much more graceful error handling for
a case so rare that most people will never handle it explicitly.
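
To make the suggestion concrete, here is a minimal sketch of the
replacement behavior, assuming the input is modeled as a list of 16-bit
UTF-16 code units.  The function name utf16_units_to_utf8 is
hypothetical and is not taken from any spec; this only illustrates the
idea, not proposed spec text.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def utf16_units_to_utf8(units):
    """Encode UTF-16 code units as UTF-8 bytes.

    Hypothetical helper: unpaired surrogates are replaced with U+FFFD
    instead of raising an exception.
    """
    out = bytearray()
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        i += 1
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: only valid when a low surrogate follows.
            if i < n and 0xDC00 <= units[i] <= 0xDFFF:
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
                i += 1  # consume the low surrogate as well
            else:
                cp = REPLACEMENT
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            cp = REPLACEMENT
        else:
            cp = u
        # cp is never a surrogate here, so encoding cannot fail.
        out += chr(cp).encode("utf-8")
    return bytes(out)
```

With this approach a string containing a lone surrogate still
round-trips everything else intact, and the caller never sees an
exception for input they are unlikely to have anticipated.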

I think WebSocket should do the same, for the same reason.

Glenn Maynard
Received on Tuesday, 28 February 2012 00:07:03 UTC