Re: [whatwg] StringEncoding: Allowed encodings for TextEncoder from Joshua Bell on 2012-08-13 (public-whatwg-archive@w3.org from August 2012)

From: Joshua Bell <jsbell@chromium.org>
Date: Mon, 13 Aug 2012 09:08:24 -0700
To: Jonas Sicking <jonas@sicking.cc>
Cc: whatwg@lists.whatwg.org
Message-ID: <CAD649j7AN7Ax69ALngDm9qCH_T+kd2omK_qrAaG8kHQzw29jMQ@mail.gmail.com>

Sorry if this is a dupe; I replied to this from my phone and an incorrect
address, and my earlier reply isn't showing in the archives.

On Fri, Aug 10, 2012 at 9:16 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> The spec now contains the following text:
>
> "NOTE: Because only UTF encodings are supported, and because of the
> algorithm used to convert a DOMString to a sequence of Unicode
> characters, no input can cause the encoding process to emit an encoder
> error."
>
> This is not correct. A DOMString is not a sequence of Unicode
> characters, it's a UTF-16 encoded string (this is per ECMAScript). Thus
> it can contain unpaired surrogates and so the encoding process can
> result in encoder errors.
>
> As I've suggested earlier, I think we should deal with this by simply
> emitting Unicode replacement characters for these encoder errors (i.e.
> for unpaired surrogates).
>

Already accounted for. Note the phrase:

> and because of the algorithm used to convert a DOMString to a sequence of
> Unicode characters

This refers to the normative text that generates a sequence of Unicode code
points from a DOMString by reference to the algorithm in WebIDL [1], which
handles unpaired surrogates etc.
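
For illustration, here is a rough sketch of that kind of conversion. It is
not the normative text from [1]; the function name toCodePoints is
hypothetical, and the sketch assumes that every unpaired surrogate maps to
U+FFFD, the replacement character.

```typescript
// Hypothetical helper: turn a DOMString (a sequence of UTF-16 code units)
// into a sequence of Unicode code points. Assumption: each unpaired
// surrogate is replaced with U+FFFD, so no later encoding step to a UTF
// encoding can hit an encoder error.
function toCodePoints(s: string): number[] {
  const out: number[] = [];
  const n = s.length;
  for (let i = 0; i < n; i++) {
    const c = s.charCodeAt(i);
    if (c < 0xd800 || c > 0xdfff) {
      // Not a surrogate: the code unit is the code point.
      out.push(c);
    } else if (c >= 0xdc00) {
      // Low surrogate with no preceding high surrogate.
      out.push(0xfffd);
    } else if (i === n - 1) {
      // High surrogate at the very end of the string.
      out.push(0xfffd);
    } else {
      const d = s.charCodeAt(i + 1);
      if (d >= 0xdc00 && d <= 0xdfff) {
        // Valid pair: combine into one supplementary code point.
        out.push(((c & 0x3ff) << 10) + (d & 0x3ff) + 0x10000);
        i++; // consume the low surrogate as well
      } else {
        // High surrogate not followed by a low surrogate.
        out.push(0xfffd);
      }
    }
  }
  return out;
}
```

With a conversion like this, the encoder only ever sees valid code points,
which is why the note can claim that no input causes an encoder error.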

This informative text should say "Unicode code points" rather than "Unicode
characters", though. Fixing that now, and referencing [1] in the note as well.

[1] http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode

Received on Monday, 13 August 2012 16:08:55 UTC