
Re: [whatwg] StringEncoding: Allowed encodings for TextEncoder

From: Joshua Bell <jsbell@chromium.org>
Date: Mon, 13 Aug 2012 09:08:24 -0700
Message-ID: <CAD649j7AN7Ax69ALngDm9qCH_T+kd2omK_qrAaG8kHQzw29jMQ@mail.gmail.com>
To: Jonas Sicking <jonas@sicking.cc>
Cc: whatwg@lists.whatwg.org
Sorry if this is a dupe; I replied to this from my phone using an incorrect
address, and my earlier reply isn't showing up in the archives.

On Fri, Aug 10, 2012 at 9:16 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> The spec now contains the following text:
>
> "NOTE: Because only UTF encodings are supported, and because of the
> algorithm used to convert a DOMString to a sequence of Unicode
> characters, no input can cause the encoding process to emit an encoder
> error."
>
> This is not correct. A DOMString is not a sequence of Unicode
> characters, it's a UTF16 encoded string (this is per EcmaScript). Thus
> it can contain unpaired surrogates and so the encoding process can
> result in encoder errors.
>
> As I've suggested earlier, I think we should deal with this by simply
> emitting Unicode replacement characters for these encoder errors (i.e.
> for unpaired surrogates).
>
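
A minimal Python sketch of the concern raised above. It assumes a DOMString is modeled as a plain list of 16-bit UTF-16 code units, which is a modeling choice for illustration and not something the spec defines. The helper find_unpaired_surrogates is hypothetical. It shows that such a string can hold a lead surrogate with no trail surrogate after it, and that Python's strict UTF-8 codec rejects a string containing one. That rejection is the kind of encoder error being discussed.

```python
def find_unpaired_surrogates(units: list[int]) -> list[int]:
    """Return indices of unpaired surrogates in a list of UTF-16 code units.

    Hypothetical helper: `units` models a DOMString as 16-bit code units.
    """
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # Lead surrogate: only valid if a trail surrogate follows it.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Trail surrogate with no lead surrogate before it.
            bad.append(i)
        i += 1
    return bad


# "a\uD800b" in JavaScript: three code units, the middle one unpaired.
units = [0x0061, 0xD800, 0x0062]
print(find_unpaired_surrogates(units))  # [1]

# A Python str can also hold a lone surrogate, and the strict UTF-8 codec
# refuses to encode it.
try:
    "a\ud800b".encode("utf-8")
except UnicodeEncodeError as err:
    print("encoder error at index", err.start)  # encoder error at index 1
```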

Already accounted for. Note the phrase:

> and because of the algorithm used to convert a DOMString to a sequence of
> Unicode characters

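This refers to the normative text, which obtains a sequence of Unicode code
points from a DOMString by reference to the algorithm in WebIDL [1]. That
algorithm combines valid surrogate pairs and replaces each unpaired surrogate
with U+FFFD REPLACEMENT CHARACTER, so the encoder never sees a lone surrogate.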
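
The conversion described above can be sketched in Python, again assuming a DOMString is modeled as a list of UTF-16 code units. The function names domstring_to_code_points and encode_utf8 are hypothetical, and this is only an approximation of the WebIDL steps; the text at [1] remains the normative definition. Valid surrogate pairs are combined into one code point and every unpaired surrogate becomes U+FFFD, so the output contains no surrogates and a strict UTF-8 encoder has nothing to reject.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def domstring_to_code_points(units: list[int]) -> list[int]:
    """Convert UTF-16 code units to code points, replacing unpaired surrogates.

    Hypothetical sketch of a WebIDL-style conversion.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        c = units[i]
        if c < 0xD800 or c > 0xDFFF:
            out.append(c)                  # not a surrogate: copy through
        elif c >= 0xDC00:
            out.append(REPLACEMENT)        # trail surrogate with no lead
        elif i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            d = units[i + 1]
            # Combine lead (c) and trail (d) into one supplementary code point.
            out.append(0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00))
            i += 1                         # skip the trail surrogate just used
        else:
            out.append(REPLACEMENT)        # lead surrogate with no trail
        i += 1
    return out


def encode_utf8(units: list[int]) -> bytes:
    # No surrogates survive the conversion, so strict UTF-8 encoding cannot fail.
    return "".join(map(chr, domstring_to_code_points(units))).encode("utf-8")


print(domstring_to_code_points([0x61, 0xD800, 0x62]))      # [97, 65533, 98]
print(encode_utf8([0x61, 0xD800, 0x62]))                   # b'a\xef\xbf\xbdb'
print(hex(domstring_to_code_points([0xD83D, 0xDE00])[0]))  # 0x1f600
```

Substituting U+FFFD rather than raising an error is what guarantees that the encode step cannot fail for any input.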

This informative text should say "Unicode code points" rather than "Unicode
characters", though. I'm fixing that now, and the note will also reference [1].

[1] http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode
Received on Monday, 13 August 2012 16:08:55 UTC
