Re: Proposal to deprecate 'Character encodings' article from 신정식 on 2016-01-25 (www-international@w3.org from January to March 2016)

From: 신정식 <jshin1987+w3@gmail.com>
Date: Mon, 25 Jan 2016 11:38:17 -0800
To: r12a <ishida@w3.org>
Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, John C Klensin <john+w3c@jck.com>, www International <www-international@w3.org>
Message-ID: <CAE1ONj9UXNFeWo6XNhY2nNtyeA6i5=xbZ-0w4O463bE6aszzfg@mail.gmail.com>

On Mon, Jan 25, 2016 at 8:56 AM, <ishida@w3.org> wrote:

>
> It may be useful to note, wrt the first, that we advise HTML content
> authors to check the list in the Encoding spec because it "provides a list
> that has been tested against actual browser implementations". For Web
> platform development, this is therefore the most useful list to choose
> from, since it takes into account interoperability in browsers. We do,
> however, also mention the IANA registry. (See
> https://www.w3.org/International/questions/qa-choosing-encodings#nonutf8)
>

Richard, the article you pointed to has the following about PUA:

 The *x-user-defined* encoding is a single-byte encoding whose lower half
is ASCII and whose upper half is mapped into the Unicode Private Use Area
(PUA). Like the PUA in general, using this encoding on the public Internet
is best avoided because it damages interoperability and long-term use.
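
To make that mapping concrete, here is a minimal sketch of an x-user-defined decoder. It assumes the rule as I understand it from the Encoding spec: bytes 0x00-0x7F map to the same ASCII code point, and bytes 0x80-0xFF map to U+F780-U+F7FF. The function name is my own, and Python ships no built-in codec under this label, so the sketch does the arithmetic by hand.

```python
def decode_x_user_defined(data: bytes) -> str:
    """Decode bytes as x-user-defined (hypothetical helper, not a stdlib codec).

    Assumed rule: ASCII bytes pass through unchanged; each byte in
    0x80-0xFF becomes the PUA code point 0xF780 + (byte - 0x80).
    """
    return "".join(
        chr(b) if b < 0x80 else chr(0xF780 + (b - 0x80))
        for b in data
    )


decoded = decode_x_user_defined(b"A\x80\xff")
print([f"U+{ord(c):04X}" for c in decoded])
# ['U+0041', 'U+F780', 'U+F7FF']
```

Every non-ASCII byte ends up in the PUA, which is why text in this encoding carries no interoperable meaning once it has been converted to Unicode.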

What do you think of adding a similar warning about PUA use in Shift_JIS and
GB 18030? I'm rather disappointed that GB 18030:2005 still maps a lot of
characters to PUA code points (after conversion to Unicode) even though
regular Unicode code points are available. See
https://github.com/whatwg/encoding/issues/22 and
https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c1 . Note that my
comment there about remapping in GB 18030:2005 turned out to be incorrect:
only one PUA code point was remapped to a regular code point between 2000
and 2005.
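
For anyone who wants to check their own content, here is a rough sketch of how to spot PUA code points in text that has already been converted to Unicode. The function name is hypothetical. It relies only on the Unicode general category "Co" (private use) as reported by Python's unicodedata module. Which characters actually land in the PUA depends on the mapping table of the decoder that produced the text, so the sample string below is hand-made rather than real decoder output.

```python
import unicodedata


def find_pua(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each private-use character in text.

    General category 'Co' covers the BMP Private Use Area (U+E000-U+F8FF)
    and the two supplementary private use planes.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


# Hand-made sample standing in for the output of a legacy-encoding decoder.
sample = "ab\ue000c\U000f0000"
print(find_pua(sample))
# [(2, 'U+E000'), (4, 'U+F0000')]
```

An empty result means the text contains no private-use code points. A non-empty one points at the characters that may not survive interchange.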

Received on Monday, 25 January 2016 19:38:45 UTC