RE: [Encoding] false statement [I18N-ACTION-328][I18N-ISSUE-374] from John C Klensin on 2014-08-28 (www-international@w3.org from July to September 2014)

From: John C Klensin <john+w3c@jck.com>
Date: Thu, 28 Aug 2014 18:24:53 -0400
To: Andrew Cunningham <lang.support@gmail.com>, Larry Masinter <masinter@adobe.com>
cc: wwwintl <www-international@w3.org>, "Phillips, Addison" <addison@lab126.com>, Richard Ishida <ishida@w3.org>
Message-ID: <9AFB2FEC22E3B957B4B615D6@JcK-HP8200.jck.com>

--On Friday, August 29, 2014 07:03 +1000 Andrew Cunningham
<lang.support@gmail.com> wrote:

>...
> But that boat has already sailed.
>...

Andrew, since you have repeated that several times and I, at
least, have no problem believing that there are things floating
around that encode an at least partially different repertoire
and code point assignment than Standard Unicode, doing so using the
UTF-8 encoding algorithm, do you have a constructive suggestion
as to what we _should_ do?  Regardless of the fate of either the
IANA Charset registry or the encoding and labeling
specifications of the Encoding draft, it seems to me that simply
repeating "there is a problem" or "the ship has sailed" doesn't
help us make progress.

FWIW, you have convinced me of something I already suspected,
which is that we need to be more precise about our terminology.
But, when Larry, or anyone making reference to the IANA Charset
registry, says something like "encouraging new applications to
use utf-8" he (or we) mean "Standard Unicode encoded in UTF-8",
not "some random 8-, 16-, or 32-bit CCS that might or might not
resemble Unicode encoded according to the UTF-8 algorithm".
Given the former, very popular, usage of the term "UTF-8",
calling the second usage "UTF-8" without any qualification just
adds to the confusion I think you are lamenting (and I certainly
am).

     john

Received on Thursday, 28 August 2014 22:25:23 UTC