
Re: Encoding

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Wed, 25 Feb 2015 18:11:45 +0900
Message-ID: <54ED91D1.3000905@it.aoyama.ac.jp>
To: Glenn Adams <glenn@skynav.com>, Shawn Steele <Shawn.Steele@microsoft.com>
CC: "Phillips, Addison" <addison@lab126.com>, "www-international@w3.org" <www-international@w3.org>
I think the author would describe the goal as having a spec for browsers 
so that hopefully/eventually all browsers would treat encodings the 
same, and so that it would be easier to create a new browser, because 
there would be less stuff to reverse-engineer.

On top of that, the author may have (had) the hope that the power of the 
(Web) platform would make (t)his spec eliminate all the other specs and 
variants. But there is a high chance that it was just something like 

As for the W3C, the encoding spec was referenced from the HTML5 spec, 
and so there was a need to move the encoding spec forward process-wise 
in order for the HTML5 spec to be able to move forward.

Regards,   Martin.

On 2015/02/25 07:06, Glenn Adams wrote:
> On Tue, Feb 24, 2015 at 2:01 PM, Shawn Steele <Shawn.Steele@microsoft.com>
> wrote:
>> I'm still struggling with the goals of the encoding work.
>> https://encoding.spec.whatwg.org/
> IMO, the reason for this specification is that the author had little
> knowledge of character encoding, and used the exercise of writing a new
> document as a way to acquire that knowledge, and, of course, to rewrite the
> world of encodings in his PoV.
> I suppose the reason the author would give, however, is that it was
> intended to document existing practice or best practice or something in
> between. Again, one questions the authority to do something of that sort
> from one new to the subject.
> That's just my two cents. Do not interpret my comments as an attack on the
> author. I have a lot of respect for him. Just not on this subject.
>> Everything except UTF-8 is legacy, which is good, and I get a desire to
>> quantify the landscape, however I'm not sure what point is served by
>> standardizing the tables.
>> Either A) Existing content is already correct per an existing standard (in
>> which case a link would suffice), or B) Existing content was encoded using
>> slightly different tables.
>> In the case of existing content, it probably "works" for whoever's using
>> it, though there may be interoperability issues.  To correct that data,
>> they need to move to UTF-8.  Adding yet another "perfect" mapping table
>> only causes further fragmentation as people may attempt to convert to that.
>> For example, HKSCS is rolled up into Big5, however historically there have
>> been multiple font-hack PUA and real Unicode code point assignments for
>> that space.  Which makes it hard to say that one mapping or another is
>> "right" for that space.  It likely depends on actual data, how the
>> application uses it, and what its dependencies are.  Worse, I can't even
>> reliably detect the quirks of the system where data originated as it may be
>> currently hosted on some other platform.
>> Currently different vendors/platforms/systems have slightly different
>> mappings.  Clearly that isn't desirable, however a "standard" would
>> obviously break existing data for at least some of those
>> vendors/platforms/systems.
>> So, what does the WG expect to happen from this process?
>> A) Do they expect users to correct data to the WG standard mappings?
>> B) Do they expect applications (or users) to abandon previous behavior to
>> the WG standard mappings?
>> C) For either of these, what timeframe does the WG expect it to happen in?
>> D) Does the WG expect that this problem will be "solved" as a result of
>> this work?  (Solved == everything's codified so there is no more confusion?)
>> Thanks,
>> -Shawn
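
Shawn's point about Big5 and HKSCS having several competing mappings can be checked on a concrete system. The following is a minimal sketch, not part of the original thread, that decodes the same bytes with three Big5-family codecs shipped with CPython (`big5`, `big5hkscs`, `cp950`) and reports whether they agree. The helper names `decode_with_each` and `mappings_agree` and the sample byte pair are hypothetical, chosen for illustration; whether the codecs disagree on a given pair depends on the tables in the Python version used, and none of these tables is necessarily identical to the one in the WHATWG Encoding spec.

```python
BIG5_FAMILY = ("big5", "big5hkscs", "cp950")


def decode_with_each(raw, codecs=BIG5_FAMILY):
    """Decode the same bytes with several codecs.

    Returns a dict mapping codec name to the decoded string,
    or to None when that codec rejects the bytes.
    """
    results = {}
    for name in codecs:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


def mappings_agree(raw, codecs=BIG5_FAMILY):
    """True when every codec yields the same result for these bytes."""
    return len(set(decode_with_each(raw, codecs).values())) == 1


if __name__ == "__main__":
    # Hypothetical sample: one double-byte pair from the Big5 symbol area.
    sample = b"\xa1\x45"
    for name, text in decode_with_each(sample).items():
        if text is None:
            shown = "undecodable"
        else:
            shown = " ".join("U+%04X" % ord(ch) for ch in text)
        print("%-10s %s" % (name, shown))
    print("agree:", mappings_agree(sample))
```

Running this over a corpus of legacy content, rather than a single pair, would give a rough measure of how much data is sensitive to the choice of table, which is the information needed to answer questions A) and B) above for a particular platform.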
Received on Wednesday, 25 February 2015 09:12:17 UTC
