Encoding

I'm still struggling with the goals of the encoding work.  https://encoding.spec.whatwg.org/ 

Everything except UTF-8 is legacy, which is good, and I understand the desire to quantify the landscape; however, I'm not sure what point is served by standardizing the tables.

Either A) Existing content is already correct per an existing standard (in which case a link would suffice), or B) Existing content was encoded using slightly different tables.

In the case of existing content, it probably "works" for whoever's using it, though there may be interoperability issues.  To correct that data, they need to move to UTF-8.  Adding yet another "perfect" mapping table only causes further fragmentation, as some people may attempt to convert to it.
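
As a rough illustration of why even "move to UTF-8" still depends on which table you pick (a Python sketch, purely illustrative, not anyone's actual conversion tool): the same legacy bytes produce different UTF-8 depending on the mapping the converter assumes, e.g. the old ISO-8859-1 vs. windows-1252 split.

    # Sketch only: converting legacy bytes to UTF-8 still requires choosing a
    # mapping table, and the choice changes the result.
    legacy = b"\x80\x99"  # bytes from some hypothetical legacy document

    for table in ("iso-8859-1", "windows-1252"):
        text = legacy.decode(table, errors="replace")
        print(table, text.encode("utf-8"))

    # iso-8859-1 treats 0x80-0x9F as C1 controls; windows-1252 maps them to
    # printable characters, so the "corrected" UTF-8 differs by table.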

For example, HKSCS is rolled up into Big5; however, historically there have been multiple font-hack PUA and real Unicode code point assignments for that space, which makes it hard to say that one mapping or another is "right" for it.  The answer likely depends on the actual data, how the application uses it, and what its dependencies are.  Worse, I can't even reliably detect the quirks of the system where the data originated, since it may currently be hosted on some other platform.
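
To sketch that in code (Python; the byte value is only an illustration, and the output depends entirely on which Big5 variant a given codec implements), the same bytes decode differently under a plain Big5 table and an HKSCS table:

    # Illustrative only: this code point area is unassigned in the original
    # Big5 layout (IIRC HKSCS-1999 added the euro sign here); other tables,
    # including the old font-hack PUA mappings, treat such bytes differently.
    data = b"\xa3\xe1"

    for codec in ("big5", "big5hkscs"):
        decoded = data.decode(codec, errors="replace")
        print(codec, [hex(ord(ch)) for ch in decoded])

Whichever result you decide to call "right", some existing data out there was produced under a different assumption.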

Currently, different vendors/platforms/systems have slightly different mappings.  Clearly that isn't desirable; however, a "standard" would obviously break existing data for at least some of those vendors/platforms/systems.

So, what does the WG expect to happen from this process?

A) Do they expect users to correct their data to match the WG standard mappings?
B) Do they expect applications (or users) to abandon previous behavior in favor of the WG standard mappings?
C) For either of these, in what timeframe does the WG expect it to happen?
D) Does the WG expect that this problem will be "solved" as a result of this work?  (Solved == everything's codified so there is no more confusion?)

Thanks,

-Shawn

Received on Tuesday, 24 February 2015 21:01:49 UTC