Re: Encoding Standard (was: RE: Encoding API exceptions)

From: Anne van Kesteren <annevk@annevk.nl>
Date: Mon, 10 Nov 2014 19:40:15 +0100
Message-ID: <CADnb78hE22UTHo0iV9Yj5MAuHsxXFY6ynyhMyoZ5wLyc8K_Rgw@mail.gmail.com>
To: Shawn Steele <Shawn.Steele@microsoft.com>
Cc: "www-international@w3.org" <www-international@w3.org>
On Mon, Nov 10, 2014 at 6:48 PM, Shawn Steele
<Shawn.Steele@microsoft.com> wrote:
> Ok, more bluntly, if someone notices a discrepancy between https://encoding.spec.whatwg.org/index-windows-1252.txt vs http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT, then what happens?

You mean how the latter has UNDEFINED for various bytes whereas the
former requires U+FFFD?
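
For readers who want to see the "UNDEFINED" side concretely, here is a minimal sketch. It assumes Python's built-in cp1252 codec, which is generated from the unicode.org vendor mapping, and says nothing about what the Encoding Standard's index does for the same bytes. The helper name undefined_bytes is made up for illustration.

```python
def undefined_bytes(codec: str = "cp1252") -> list[int]:
    """Return the single-byte values the given codec refuses to decode."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            # The codec has no mapping for this byte.
            missing.append(value)
    return missing


if __name__ == "__main__":
    # Print the unmapped bytes in hex.
    print([hex(b) for b in undefined_bytes()])
    # A lenient decode substitutes U+FFFD for an unmapped byte.
    print(repr(bytes([0x81]).decode("cp1252", errors="replace")))
```

Whether an unmapped byte should raise an error, become U+FFFD, or pass through as a control character is exactly the kind of behavior a format has to pin down by referencing one specification.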

If a format references the Encoding Standard, it's clear what needs to
happen. If a format is vague about encodings, that would be a problem
that needs to be fixed.

> I'm not saying there is such a discrepancy, however if there isn't, then why not point to the version IANA points to in the charsets registry?  If there were a discrepancy (other than I suppose undefined being marked as control), then which one gets the bug?

The charsets registry is hopelessly out of touch with reality. That has
been pointed out on the relevant list, but the discussion went nowhere
(as you know). The "standards" it points to meanwhile do not address
issues implementations face. E.g. it is not defined that in shift_jis
0x81 0x22 needs to become U+FFFD U+0022 as otherwise you'll expose
resources to XSS. Extensions to shift_jis or euc-kr are also not
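
The shift_jis point above can be sketched in Python. This is a simplified illustration, not the Encoding Standard's decoder: decode_sketch and its swallow_ascii flag are hypothetical names, the lead and trail byte ranges are assumptions about the general shape of shift_jis, and valid pairs are handed to Python's own shift_jis codec, whose mappings may differ from the Encoding Standard's index.

```python
REPLACEMENT = "\ufffd"


def is_lead(b: int) -> bool:
    # Assumed lead-byte range for two-byte sequences.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def is_trail(b: int) -> bool:
    # Assumed trail-byte range; note that 0x22 (") is outside it.
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


def decode_sketch(data: bytes, swallow_ascii: bool) -> str:
    """Toy decoder contrasting two ways of handling a bad trail byte."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:
            out.append(chr(b))
            i += 1
        elif 0xA1 <= b <= 0xDF:
            # Single-byte half-width katakana, delegated to Python's codec.
            out.append(bytes([b]).decode("shift_jis", errors="replace"))
            i += 1
        elif is_lead(b) and i + 1 < len(data):
            if is_trail(data[i + 1]):
                out.append(data[i:i + 2].decode("shift_jis", errors="replace"))
                i += 2
            else:
                out.append(REPLACEMENT)
                # Unsafe variant eats the following byte; safe variant
                # leaves it to be decoded on its own.
                i += 2 if swallow_ascii else 1
        else:
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    page = b'<a title="\x81">x</a>'
    print(decode_sketch(page, swallow_ascii=False))  # closing quote survives
    print(decode_sketch(page, swallow_ascii=True))   # closing quote is lost
```

With swallow_ascii=True the quote that should end the attribute value disappears, so the markup that follows is parsed as part of the attribute (or the reverse, depending on context), which is the XSS exposure mentioned above.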

Received on Monday, 10 November 2014 18:40:42 UTC