RE: Encoding Standard (was: RE: Encoding API exceptions) from Shawn Steele on 2014-11-10 (www-international@w3.org from October to December 2014)

From: Shawn Steele <Shawn.Steele@microsoft.com>
Date: Mon, 10 Nov 2014 17:48:35 +0000
To: Anne van Kesteren <annevk@annevk.nl>
CC: "www-international@w3.org" <www-international@w3.org>
Message-ID: <afda1f88c403418caec4d71d51163b30@CY1PR0301MB0731.namprd03.prod.outlook.com>

Ok, more bluntly: if someone notices a discrepancy between https://encoding.spec.whatwg.org/index-windows-1252.txt and http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT, what happens?

I'm not saying there is such a discrepancy. However, if there isn't, why not point to the version IANA points to in the charsets registry? If there were a discrepancy (other than, I suppose, undefined bytes being marked as control characters), which one gets the bug?
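Detecting such a discrepancy is mechanical. The sketch below is one hypothetical way to do it, not a tool either organization provides. It assumes both files were downloaded locally and assumes their layouts: the WHATWG index lists "pointer, code point, character (name)", where pointer N stands for byte 0x80 + N. The Unicode.org table lists "byte, code point, #name", with an empty code point column for UNDEFINED bytes. The function names and the `ignore_undefined_as_control` switch are illustrative only. The switch skips the one expected difference mentioned above, where an UNDEFINED byte maps to the C1 control of the same value.

```python
from typing import Dict, List, Optional, Tuple


def parse_whatwg_index(text: str) -> Dict[int, int]:
    """Map byte (0x80-0xFF) -> code point from a WHATWG single-byte index.

    Assumed layout: '#' comment lines, then "<pointer> <0xCODEPOINT> <char> (<name>)",
    where pointer N stands for byte 0x80 + N (bytes 0x00-0x7F are ASCII).
    """
    mapping: Dict[int, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pointer, code_point = line.split()[:2]
        mapping[0x80 + int(pointer)] = int(code_point, 16)
    return mapping


def parse_unicode_org_mapping(text: str) -> Dict[int, Optional[int]]:
    """Map byte -> code point from a Unicode.org vendor table such as CP1252.TXT.

    Assumed layout: "<0xBYTE> <0xCODEPOINT> #<name>"; an UNDEFINED byte has an
    empty code point column and is recorded as None.
    """
    mapping: Dict[int, Optional[int]] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if not fields:
            continue
        byte = int(fields[0], 16)
        mapping[byte] = int(fields[1], 16) if len(fields) > 1 else None
    return mapping


def find_discrepancies(
    whatwg: Dict[int, int],
    unicode_org: Dict[int, Optional[int]],
    ignore_undefined_as_control: bool = True,
) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """Return (byte, whatwg_code_point, unicode_org_code_point) for each mismatch in 0x80-0xFF."""
    diffs = []
    for byte in range(0x80, 0x100):
        ours, theirs = whatwg.get(byte), unicode_org.get(byte)
        if ours == theirs:
            continue
        # Expected difference: UNDEFINED in the vendor table, mapped to the
        # C1 control with the same value (e.g. 0x81 -> U+0081) in the WHATWG index.
        if ignore_undefined_as_control and theirs is None and ours == byte:
            continue
        diffs.append((byte, ours, theirs))
    return diffs
```

To use it, read the two downloaded files as text (for example with `open("index-windows-1252.txt", encoding="utf-8").read()` and the same for `CP1252.TXT`), parse each one, and pass the results to `find_discrepancies`. An empty list means the tables agree apart from the undefined-versus-control case. Any other entry is exactly the kind of discrepancy whose ownership is in question here.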

-Shawn

-----Original Message-----
From: annevankesteren@gmail.com [mailto:annevankesteren@gmail.com] On Behalf Of Anne van Kesteren
Sent: Monday, November 10, 2014 1:26 AM
To: Shawn Steele
Cc: www-international@w3.org
Subject: Re: Encoding Standard (was: RE: Encoding API exceptions)

On Sun, Nov 9, 2014 at 9:44 PM, Shawn Steele <Shawn.Steele@microsoft.com> wrote:
> Generally the content is created with text editors, from data stores, etc, that came from other systems, and not specifically for the web.

I.e., mostly Windows, though some IBM and NEC, and of course gb18030 (except for one double-byte sequence, as indicated). It turns out that browsers on, e.g., Mac and Linux felt pressure to support not just the encodings of the host OS but also those of Windows. Over time some cleanup happened, and the Encoding Standard is the result of what we think is needed to support the web.

>  I'm unaware of systems that convert from shift-jis to shift-jis for example.

I'm not sure I follow this example.

> In other words, if the definitions are incompatible with the behavior on the host OS (or wherever the data comes from), then there are likely to be corruptions.

On the web, the data can come from anywhere. The host OS is not relevant as that can change over time.

> The solution is, of course, to use Unicode.

Quite.

--
https://annevankesteren.nl/

Received on Monday, 10 November 2014 17:49:05 UTC