Re: [whatwg/encoding] Explain the relationship between windows-1252, Latin1, and ASCII (PR #345) from Anne van Kesteren on 2025-04-11 (public-webapps-github@w3.org from April 2025)

From: Anne van Kesteren <notifications@github.com>
Date: Thu, 10 Apr 2025 23:58:40 -0700
To: whatwg/encoding <encoding@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/encoding/pull/345/review/2759325886@github.com>

@annevk commented on this pull request.



> @@ -568,7 +580,10 @@ prescribes, as that is necessary to be compatible with deployed content.
   <tr><td>"<code>windows-1251</code>"
   <tr><td>"<code>x-cp1251</code>"
   <tr>
-   <td rowspan=17><a>windows-1252</a>
+   <td rowspan=17>
+    <a>windows-1252</a>
+    <p class=note>See <a href="#note-latin1-ascii">below</a> for the relationship to historical
+    "Latin1" and "ASCII" concepts.

Can you also add this to the Python script that generates this table?

> @@ -732,6 +747,30 @@ part of the ISO 8859 series. In particular, the necessity of the inclusion of <a
 and <a>ISO-8859-16</a> is doubtful for the purpose of supporting existing content, but there are no
 plans to remove these.</p>
 
+<div class=note id=note-latin1-ascii>
+ <p>The <a>windows-1252</a> <a for=/>encoding</a> has various <a for=encoding>labels</a> like
+ "<code>latin1</code>", "<code>iso-8859-1</code>", "<code>ascii</code>", etc. which have
+ historically been confusing for developers. On the web, and in any software that seeks to be
+ web-compatible by implementing the Encoding Standard, these are synonyms: "<code>latin1</code>" and
+ "<code>ascii</code>" are just labels for <a>windows-1252</a>, and any software following this
+ standard will, for example, decode 0x80 as U+20AC (€) when asked for the Latin1 or ASCII decoding
+ of that byte.

I think overall this is probably okay, but what gives me pause is that the Encoding standard doesn't define Latin1 or ASCII as encodings; it only defines "latin1" and "ascii" as labels for windows-1252. So if software exposes encodings by those names, who knows what it might do. Perhaps we should make that distinction clearer: the behavior described here will likely hold only for software that takes a label and some bytes as input.
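
To illustrate the label-versus-encoding distinction, here is a minimal Python sketch. It is not part of the PR: `LABEL_TO_ENCODING`, `PYTHON_CODEC`, and `decode_by_label` are hypothetical names, the label table is abbreviated, and Python's `cp1252` codec is only an approximation of the standard's windows-1252. It contrasts label-driven decoding with Python's own codecs named `latin-1` and `ascii`, which are an example of software exposing those names as separate encodings.

```python
# Abbreviated, illustrative label table. The Encoding Standard lists more
# labels for windows-1252 than the few shown here.
LABEL_TO_ENCODING = {
    "ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
}

# Assumption: Python's "cp1252" codec stands in for windows-1252. It agrees
# for 0x80, but unlike the standard's windows-1252 it rejects the five bytes
# 0x81, 0x8D, 0x8F, 0x90, and 0x9D.
PYTHON_CODEC = {"windows-1252": "cp1252"}

ASCII_WHITESPACE = "\t\n\f\r "


def decode_by_label(label: str, data: bytes) -> str:
    """Hypothetical helper: resolve the label first, then decode the bytes."""
    name = LABEL_TO_ENCODING[label.strip(ASCII_WHITESPACE).lower()]
    return data.decode(PYTHON_CODEC[name])


# Label-driven lookup: both labels resolve to windows-1252, so 0x80 is U+20AC.
print(ascii(decode_by_label("latin1", b"\x80")))  # '\u20ac'
print(ascii(decode_by_label("ASCII", b"\x80")))   # '\u20ac'

# Python's own codecs of the same names behave differently.
print(ascii(b"\x80".decode("latin-1")))           # '\x80'
try:
    b"\x80".decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii codec rejects 0x80:", exc.reason)
```

The first two calls take a label and some bytes as input, which is the case the note describes. The last two bypass label resolution entirely, which is the case the note cannot make promises about.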

> @@ -732,6 +747,30 @@ part of the ISO 8859 series. In particular, the necessity of the inclusion of <a
 and <a>ISO-8859-16</a> is doubtful for the purpose of supporting existing content, but there are no
 plans to remove these.</p>
 
+<div class=note id=note-latin1-ascii>
+ <p>The <a>windows-1252</a> <a for=/>encoding</a> has various <a for=encoding>labels</a> like
+ "<code>latin1</code>", "<code>iso-8859-1</code>", "<code>ascii</code>", etc. which have
+ historically been confusing for developers. On the web, and in any software that seeks to be
+ web-compatible by implementing the Encoding Standard, these are synonyms: "<code>latin1</code>" and

I vaguely recall we decided to lowercase "standard"? Do I misremember?

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/pull/345#pullrequestreview-2759325886

Received on Friday, 11 April 2025 06:58:44 UTC