- From: Anne van Kesteren <notifications@github.com>
- Date: Tue, 15 Apr 2025 02:46:03 -0700
- To: whatwg/encoding <encoding@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/encoding/pull/345/review/2767568493@github.com>
@annevk commented on this pull request.

> + <p>The <a>windows-1252</a> <a for=/>encoding</a> has various <a for=encoding>labels</a> like
> + "<code>latin1</code>", "<code>iso-8859-1</code>", "<code>ascii</code>", etc. which have

"like" and "etc." seem redundant together? Maybe "labels, such as latin1, iso-..., and ascii, which have ..."?

> @@ -732,6 +747,30 @@ part of the ISO 8859 series. In particular, the necessity of the inclusion of <a
> and <a>ISO-8859-16</a> is doubtful for the purpose of supporting existing content, but there are no plans to remove these.</p>
> +<div class=note id=note-latin1-ascii>
> + <p>The <a>windows-1252</a> <a for=/>encoding</a> has various <a for=encoding>labels</a> like
> + "<code>latin1</code>", "<code>iso-8859-1</code>", "<code>ascii</code>", etc. which have
> + historically been confusing for developers. On the web, and in any software that seeks to be
> + web-compatible by implementing the Encoding Standard, these are synonyms: "<code>latin1</code>" and
> + "<code>ascii</code>" are just labels for <a>windows-1252</a>, and any software following this
> + standard will, for example, decode 0x80 as U+20AC (€) when asked for the Latin1 or ASCII decoding
> + of that byte.
> +
> + <p>Software that does not follow the Encoding Standard does not always give the same answers. The
> + root of this is that the original document that specified Latin1 (ISO/IEC 8859-1), did not provide
> + any mappings for bytes in the inclusive ranges 0x00–0x1F or 0x7F–0x9F. Similarly, the original

Nit: I think whenever we talk about ranges it's always in the form of "0x7F to 0x9F".

> @@ -732,6 +747,30 @@ part of the ISO 8859 series. In particular, the necessity of the inclusion of <a
> and <a>ISO-8859-16</a> is doubtful for the purpose of supporting existing content, but there are no plans to remove these.</p>
> +<div class=note id=note-latin1-ascii>
> + <p>The <a>windows-1252</a> <a for=/>encoding</a> has various <a for=encoding>labels</a> like
> + "<code>latin1</code>", "<code>iso-8859-1</code>", "<code>ascii</code>", etc. which have
> + historically been confusing for developers. On the web, and in any software that seeks to be
> + web-compatible by implementing the Encoding Standard, these are synonyms: "<code>latin1</code>" and
> + "<code>ascii</code>" are just labels for <a>windows-1252</a>, and any software following this
> + standard will, for example, decode 0x80 as U+20AC (€) when asked for the Latin1 or ASCII decoding
> + of that byte.

The problem I have with this is that browsers typically have "Latin1" code paths that are very much aligned with the Unicode view of the world and not windows-1252. So for complicated software, the answer very much depends on how you ask and what you ask for. I also don't really have a good rephrasing that would account for that. Maybe put Latin1 and ASCII in quotes like below?

> @@ -732,6 +747,30 @@ part of the ISO 8859 series. In particular, the necessity of the inclusion of <a
> and <a>ISO-8859-16</a> is doubtful for the purpose of supporting existing content, but there are no plans to remove these.</p>
> +<div class=note id=note-latin1-ascii>
> + <p>The <a>windows-1252</a> <a for=/>encoding</a> has various <a for=encoding>labels</a> like
> + "<code>latin1</code>", "<code>iso-8859-1</code>", "<code>ascii</code>", etc. which have
> + historically been confusing for developers. On the web, and in any software that seeks to be
> + web-compatible by implementing the Encoding Standard, these are synonyms: "<code>latin1</code>" and

How about "this standard"?
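
The 0x80 example in the quoted note, and the contrast with "Latin1" code paths that follow the Unicode view, can be seen side by side in a short Python sketch. It rests on an assumption: Python's built-in codecs are not an implementation of the Encoding Standard. Here `cp1252` stands in for windows-1252 (the two agree for byte 0x80, though not for every byte), while Python's own `latin-1` and `ascii` codecs play the role of software that does not apply the standard's label mapping.

```python
# Sketch only: Python's codecs are NOT an Encoding Standard implementation.
# "cp1252" is used here as a stand-in for windows-1252 (an assumption that
# holds for byte 0x80, which is the only byte examined).
byte = bytes([0x80])

# windows-1252 maps 0x80 to U+20AC (euro sign), as the note describes.
as_windows_1252 = byte.decode("cp1252")

# Python's "latin-1" codec maps each byte to the code point with the same
# value, so 0x80 becomes U+0080 (a C1 control), not the euro sign.
as_python_latin1 = byte.decode("latin-1")

# Python's "ascii" codec rejects any byte above 0x7F outright.
try:
    byte.decode("ascii")
    ascii_rejected = False
except UnicodeDecodeError:
    ascii_rejected = True

print(f"cp1252:  U+{ord(as_windows_1252):04X}")
print(f"latin-1: U+{ord(as_python_latin1):04X}")
print(f"ascii rejects 0x80: {ascii_rejected}")
```

The same byte and the same nominal label can therefore give three different outcomes depending on which definition of "Latin1" or "ASCII" the software uses.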
--
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/pull/345#pullrequestreview-2767568493
Received on Tuesday, 15 April 2025 09:46:07 UTC