[Bug 23646] "us-ascii" should not be an alias for "windows-1252"

From: <bugzilla@jessica.w3.org>
Date: Sun, 27 Oct 2013 02:24:20 +0000
To: www-international@w3.org
Message-ID: <bug-23646-4285-R5UdGcZqKW@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=23646

Addison Phillips <addison@lab126.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |addison@lab126.com

--- Comment #1 from Addison Phillips <addison@lab126.com> ---
US-ASCII the 7-bit encoding certainly is distinct from windows-1252. However,
the Encoding spec treats it as an alias for windows-1252 for the same reason it
treats ISO 8859-1 as an alias for windows-1252. In both cases, windows-1252 is
a true superset of the specified encoding. When you are decoding a byte
sequence in one of these encodings and encounter a byte that US-ASCII or ISO
8859-1 treats as unassigned but which is assigned in windows-1252, it is highly
likely that the byte sequence actually uses the windows-1252 encoding. 
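As a rough illustration of the decoding case (not the Encoding spec's own algorithm), the sketch below uses Python's built-in codecs as stand-ins: "ascii" plays the part of a strict 7-bit US-ASCII decoder and "cp1252" the part of windows-1252. The sample bytes and the helper names are made up for the example.

```python
# Hypothetical input: the text "hi" wrapped in curly quotes, as bytes
# 0x93 and 0x94, which windows-1252 assigns but 7-bit US-ASCII does not.
data = b"\x93hi\x94"

def decode_strict_ascii(raw: bytes) -> str:
    # Strict US-ASCII: any byte >= 0x80 becomes U+FFFD REPLACEMENT CHARACTER.
    return raw.decode("ascii", errors="replace")

def decode_as_windows_1252(raw: bytes) -> str:
    # Treat the "us-ascii" label as windows-1252, as the alias does.
    return raw.decode("cp1252")

strict = decode_strict_ascii(data)      # both quotes are lost
lenient = decode_as_windows_1252(data)  # both quotes survive

print(ascii(strict))
print(ascii(lenient))
```

For pure 7-bit input the two functions return the same string, so the alias only changes the result for bytes that strict US-ASCII could not have decoded anyway.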

The alternative (keeping these other encodings distinct) would result in
additional replacement characters being generated in both the decoding and
encoding directions. Treating the labels as aliases is generally best practice
on the Web, although the Encoding spec could be a bit more verbose in spelling
this out.
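The encoding direction can be sketched the same way. This again assumes Python's "ascii" and "cp1252" codecs as stand-ins for a distinct US-ASCII encoder and for windows-1252. Python substitutes "?" for characters it cannot encode, which is only one possible replacement behaviour. The sample text and function names are hypothetical.

```python
# Hypothetical text containing curly quotes (U+201C and U+201D).
text = "\u201chi\u201d"

def encode_strict_ascii(s: str) -> bytes:
    # A distinct US-ASCII encoder cannot represent the quotes, so each
    # one is replaced (Python's "replace" handler emits b"?").
    return s.encode("ascii", errors="replace")

def encode_as_windows_1252(s: str) -> bytes:
    # With the alias in effect, the quotes map to bytes 0x93 and 0x94.
    return s.encode("cp1252")

lossy = encode_strict_ascii(text)
preserved = encode_as_windows_1252(text)

print(lossy)
print(preserved)
```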

This is, incidentally, one of the "willful violations" from the early drafts
of HTML5, in this case of the W3C Character Model, which forbids this sort
of renaming. While I tend to agree that software generally should use the
encoding I specify and accept no substitutes, in practice this turns out to be
a better choice.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
Received on Sunday, 27 October 2013 02:24:22 UTC