- From: <bugzilla@jessica.w3.org>
- Date: Sun, 27 Oct 2013 02:24:20 +0000
- To: www-international@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=23646

Addison Phillips <addison@lab126.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |addison@lab126.com

--- Comment #1 from Addison Phillips <addison@lab126.com> ---
US-ASCII, the 7-bit encoding, certainly is distinct from windows-1252. However, the Encoding spec treats it as an alias for windows-1252 for the same reason it treats ISO 8859-1 as an alias for windows-1252: in both cases, windows-1252 is a true superset of the specified encoding. When you are decoding a byte sequence labeled with one of these encodings and encounter a byte that US-ASCII or ISO 8859-1 treats as unassigned but that is assigned in windows-1252 (for example, 0x93 and 0x94, the curly double quotation marks), it is highly likely that the byte sequence actually uses the windows-1252 encoding. The alternative (keeping these encodings distinct) would produce additional replacement characters in both the decoding and encoding directions.

This is generally best practice on the Web, although the Encoding spec could be more explicit in spelling this out.

This is, incidentally, one of the "willful violations" in HTML5's early drafts, in this case of the W3C Character Model, which forbids this sort of renaming. While I tend to agree that software should generally use the encoding I specify and accept no substitutes, in practice this turns out to be the better choice.

--
You are receiving this mail because:
You are on the CC list for the bug.
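The label aliasing described in the comment above can be sketched in Python. This is a minimal, non-authoritative sketch under stated assumptions: `WINDOWS_1252_LABELS` is a hypothetical, partial subset of the labels the Encoding spec maps to windows-1252 (the spec's list is longer), `resolve_label` and `decode_windows_1252` are hypothetical helper names, and the decoder relies on Python's built-in `cp1252` codec, falling back to the same-numbered C1 control character for the five bytes that codec leaves undefined. It illustrates the point of the comment: bytes such as 0x93, 0x94 and 0x80 become replacement characters under a strict US-ASCII decode, but decode to curly quotes and the euro sign when the label is treated as windows-1252.

```python
from __future__ import annotations

# Hypothetical, partial subset of the labels the Encoding spec maps to
# windows-1252; the spec's real list is longer.
WINDOWS_1252_LABELS = {
    "us-ascii", "ascii", "iso-8859-1", "iso8859-1", "latin1",
    "l1", "cp1252", "windows-1252", "x-cp1252",
}


def resolve_label(label: str) -> str:
    """Map a charset label to an encoding name (hypothetical helper).

    Strips ASCII whitespace and lowercases before the lookup.
    Labels are assumed to be ASCII.
    """
    normalized = label.strip("\t\n\f\r ").lower()
    if normalized in WINDOWS_1252_LABELS:
        return "windows-1252"
    raise LookupError(f"label not covered by this sketch: {label!r}")


def _build_windows_1252_table() -> list[str]:
    """Return one decoded character per byte value 0x00-0xFF."""
    table = []
    for b in range(256):
        try:
            table.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
            # undefined; fall back to the C1 control with the same number
            # so that every byte still decodes to some character.
            table.append(chr(b))
    return table


_WINDOWS_1252 = _build_windows_1252_table()


def decode_windows_1252(data: bytes) -> str:
    """Decode bytes as windows-1252; this never yields U+FFFD."""
    return "".join(_WINDOWS_1252[b] for b in data)


# Usage: content labeled "US-ASCII" that really contains windows-1252 bytes.
raw = b"\x93Caf\xe9\x94 costs \x805"
encoding = resolve_label("US-ASCII")                # "windows-1252"
ascii_only = raw.decode("ascii", errors="replace")  # four U+FFFD characters
as_1252 = decode_windows_1252(raw)                  # '“Café” costs €5'
```

Building a 256-entry table once keeps decoding a simple per-byte lookup; because every byte value has an entry, a windows-1252 decode cannot fail, which is the practical reason the comment gives for preferring the superset.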
Received on Sunday, 27 October 2013 02:24:22 UTC