
Re: [whatwg] Requiring the Encoding Standard preferred name is too strict for no good reason

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 1 Aug 2013 01:41:56 +0000 (UTC)
To: Martin Janecke <whatwg.org@prlbr.com>
Message-ID: <alpine.DEB.2.00.1308010131550.27623@ps20323.dreamhostps.com>
Cc: whatwg <whatwg@lists.whatwg.org>
On Thu, 1 Aug 2013, Martin Janecke wrote:
> 
> I don't see any sense in making a document that is declared as 
> ISO-8859-1 and encoded as ISO-8859-1 non-conforming. Just because the 
> ISO-8859-1 code points are a subset of windows-1252? So is US-ASCII. 
> Should an US-ASCII declaration also be non-conforming then -- even if 
> the document only contains bytes from the US-ASCII range? What's the 
> benefit?
> 
> I assume this is supposed to be helpful in some way, but to me it just 
> seems wrong and confusing.

If you avoid the bytes that are different in ISO-8859-1 and Win1252, the 
spec now allows you to use either label. (As well as "cp1252", "cp819", 
"ibm819", "l1", "latin1", "x-cp1252", etc.)

The part that I find problematic is that if you use byte 0x85 from 
Windows 1252 (U+2026 "…" HORIZONTAL ELLIPSIS), and then label the document 
as "ansi_x3.4-1968", "ascii", "iso-8859-1", "iso-ir-100", "iso8859-1", 
"iso_8859-1:1987", "us-ascii", or a number of other options, it'll still 
be valid, and it'll work exactly as if you'd labeled it "windows-1252". 
This despite the fact that in ASCII and in ISO-8859-1, byte 0x85 does not 
map to U+2026. It maps to U+0085 in 8859-1, and it is undefined in ASCII 
(since ASCII is a 7-bit encoding).
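(A minimal sketch of the divergence, not part of the original message, 
using Python's codec names: "cp1252" is windows-1252 and "latin-1" is 
ISO-8859-1.)

```python
# Decode byte 0x85 under the three interpretations discussed above.
raw = b"\x85"

print(raw.decode("cp1252"))   # windows-1252: U+2026 HORIZONTAL ELLIPSIS
print(repr(raw.decode("latin-1")))  # ISO-8859-1: U+0085, a C1 control (NEL)

try:
    raw.decode("ascii")
except UnicodeDecodeError:
    # 0x85 has the high bit set, so it is outside 7-bit ASCII entirely.
    print("undefined in ASCII")
```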

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 1 August 2013 01:42:21 UTC