- From: Ian Hickson <ian@hixie.ch>
- Date: Thu, 1 Aug 2013 01:41:56 +0000 (UTC)
- To: Martin Janecke <whatwg.org@prlbr.com>
- Cc: whatwg <whatwg@lists.whatwg.org>
- Message-ID: <alpine.DEB.2.00.1308010131550.27623@ps20323.dreamhostps.com>
On Thu, 1 Aug 2013, Martin Janecke wrote:
>
> I don't see any sense in making a document that is declared as
> ISO-8859-1 and encoded as ISO-8859-1 non-conforming. Just because the
> ISO-8859-1 code points are a subset of windows-1252? So is US-ASCII.
> Should an US-ASCII declaration also be non-conforming then -- even if
> the document only contains bytes from the US-ASCII range? What's the
> benefit?
>
> I assume this is supposed to be helpful in some way, but to me it just
> seems wrong and confusing.

If you avoid the bytes that are different in ISO-8859-1 and Win1252, the
spec now allows you to use either label. (As well as "cp1252", "cp819",
"ibm819", "l1", "latin1", "x-cp1252", etc.)

The part that I find problematic is that if you use byte 0x85 from
Windows 1252 (U+2026 "…" HORIZONTAL ELLIPSIS), and then label the
document as "ansi_x3.4-1968", "ascii", "iso-8859-1", "iso-ir-100",
"iso8859-1", "iso_8859-1:1987", "us-ascii", or a number of other
options, it'll still be valid, and it'll work exactly as if you'd
labeled it "windows-1252". This despite the fact that in ASCII and in
ISO-8859-1, byte 0x85 does not map to U+2026. It maps to U+0085 in
8859-1, and it is undefined in ASCII (since ASCII is a 7-bit encoding).

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
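
To illustrate the byte 0x85 point made in the message above, here is a minimal Python sketch. It is not part of the original message. The `LABEL_TO_ENCODING` table and the `decode_with_label` helper are hypothetical names invented for this illustration; the table covers only the labels quoted in the message and stands in for the label handling the spec describes. Python's own `cp1252`, `iso-8859-1`, and `ascii` codecs are used to show what each encoding says about the byte when taken literally.

```python
# Hypothetical table: every label quoted in the message is treated as
# windows-1252, which is the behaviour the message describes.
LABEL_TO_ENCODING = {
    label: "cp1252"  # Python's name for windows-1252
    for label in (
        "ansi_x3.4-1968", "ascii", "iso-8859-1", "iso-ir-100",
        "iso8859-1", "iso_8859-1:1987", "us-ascii",
        "cp1252", "cp819", "ibm819", "l1", "latin1", "x-cp1252",
        "windows-1252",
    )
}


def decode_with_label(label: str, data: bytes) -> str:
    """Decode data the way the message says a labelled document is handled."""
    return data.decode(LABEL_TO_ENCODING[label.strip().lower()])


def literal_ascii_accepts(data: bytes) -> bool:
    """Report whether the bytes are defined in strict 7-bit ASCII."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


raw = b"\x85"

# With the label mapping, all of these behave like windows-1252.
for label in ("windows-1252", "iso-8859-1", "us-ascii"):
    print(label, hex(ord(decode_with_label(label, raw))))

# Taken literally, the encodings disagree about the same byte.
print("literal iso-8859-1:", hex(ord(raw.decode("iso-8859-1"))))
print("literal ascii defines 0x85:", literal_ascii_accepts(raw))
```

The first loop shows the same code point for all three labels, while the literal ISO-8859-1 codec yields U+0085 and the strict ASCII codec rejects the byte outright, which is the mismatch the message objects to.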
Received on Thursday, 1 August 2013 01:42:21 UTC