- From: NARUSE, Yui <naruse@airemix.jp>
- Date: Sun, 4 Aug 2013 02:19:22 +0900
- To: Ian Hickson <ian@hixie.ch>
- Cc: whatwg <whatwg@lists.whatwg.org>, Martin Janecke <whatwg.org@prlbr.com>
2013/8/1 Ian Hickson <ian@hixie.ch>:
> On Thu, 1 Aug 2013, Martin Janecke wrote:
>>
>> I don't see any sense in making a document that is declared as
>> ISO-8859-1 and encoded as ISO-8859-1 non-conforming. Just because the
>> ISO-8859-1 code points are a subset of windows-1252? So is US-ASCII.
>> Should an US-ASCII declaration also be non-conforming then -- even if
>> the document only contains bytes from the US-ASCII range? What's the
>> benefit?
>>
>> I assume this is supposed to be helpful in some way, but to me it just
>> seems wrong and confusing.
>
> If you avoid the bytes that are different in ISO-8859-1 and Win1252, the
> spec now allows you to use either label. (As well as "cp1252", "cp819",
> "ibm819", "l1", "latin1", "x-cp1252", etc.)
>
> The part that I find problematic is that if you use byte 0x85 from
> Windows 1252 (U+2026 "…" HORIZONTAL ELLIPSIS), and then label the document
> as "ansi_x3.4-1968", "ascii", "iso-8859-1", "iso-ir-100", "iso8859-1",
> "iso_8859-1:1987", "us-ascii", or a number of other options, it'll still
> be valid, and it'll work exactly as if you'd labeled it "windows-1252".
> This despite the fact that in ASCII and in ISO-8859-1, byte 0x85 does not
> map to U+2026. It maps to U+0085 in 8859-1, and it is undefined in ASCII
> (since ASCII is a 7 bit encoding).

The ISO-8859-1 vs. Windows-1252 issue sounds like a minor one, because
0x85 in ISO-8859-1 is Next Line (NEL). As far as I know, 0x85/U+0085 is
used only on some IBM systems.

For Japanese encodings, there is the Shift_JIS vs. Windows-31J issue,
which has annoyed people for a long time. Windows-31J has many additional
characters that are not included in Shift_JIS, and many of its Unicode
mappings differ from those of Shift_JIS. But many existing Web pages
declare "Shift_JIS" while using characters that exist only in
Windows-31J. Therefore, if people want to label a document as true
Shift_JIS, there is no way to do so in the existing framework.
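
To make the byte-level differences above concrete, here is a minimal
sketch in Python. It assumes Python's own codecs ("cp1252", "iso-8859-1",
"ascii", "shift_jis", "cp932") as stand-ins for the strict encodings;
"cp932" is Python's name for Windows-31J. The helper decode_or_none is a
hypothetical name used only for this illustration. This shows what the
encodings themselves define, not what a browser does with these labels.

```python
def decode_or_none(data: bytes, codec: str):
    """Decode strictly; return None if the bytes are invalid in that codec."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None

# Byte 0x85: same byte, three different answers depending on the label.
b = bytes([0x85])
as_1252 = decode_or_none(b, "cp1252")        # U+2026 HORIZONTAL ELLIPSIS
as_8859_1 = decode_or_none(b, "iso-8859-1")  # U+0085 (Next Line control)
as_ascii = decode_or_none(b, "ascii")        # None: outside 7-bit ASCII

# Bytes 0x81 0x60: defined in both, but mapped to different code points.
wave = bytes([0x81, 0x60])
sjis_wave = decode_or_none(wave, "shift_jis")  # U+301C WAVE DASH
w31j_wave = decode_or_none(wave, "cp932")      # U+FF5E FULLWIDTH TILDE

# Bytes 0x87 0x40: a character that exists only in Windows-31J.
circled = bytes([0x87, 0x40])
sjis_circled = decode_or_none(circled, "shift_jis")  # None: not in Shift_JIS
w31j_circled = decode_or_none(circled, "cp932")      # U+2460 CIRCLED DIGIT ONE

for name, value in [
    ("cp1252 0x85", as_1252), ("iso-8859-1 0x85", as_8859_1),
    ("ascii 0x85", as_ascii), ("shift_jis 0x8160", sjis_wave),
    ("cp932 0x8160", w31j_wave), ("shift_jis 0x8740", sjis_circled),
    ("cp932 0x8740", w31j_circled),
]:
    shown = "undefined" if value is None else f"U+{ord(value):04X}"
    print(f"{name}: {shown}")
```

A page that uses the last two byte sequences but declares "Shift_JIS" only
displays as its author intended if the browser decodes it as Windows-31J.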

It would need a new mechanism, for example a new meta specifier like
<META i-want-to-truly-specify-charset-as="Shift_JIS">, so that browsers
recognize the document's encoding as true Shift_JIS. But such people
should use UTF-8 instead of introducing something new like that.

-- 
NARUSE, Yui <naruse@airemix.jp>
Received on Saturday, 3 August 2013 17:20:28 UTC