- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Tue, 5 Jun 2007 10:18:49 +0300
On May 29, 2007, at 13:13, Henri Sivonen wrote:

> To avoid stepping on the toes of Charmod more than is necessary, I
> suggest making it non-conforming for a document to have bytes in
> the 0x80–0x9F range when the character encoding is declared to be
> one of the ISO-8859 family encodings.

I've been thinking about this. I have a proposal on how to spec this *conceptually* and how to implement this with error reporting.

I am assuming here that
1) no one ever intends C1 code points to be present in the decoded stream, and
2) we want, as a Charmod correctness fig leaf, to make the C1 bytes non-conforming when ISO-8859-1 or ISO-8859-11 was declared but Windows-1252 or Windows-874 decoding is needed.

Based on the behavior of Minefield and Opera 9.20, the following seems to be the least Charmod violating and least quirky approach that could possibly work:

1) Decode the byte stream using a decoder for whatever encoding was declared, even ISO-8859-1 or ISO-8859-11, according to ftp://ftp.unicode.org/Public/MAPPINGS/.
2) If a character in the decoded character stream is in the C1 code point range, this is a document conformance violation.
2a) If the declared encoding was ISO-8859-1, replace that character with the character that you get by casting the code point into a byte and decoding it as Windows-1252.
2b) If the declared encoding was ISO-8859-11, replace that character with the character that you get by casting the code point into a byte and decoding it as Windows-874.

[ The *simplest* and most robust (and maximally Charmod-violating) thing would be:

1) Decode the byte stream using a decoder for whatever encoding was declared, even ISO-8859-1 or ISO-8859-11, according to ftp://ftp.unicode.org/Public/MAPPINGS/.
2) If a character in the decoded character stream is in the C1 code point range, this is a document conformance violation. Replace that character with the character that you get by casting the code point into a byte and decoding it as Windows-1252.

But this isn't what Minefield, Opera 9.20 and WebKit nightlies do. ]

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
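
For illustration, the two procedures in the message above could be sketched in Python roughly as follows. This is only a sketch under assumptions, not what any browser actually implements: the function name `decode_with_c1_repair`, the `simplest` flag and the returned list of violation offsets are hypothetical, and Python's built-in codecs stand in for the unicode.org mapping tables. The message does not say what to do when the Windows encoding leaves a byte unassigned (for example 0x81 in Windows-1252); the sketch assumes the C1 character is then kept as-is.

```python
import codecs

# Declared encoding (canonical Python codec name) -> Windows fallback decoder.
# Only these two pairs are named in the message (steps 2a and 2b).
_FALLBACKS = {
    codecs.lookup("iso-8859-1").name: "cp1252",
    codecs.lookup("iso-8859-11").name: "cp874",
}


def decode_with_c1_repair(data, declared, simplest=False):
    """Return (text, violations) for `data` decoded as `declared`.

    `violations` lists the character offsets where a C1 code point
    (U+0080..U+009F) appeared in the decoded stream.
    """
    # Step 1: decode with whatever encoding was declared.
    text = data.decode(declared)

    if simplest:
        # Bracketed variant: always fall back to Windows-1252.
        fallback = "cp1252"
    else:
        fallback = _FALLBACKS.get(codecs.lookup(declared).name)

    out = []
    violations = []
    for offset, ch in enumerate(text):
        code_point = ord(ch)
        if 0x80 <= code_point <= 0x9F:
            # Step 2: a C1 code point is a conformance violation.
            violations.append(offset)
            if fallback is not None:
                try:
                    # Steps 2a/2b: cast the code point to a byte and
                    # decode that byte with the Windows encoding.
                    ch = bytes([code_point]).decode(fallback)
                except UnicodeDecodeError:
                    # Assumption: byte unassigned in the Windows
                    # encoding, so keep the C1 character unchanged.
                    pass
        out.append(ch)
    return "".join(out), violations
```

With this sketch, `decode_with_c1_repair(b"\x93hi\x94", "iso-8859-1")` yields curly quotes around "hi" and reports offsets 0 and 3 as violations, while a document declared as UTF-8 that contains a C1 code point is reported but left unchanged unless `simplest=True` is passed.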
Received on Tuesday, 5 June 2007 00:18:49 UTC