- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Sun, 16 Mar 2008 17:08:04 +0200
- To: HTML WG <public-html@w3.org>
> 17:30 < hsivonen> I regret suggesting that 0x80-0x9F bytes be errors when ISO-8859-1 is declared
> 17:32 < annevk> i don't think it makes much sense
> 17:32 < annevk> iso-8859-1 is just an another alias
> 17:33 < annevk> just need IANA to fix the registry :)
> 17:33 < hsivonen> It'll probably create a huge amount of error noise
> 17:34 < hsivonen> Hixie: can I just say I was wrong and ask this detail to be reversed? (especially since doing the consistent thing with GBK would be a PITA)
> 05:37 < Hixie> hsivonen: and you can just display something next to your line reporting your encoding, e.g. "Encoding: Windows-1252 (but incorrectly labelled as ISO-8859-1)" and "Valid HTML5 except for _encoding errors_" where the "encoding errors" link shows some more details
> 05:54 < Hixie> the encoding issue is easy for me to change in the spec, but the hard work would be all yours in implementation, so let me know if you want me to relax that (by, again, e-mailing the list)

I know I previously suggested making 0x80-0x9F bytes be errors when ISO-8859-1 is declared. I think my previous suggestion was a mistake. Since then, my views of what HTML5 conformance checking should be like have become even more user-oriented and less Charmod-oriented.

ISO-8859-1 is consistently treated as an alias of Windows-1252 in HTML browsers. Therefore, using bytes in the 0x80-0x9F range is reliable in practice. Since it is reliable, having a conformance checker whine about those bytes would not really help anyone and would devalue conformance checking messages.

As for putting encoding errors behind a link, if some errors are so useless that they can be hidden behind a link, why have them as errors at all? Besides, it would be silly to have to develop a more complex UI and more complex decoder mechanics for errors that will be hidden from view.
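To make the alias point concrete, here is a minimal Python sketch (not from the original message; the function name `decode_both` is hypothetical). It decodes the same bytes once with a strict ISO-8859-1 mapping and once as Windows-1252, the way browsers treat content labelled ISO-8859-1. Under the strict mapping, 0x93, 0x94 and 0x80 become invisible C1 control characters; under Windows-1252 they become curly quotes and the euro sign, which is what authors intended.

```python
def decode_both(data: bytes) -> tuple[str, str]:
    """Decode the same bytes as strict ISO-8859-1 and as Windows-1252."""
    # "latin-1" maps every byte 0x00-0xFF to the same code point, so
    # 0x80-0x9F become C1 control characters (U+0080-U+009F).
    as_latin1 = data.decode("latin-1")
    # "cp1252" is Python's Windows-1252 codec: 0x80-0x9F are mostly
    # printable characters such as U+20AC (euro) and U+201C/U+201D (quotes).
    as_cp1252 = data.decode("cp1252")
    return as_latin1, as_cp1252


strict, browser = decode_both(b"\x93Hi\x94 \x80")
print(repr(strict))
print(browser)
```

Note that Python's `cp1252` codec treats 0x81, 0x8D, 0x8F, 0x90 and 0x9D as undefined and raises on them, whereas browsers map those bytes to the corresponding C1 code points; the sample input avoids them.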
Although it would be tempting to give an error about ISO-8859-1 not being the preferred IANA name for Windows-1252, I think the right way is to emit a warning saying that ISO-8859-1 is treated as Windows-1252. (Same for the Thai and Simplified Chinese encodings as appropriate.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
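The warning-only policy proposed above could look roughly like the following Python sketch. It is not Validator.nu code. The `check_declared_encoding` function, the `Message` type and the `OVERRIDES` table are hypothetical. The Thai (TIS-620 to windows-874) and Simplified Chinese (GB2312 to GBK) rows are an assumption about what "as appropriate" covers, based on how browsers treat those labels.

```python
from dataclasses import dataclass

# Hypothetical override table: declared label -> (name browsers actually
# use, Python codec). ISO-8859-1 -> Windows-1252 comes from the message;
# the TIS-620 and GB2312 rows are assumptions.
OVERRIDES = {
    "iso-8859-1": ("windows-1252", "cp1252"),
    "tis-620": ("windows-874", "cp874"),
    "gb2312": ("GBK", "gbk"),
}


@dataclass
class Message:
    level: str  # "warning" or "error"
    text: str


def check_declared_encoding(label: str, data: bytes) -> tuple[str, list[Message]]:
    """Decode `data` as a browser would and report on the declared label.

    The override is reported as a single warning, never as an error, so
    bytes 0x80-0x9F under an ISO-8859-1 label add no error noise.
    """
    messages: list[Message] = []
    key = label.strip().lower()
    if key in OVERRIDES:
        actual_name, codec = OVERRIDES[key]
        messages.append(Message(
            "warning",
            f"{label} is treated as {actual_name} by browsers.",
        ))
    else:
        # Assumption: other labels are valid Python codec names
        # (an unknown label raises LookupError).
        codec = key
    # errors="replace" keeps the sketch short. Python's cp1252 leaves 0x81,
    # 0x8D, 0x8F, 0x90 and 0x9D undefined, unlike the browser mapping.
    text = data.decode(codec, errors="replace")
    return text, messages
```

For example, `check_declared_encoding("ISO-8859-1", b"\x93ok\x94")` decodes the bytes to curly quotes and returns one warning and no errors. A UTF-8 label produces no messages at all.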
Received on Sunday, 16 March 2008 15:08:42 UTC