- From: Nick Kew <nick@webthing.com>
- Date: Mon, 7 Jan 2002 21:43:02 +0000 (GMT)
- To: Martin Duerst <duerst@w3.org>
- cc: Terje Bless <link@pobox.com>, <www-validator@w3.org>, Gerald Oskoboiny <gerald@w3.org>, <nhtcapri@rrzn-user.uni-hannover.de>
On Sun, 6 Jan 2002, Martin Duerst wrote:

> >I suggest a quick-hack fix for this, that I've added to Page Valet:
> >
> >if ( charset matches /^mac(intosh|roman)/i ) {
> >  message("charset not supported; treating it as UTF-8") ;
> >  charset = "UTF-8" ;
> >}
>
> It seems that most of the characters are supported;
> would be a pity to give up completely.
>
> Also, treating something as UTF-8 while it's clearly not
> is a really bad idea.

OK, that would probably be a bad idea for the W3C validator.
OTOH, printing a warning message "charset not correctly supported"
would seem like a good idea.

In the case of Page Valet, I needed a more drastic measure, because
the symptom of the problem was that OpenSP generated broken XML
(an opening "<" was eaten up by the null byte).

But yes, I'll be looking for a better fix - perhaps

if ( charset is macintosh ) {
  entify the offending bytes ;    // accept a performance hit :-(
}

BTW, treating it as UTF-8 and emitting a warning is also fallback
behaviour when iconv fails due to an explicitly unsupported charset.
Probably not good, but I'm not sure how best to deal with it.
For 8-bit charsets, wholesale entification would be an option,
but how does one know if an unknown charset is 8-bit?

-- 
Nick Kew

Site Valet - the mark of Quality on the Web.
<URL:http://valet.webthing.com/>
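
The "entify the offending bytes" idea in the message above might look roughly like the following. This is only a sketch in Python, not Page Valet's actual code: the function name `entify_mac` is hypothetical, and it assumes that "offending bytes" means every non-ASCII byte, and that Python's `mac_roman` codec is an acceptable stand-in for the macintosh charset. Only the charset pattern comes from the quoted quick-hack fix.

```python
import re

# Pattern taken from the quoted quick-hack fix.
MAC_CHARSET = re.compile(r"^mac(intosh|roman)", re.IGNORECASE)


def entify_mac(data, charset):
    """Return (bytes, warnings) for a document labelled with `charset`.

    Hypothetical helper: for a macintosh charset, every non-ASCII byte is
    replaced by a numeric character reference so that the parser only
    ever sees ASCII. Other charsets are passed through unchanged.
    """
    warnings = []
    if MAC_CHARSET.match(charset):
        warnings.append("charset not correctly supported; "
                        "non-ASCII bytes converted to character references")
        # Assumption: Python's mac_roman codec matches the document's charset.
        text = data.decode("mac_roman")
        # ASCII (including markup such as "<" and "&") is left untouched;
        # everything else becomes "&#N;".
        return text.encode("ascii", "xmlcharrefreplace"), warnings
    return data, warnings


if __name__ == "__main__":
    out, warns = entify_mac(b"<p>caf\x8e</p>", "Macintosh")
    print(out)    # b'<p>caf&#233;</p>'
    print(warns)
```

Note that this only covers a charset whose byte-to-character mapping is known. It does not answer the open question of how to tell whether an unknown charset is 8-bit, and it leaves control bytes such as NUL in the input as they are.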
Received on Monday, 7 January 2002 16:46:26 UTC