- From: Nick Kew <nick@webthing.com>
- Date: Mon, 7 Jan 2002 21:43:02 +0000 (GMT)
- To: Martin Duerst <duerst@w3.org>
- cc: Terje Bless <link@pobox.com>, <www-validator@w3.org>, Gerald Oskoboiny <gerald@w3.org>, <nhtcapri@rrzn-user.uni-hannover.de>
On Sun, 6 Jan 2002, Martin Duerst wrote:
> >I suggest a quick-hack fix for this, that I've added to Page Valet:
> >
> >if ( charset matches /^mac(intosh|roman)/i ) {
> > message("charset not supported; treating it as UTF-8") ;
> > charset = "UTF-8" ;
> >}
>
> It seems that most of the characters are supported;
> would be a pity to give up completely.
>
> Also, treating something as UTF-8 while it's clearly not
> is a really bad idea.
OK, that would probably be a bad idea for the W3C validator.
OTOH, printing a warning message "charset not correctly supported"
would seem like a good idea.
In the case of Page Valet, I needed a more drastic measure, because
the symptom of the problem was that OpenSP generated broken XML
(an opening "<" was eaten up by the null byte). But yes, I'll
be looking for a better fix - perhaps

  if ( charset is macintosh ) {
    entify the offending bytes ;  // accept a performance hit :-(
  }
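A minimal sketch of that entification step, in Python rather than in Page Valet's own code. It assumes the "offending bytes" are the non-ASCII ones (0x80 and above) and that Python's mac_roman codec is an acceptable stand-in for the real byte-to-Unicode mapping; the function name is made up for illustration.

```python
def entify_mac_roman(data: bytes) -> bytes:
    """Return pure-ASCII markup: every non-ASCII MacRoman character
    becomes a numeric character reference such as &#233;."""
    # Assumption: mac_roman is the right mapping for charset "macintosh".
    text = data.decode("mac_roman")
    # xmlcharrefreplace turns each character ASCII cannot encode into &#NNN;
    return text.encode("ascii", "xmlcharrefreplace")
```

The output is plain ASCII, so the parser never sees a raw MacRoman byte. ASCII control bytes, including NUL, pass through unchanged, so this alone would not help if the null byte is already in the input.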
BTW, treating it as UTF-8 and emitting a warning is also fallback
behaviour when iconv fails due to an explicitly unsupported charset.
Probably not good, but I'm not sure how best to deal with it.
For 8-bit charsets, wholesale entification would be an option,
but how does one know if an unknown charset is 8-bit?
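The warn-and-fall-back behaviour described above could be sketched roughly as follows. This is an illustration only: Python's codec registry stands in for iconv, the function name and warning text are hypothetical, and it does not answer the question of how to tell whether an unknown charset is 8-bit.

```python
import codecs

def decode_with_fallback(data: bytes, charset: str):
    """Decode with the declared charset; if the charset is unknown,
    record a warning and treat the input as UTF-8 instead."""
    warnings = []
    try:
        codecs.lookup(charset)  # raises LookupError for an unknown charset
    except LookupError:
        warnings.append(
            f"charset {charset!r} not supported; treating it as UTF-8")
        charset = "utf-8"
    # errors="replace" keeps going on undecodable bytes instead of raising
    text = data.decode(charset, errors="replace")
    return text, warnings
```

Returning the warnings alongside the text leaves it to the caller to decide whether to show "charset not correctly supported" to the user or to refuse the document outright.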
--
Nick Kew
Site Valet - the mark of Quality on the Web.
<URL:http://valet.webthing.com/>
Received on Monday, 7 January 2002 16:46:26 UTC