Re: Macintosh charset blowing up?

From: Nick Kew <nick@webthing.com>
Date: Mon, 7 Jan 2002 21:43:02 +0000 (GMT)
To: Martin Duerst <duerst@w3.org>
cc: Terje Bless <link@pobox.com>, <www-validator@w3.org>, Gerald Oskoboiny <gerald@w3.org>, <nhtcapri@rrzn-user.uni-hannover.de>
Message-ID: <20020107180738.H3191-100000@fenris.webthing.com>

On Sun, 6 Jan 2002, Martin Duerst wrote:

> >I suggest a quick-hack fix for this, that I've added to Page Valet:
> >
> >if ( charset matches /^mac(intosh|roman)/i ) {
> >   message("charset not supported; treating it as UTF-8") ;
> >   charset = "UTF-8" ;
> >}
> It seems that most of the characters are supported;
> would be a pity to give up completely.
> Also, treating something as UTF-8 while it's clearly not
> is a really bad idea.

OK, that would probably be a bad idea for the W3C validator.
OTOH, printing a warning message "charset not correctly supported"
would seem like a good idea.

In the case of Page Valet, I needed a more drastic measure, because
the symptom of the problem was that OpenSP generated broken XML
(an opening "<" was eaten up by the null byte).  But yes, I'll
be looking for a better fix - perhaps

if ( charset is macintosh ) {
  entify the offending bytes ; // accept a performance hit :-(
}
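
A minimal sketch of the "entify" idea in Python, assuming that entifying means replacing every non-ASCII character with a numeric character reference, and that Python's built-in mac_roman codec is an acceptable stand-in for whatever mapping Page Valet would use. The function name entify_mac_roman is hypothetical:

```python
def entify_mac_roman(raw: bytes) -> str:
    """Decode MacRoman bytes and return pure-ASCII text.

    Assumption: "entify" means emitting numeric character references
    (&#NNN;) for every character outside ASCII.
    """
    # Map each MacRoman byte to its Unicode character.
    text = raw.decode("mac_roman")
    # Re-encode as ASCII; anything that does not fit becomes &#NNN;.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # 0x8E is e-acute in MacRoman.
    print(entify_mac_roman(b"<p>caf\x8e</p>"))
```

ASCII markup such as "<" passes through unchanged, so the output handed to the parser is 7-bit only.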

BTW, treating it as UTF-8 and emitting a warning is also the fallback
behaviour when iconv fails due to an explicitly unsupported charset.
Probably not good, but I'm not sure how best to deal with it.
For 8-bit charsets, wholesale entification would be an option,
but how does one know if an unknown charset is 8-bit?
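
The fallback described above might be sketched as follows in Python. This is an illustration under assumptions, not Page Valet's code: Python's codec registry stands in for iconv, and the function name decode_with_fallback is hypothetical. It does not answer the 8-bit question; it only shows the warn-and-fall-back step.

```python
import warnings


def decode_with_fallback(raw: bytes, charset: str):
    """Decode raw with the declared charset, or warn and fall back to UTF-8.

    Returns a (text, charset_actually_used) pair.
    """
    try:
        # bytes.decode raises LookupError when the charset is unknown.
        return raw.decode(charset, errors="replace"), charset
    except LookupError:
        warnings.warn(
            f"charset {charset!r} not supported; treating it as UTF-8"
        )
        # Undecodable bytes become U+FFFD instead of aborting.
        return raw.decode("utf-8", errors="replace"), "utf-8"


if __name__ == "__main__":
    print(decode_with_fallback(b"caf\xc3\xa9", "x-no-such-charset"))
```

Returning the charset that was actually used lets the caller report the substitution in its own warning output.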

Nick Kew

Site Valet - the mark of Quality on the Web.
Received on Monday, 7 January 2002 16:46:26 UTC
