Re: Macintosh charset blowing up?

At 21:43 02/01/07 +0000, Nick Kew wrote:

>On Sun, 6 Jan 2002, Martin Duerst wrote:
>
> > >I suggest a quick-hack fix for this, that I've added to Page Valet:
> > >
> > >if ( charset matches /^mac(intosh|roman)/i ) {
> > >   message("charset not supported; treating it as UTF-8") ;
> > >   charset = "UTF-8" ;
> > >}
> >
> > It seems that most of the characters are supported;
> > would be a pity to give up completely.
> >
> > Also, treating something as UTF-8 while it's clearly not
> > is a really bad idea.
>
>OK, that would probably be a bad idea for the W3C validator.
>OTOH, printing a warning message "charset not correctly supported"
>would seem like a good idea.

I did that, please see 
http://validator.w3.org:8188/check?uri=http%3A%2F%2Fwww.unics.uni-hannover.de%2Fnhtcapri%2Ftest.htm&charset=%28detect+automatically%29&doctype=%28detect+automatically%29&ss=


>In the case of Page Valet, I needed a more drastic measure, because
>the symptom of the problem was that OpenSP

About OpenSP: Do you know when it will support characters beyond
plane 0?


>if ( charset is macintosh ) {
>   entify the offending bytes ; // accept a performance hit :-(
>}
>
>BTW, treating it as UTF-8 and emitting a warning is also fallback
>behaviour when iconv fails due to an explicitly unsupported charset.
>Probably not good, but I'm not sure how best to deal with it.
>For 8-bit charsets, wholesale entification would be an option,
>but how does one know if an unknown charset is 8-bit?

I have no idea how entifying an unknown charset would help.
Numeric character references are all in terms of Unicode, and
if you don't know the charset, there is no way to get
the right code point numbers.
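
To make that concrete, here is a minimal Python sketch. The helper
name entify and the sample bytes are made up for this example, and
the codec names are the ones Python's standard library happens to use;
none of this is taken from Page Valet or the validator. It decodes the
same bytes under three different charset guesses and writes every
non-ASCII character as a numeric character reference:

```python
def entify(raw: bytes, charset: str) -> str:
    """Decode raw bytes with the given charset, then write every
    non-ASCII character as a decimal numeric character reference."""
    text = raw.decode(charset)
    # xmlcharrefreplace turns each unencodable character into &#NNN;
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Hypothetical input: the single byte 0x8A between ASCII letters.
raw = b"M\x8arz"

for guess in ("mac_roman", "cp1252", "latin_1"):
    print(guess, entify(raw, guess))
```

The byte 0x8A comes out as &#228; under mac_roman, &#352; under cp1252
and &#138; under latin_1. Only one of those can be what the author
meant, so entifying is only safe once the charset is actually known.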


Regards,    Martin.

Received on Tuesday, 8 January 2002 03:51:29 UTC