
Re: Charset policy or?

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Sun, 24 Apr 2005 01:47:50 +0200
To: www-validator@w3.org
Message-ID: <426ADEA6.2DA2@xyzzy.claranet.de>

leif halvard silli wrote:
 
> What you said there is merely a tautological statement.

I'm too lazy to quote the corresponding standards; go and
find them in <http://www.w3.org/TR/charmod/>

> processors are, «of course», not required to know  _all_
> IANA encodings.

Yes, my local charset is pc-multilingual-850+euro, and only
one of my four browsers (Lynx) supports it.

> I am not even sure that 'x-mac-roman' and 'macintosh' 
> (from Unicode 1.0 in 1991) is 100% equal.

If even you are not sure, the validator is lost.  Override
its charset detection and specify a similar known charset.

> «fatal error» and «non-existent character encoding»

Yes, I get the same for pc-multilingual-850+euro and it does
exist, <http://www.iana.org/assignments/charset-reg/IBM00858>

> It would have been more appropriate to warn against using
> the x-mac encodings pointing to their status as «private» 
> encodings.

In your case the validator apparently has an idea, and you
think (not sure) that it's a bad idea.  You could propose a
better text for the warning.  For my case I get this:

The detected character encoding was "pc-multilingual-850+euro".
The error was "".

 [Enforcing a surrogate] 
> This is a bit difficult to do «on the fly», for instance for
> an online document or one document you view locally and which
> works perfectly in your browser.

IMHO not really difficult, just use the advanced interface:
<http://validator.w3.org/file-upload.html>

Nice, now my pc-multilingual-850+euro is "tentatively valid":

| Character Encoding Override in effect!
| The detected character encoding "pc-multilingual-850+euro"
| has been suppressed and "macintosh" used instead. 
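
For illustration only, the same kind of override can be sketched in
Python.  This is not how the validator is implemented; the function
name resolve_charset is hypothetical, and the sketch merely assumes
that a processor first tries the declared name and, if it does not
know it, falls back to a surrogate the user picked:

```python
import codecs


def resolve_charset(declared, surrogate):
    """Return (codec_name, overridden) for a declared charset label.

    Hypothetical helper: try the declared label first; if this Python
    installation has no codec for it, use the user-chosen surrogate.
    """
    try:
        return codecs.lookup(declared).name, False
    except LookupError:
        # The label may well be registered with IANA; this processor
        # simply does not know it, so the override takes effect.
        return codecs.lookup(surrogate).name, True


name, overridden = resolve_charset("pc-multilingual-850+euro", "macintosh")
if overridden:
    print("Character Encoding Override in effect, using", name)
```

As with the validator, the result is only as good as the surrogate:
bytes that mean something else in the surrogate charset are decoded
wrongly without any error.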

> I or anyone else, cannot register them there unless we
> change their names ...

Tough.  But in theory it's possible: IBM, Microsoft, etc.
registered tons of their charsets, Apple could also do it.

No arguments about funny names please, my own OS/2 "thinks"
that the proper name for windows-1252 is 1004, and it also
claims that IBM00858 (= 850+euro) is 850.  It's rather old.
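
A minimal sketch of how such local names could be mapped to names
other software knows, assuming Python and its codec names cp1252 and
cp858.  The table LOCAL_ALIASES and the function to_known_codec are
made up for this example, and the two numeric entries reflect only
what my own OS/2 reports, not any registry:

```python
import codecs

# Hypothetical table: local or unrecognized label -> Python codec name.
LOCAL_ALIASES = {
    "1004": "cp1252",                     # this OS/2's name for windows-1252
    "850": "cp858",                       # here "850" really means 850+euro
    "pc-multilingual-850+euro": "cp858",  # IANA alias of IBM00858
}


def to_known_codec(local_name):
    """Translate a local charset label to a codec name Python knows."""
    canonical = LOCAL_ALIASES.get(local_name.lower(), local_name)
    # Raises LookupError if even the mapped name is unknown.
    return codecs.lookup(canonical).name


# Byte 0xD5 is the one place where 850 and 850+euro differ.
print(b"\xd5".decode(to_known_codec("850")))
```

Without the table Python would take "850" to mean plain cp850 and
decode that byte as a dotless i instead of the Euro sign.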

I can live with these oddities.  Minus the one case where I
asked ICU about my "1004" because I didn't know that it's the
wrong name; that was embarrassing.

> If any 'x-mac-' encoding could be treated as 'macintosh'

If there's more than one somebody should really register them.
Maybe they could remove the "x-" from these names.  If you're
not sure (and that's perfectly okay) ask Apple to do this for
you.  

OTOH maybe they have very good reasons why they don't register
their charsets: a registered charset _never_ changes.  No more
"let's add the Euro" and similar stunts.

> Thank you for your irony.

No irony intended.  Just the normal options (in addition to
whining):  If you don't like it change it or leave it.  If in
your world something like TLD .local exists, you're entitled
to use it.  But don't ask whois.iana.org about it, don't use
it in mail on the Internet, and so on.

> blindly using the IANA registry as guide to the WWW

Sorry, but that's definitely not the case, I use this registry
intentionally.  I don't expect that the validator handles URL
<about:mozilla> and I don't use "blink" in public documents.

                             Bye, Frank
 <URL:about:mozilla>
| And the beast shall come forth surrounded by a roiling cloud
| of vengeance.  The house of the unbelievers shall be razed
| and they shall be scorched to the earth. Their tags shall
| blink until the end of days.
|                             from The Book of Mozilla, 12:10
Received on Saturday, 23 April 2005 23:51:42 GMT
