
Re: Charset policy or?

From: leif halvard silli <hyperlekken@lenk.no>
Date: Sun, 24 Apr 2005 03:36:51 +0200
Message-ID: <426AF833.30301@lenk.no>
To: Frank Ellermann <nobody@xyzzy.claranet.de>
CC: www-validator@w3.org

Frank Ellermann wrote:

>leif halvard silli wrote:
> 
>  
>
>>What you said there is merely a tautological statement.
>>    
>>
>
>I'm too lazy to quote the corresponding standards, go and
>find it in <http://www.w3.org/TR/charmod/>
>  
>

This is what that document says about x- encodings:

[S]  [I]  [C]  If an unregistered character encoding is used, the 
convention of using 'x-' at the beginning of the name MUST be followed.

<http://www.w3.org/TR/charmod/#C023>

Nowhere does it say that validators, browsers or other tools should 
_not_ accept x- encodings.
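
A minimal Python sketch of that reading, assuming hypothetical helper 
names (is_private_label, describe) that belong to no real validator: 
the 'x-' prefix is used only to report a label's status, not as grounds 
for rejecting it. Whether a given label is known depends on the local 
Python installation, so no particular output is claimed.

```python
import codecs


def is_private_label(label: str) -> bool:
    """True if the label follows the 'x-' convention for unregistered encodings."""
    return label.strip().lower().startswith("x-")


def describe(label: str) -> str:
    """Report a label's status without rejecting it outright (hypothetical helper)."""
    if is_private_label(label):
        status = "unregistered (x- convention)"
    else:
        status = "no x- prefix"
    try:
        codecs.lookup(label)  # raises LookupError for unknown labels
        known = "known to this Python"
    except LookupError:
        known = "not known to this Python"
    return f"{label}: {status}, {known}"


for name in ("x-mac-roman", "macintosh", "x-mac-cyrillic"):
    print(describe(name))
```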

>>I am not even sure that 'x-mac-roman' and 'macintosh' 
>>(from Unicode 1.0 in 1991) is 100% equal.
>>    
>>
>
>If even you are not sure, the validator is lost. 
>

Ah, I was waiting for you to say that. Maybe I am the authority on this 
issue?

> Override its charset detection and specify a similar known charset.
>  
>

Overriding may be used; it is a useful tip if nothing better can be 
offered. Perhaps the Validator should inform us that we may do so? Why 
should the users have to show better knowledge about this subject than 
the Validator does?

>>«fatal error» and «non-existent character encoding»
>>    
>>
>
>Yes, I get the same for pc-multilingual-850+euro and it does
>exist, <http://www.iana.org/assignments/charset-reg/IBM00858>
>  
>


The terminology that the Validator uses here mimics the wording that 
(e.g.) the XML specification uses in section 4.3.3:

    «In the absence of information provided by an external transport
    protocol (e.g. HTTP or MIME), it is a fatal error»

I really think they should have found a better and more informative wording.

>> It would have been more appropriate to warn against using
>> the x-mac encodings pointing to their status as «private» 
>> encodings.
>  
>
>
>In your case the validator apparently has an idea, and you
>think (not sure) that it's a bad idea.  You could propose a
>better text for the warning.
>

I already made some such suggestions in a reply to the letter about new 
Help & FAQ pages. Don't know if they listen though ...

>  For my case I get this:
>
>The detected character encoding was "pc-multilingual-850+euro".
>The error was "".
>  
>

That is exactly the same as what one gets when one uses e.g. 
x-mac-cyrillic or x-mac-hebrew or ... If I understand you correctly, we 
agree that they should offer a more enlightening text. Why not offer the 
option to validate with Character Encoding Override with the click of a 
button instead of this unhelpful text?
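
A rough Python sketch of what such a more helpful message could look 
like. Everything here is an assumption for illustration: the function 
name explain_unsupported, the SUGGESTED_OVERRIDES table and the wording 
are hypothetical, and the pairing of x-mac-roman with macintosh is not a 
verified equivalence.

```python
# Hypothetical table of "similar known charset" suggestions. The pairing
# below is an assumption for illustration, not a verified equivalence.
SUGGESTED_OVERRIDES = {
    "x-mac-roman": "macintosh",
}


def explain_unsupported(label: str) -> str:
    """Build an explanatory message for a charset label the tool cannot handle."""
    key = label.strip().lower()
    lines = [f'The detected character encoding "{label}" is not supported by this tool.']
    if key.startswith("x-"):
        lines.append("Names beginning with 'x-' denote unregistered (private) encodings.")
    suggestion = SUGGESTED_OVERRIDES.get(key)
    if suggestion:
        lines.append(f'You may retry with the character encoding override set to "{suggestion}".')
    else:
        lines.append("You may retry by overriding the character encoding with a similar one.")
    return "\n".join(lines)


print(explain_unsupported("x-mac-roman"))
```

The point of the design is only that the message names the encoding, 
says why it is treated specially, and points to the override option 
instead of reporting an empty error string.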

>>This is a bit difficult to do «on the fly», for instance for
>>an online document or one document you view locally and which
>>works perfectly in your browser.
>>    
>>
>
>IMHO not really difficult, just use the advanced interface:
><http://validator.w3.org/file-upload.html>
>
>Nice. now my pc-multilingual-850+euro is "tentatively valid":
>  
>

It is customary to put [ Valid XHTML ] buttons on one's web pages. Is 
there a way to get those [Valid] buttons to automatically use the 
extended interface?

>>I or anyone else, cannot register them there unless we
>>change their names ...
>>    
>>
>
>Tough.  But in theory it's possible, IBM, MicroSoft, etc.
>registered tons of their charsets, Apple could also do it.
>  
>

It is a task for Apple. I think that Apple wanted to be a nice guy and 
not bother the world with «Apple code pages». Or maybe they did not see 
the point. You can read about x-mac-roman, x-mac-cyrillic etc. at 
Unicode.org, btw. You find these specifications side by side with the 
Windows- code pages. These «code pages» do not include the Euro symbol.

At this stage it seems too late to do so. And why should it be needed? 
What would the great benefit be?


>>If any 'x-mac-' encoding could be treated as 'macintosh'
>>    
>>
>
>If there's more than one somebody should really register them.
>Maybe they could remove the "x-" from these names.  If you're
>not sure (and that's perfectly okay) ask Apple to do this for
>you.  
>
>OTOH maybe they have very good reasons why they don't register
>their charsets, a registered charset _never_ changes.  No more
>"let's add the Euro" and similar stunts.
>  
>

When Apple introduced OS X they stopped using some of the encodings 
that were available in Mac OS 9. In OS 9 you had to install so-called 
Language Kits to be able to type in «funny languages». When they ported 
OS 9 to OS X in the form of the so-called Carbon layer of OS X (not the 
same as Classic, which is an Apple-supported OS 9 emulator) they made a 
choice about which Language Kits to continue to support. Cyrillic was 
one of them. Hebrew was not. Nor was Icelandic, btw ;-) The 
consequence inside OS X is, I assume, that even if you port e.g. a 
Hebrew program to Carbon, you may not type Hebrew in it unless the 
developer adds lots of extra code. Well, enough details. But maybe the 
shift to OS X was one reason. Though a stupid reason, I must say.

I think there has been no reason to change the x-mac encodings since OS 
X was introduced, because OS X uses UTF-8 for all practical purposes, 
except in the Carbon layer.

So, honestly the problem here is the Validator.

>>Thank you for your irony.
>>    
>>
>
>No irony intended.  
>

All right. Sorry then.

>Just the normal options (in addition to
>whining):  If you don't like it change it or leave it.  If in
>your world something like TLD .local exists, you're entitled
>to use it.  But don't ask whois.iana.org about it, don't use
>it in mail on the Internet, and so on.
>  
>

The .local thing has nothing to do with this.

>>blindly using the IANA registry as guide to the WWW
>>    
>>
>
>Sorry, but that's definitely not the case, I use this registry
>intentionally.  I don't expect that the validator handles URL
><about:mozilla> and I don't use "blink" in public documents.
>  
>

Go on about about:mozilla if you like ... I am not such a good and 
forgiving person as you are. I prefer to rely on the XML and HTML specs 
rather than on what the Validator claims about «official» character 
sets. There is no such thing. There are only registered and unregistered 
character sets.

And by the way, the W3Validator points to <htmlhelp.com>, which 
conveniently has a page about other validators at 
<http://www.htmlhelp.com/links/validators.htm>. There I found one 
validator from your home country at <http://www.validome.org/>. It 
supports 98 character encodings. Unfortunately I could not find 
IBM00858, but if I were you I would be more optimistic about getting 
them to support that charset than about getting the W3Validator team to 
do so.
-- 
leif halvard silli, oslo
Received on Sunday, 24 April 2005 01:36:59 GMT
