Re: Charset policy or?

Frank Ellermann answered me:

>> for instance the result of a puristic wish to only support
>> the IANA registred charsets
>
>
> If it's not IANA registered it doesn't exist.


What you said there is merely a tautological statement.

> The validator
> tries to catch invalid character encodings depending on the
> document charset, maybe also depending on XML 1.1 vs. 1.0, so
> it can't handle unknown encodings. E.g. byte 133, it depends
> on several factors how to handle it.

I am pretty sure that the x-mac encodings have enough in common that 
it should be fairly easy to find one common method for handling them 
all. Referring to the XML 1.1 specification,

<http://www.w3.org/TR/2004/REC-xml11-20040204/#charencoding>

it says that a processor MAY treat an [_IANA registered_] encoding as 
unknown, and explains this by saying that processors are, «of course», 
not required to know _all_ IANA encodings. The encodings we are talking 
about here, the x-mac encodings, fall under «other encodings», about 
which the recommendation says that they «SHOULD use names starting 
with an "x-" prefix». It goes without saying that a processor 
therefore _may_ deem x-mac encodings _unknown_ (but not 'invalid'). 
But it _may_ also deem the 'macintosh' encoding invalid, for that 
matter. And for all practical purposes on the web, the 'macintosh' 
encoding is less compatible than the 'x-mac' encodings. I am not even 
sure that 'x-mac-roman' and 'macintosh' (from Unicode 1.0 in 1991) are 
100% identical.

If we compare this to what the validator currently says about 
x-mac-roman, we find that it uses the wordings «fatal error» and 
«non-existent character encoding» and finally «The error was 
"x-mac-roman undefined; replace by macintosh".». It would have been more 
appropriate to warn against using the x-mac encodings, pointing to their 
status as «private» encodings. But to refuse to validate altogether is 
not very meaningful. Since the x- encodings are mentioned in the XML 
1.1 spec, this kind of response from the validator is directly 
misleading.
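
A minimal sketch of the distinction argued for here, in Python. It is 
not how the validator is implemented; the names `IANA_SAMPLE`, 
`classify_label` and `report` are hypothetical, the registry sample is 
deliberately tiny, and treating a name that is neither registered nor 
x- prefixed as an error is an assumption made for illustration.

```python
# Hypothetical, tiny sample of IANA-registered names (the real registry
# is far larger); used only to illustrate the three-way distinction.
IANA_SAMPLE = {"utf-8", "iso-8859-1", "windows-1252", "macintosh"}


def classify_label(label: str) -> str:
    """Return 'registered', 'private' or 'unknown' for a charset label."""
    name = label.strip().lower()  # charset names are case-insensitive
    if name in IANA_SAMPLE:
        return "registered"
    if name.startswith("x-"):
        return "private"  # the "x-" prefix marks a private name
    return "unknown"


def report(label: str) -> str:
    """Produce a message; private names get a warning, not a fatal error."""
    kind = classify_label(label)
    if kind == "private":
        return f"warning: '{label}' is a private (x-) encoding name"
    if kind == "unknown":
        # Assumption: only names that are neither registered nor x- fail.
        return f"error: '{label}' is neither registered nor an x- name"
    return f"ok: '{label}' is a registered name"


if __name__ == "__main__":
    for label in ("macintosh", "x-mac-roman", "foo"):
        print(report(label))
```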

> OTOH you could always enforce "assume windows-1252" for all
> MIME-compatible 8-bits charsets where codepoints 128..159
> are valid. You could enforce Latin-1 where that's not true.
> And of course UTF-8 etc. are directly supported.


This is a bit difficult to do «on the fly», for instance for an online 
document, or for a document you view locally and which works perfectly 
in your browser.

>
>> I was very suprised to find out that x-mac-roman was not
>> accepted.
>
>
> Compare <http://www.iana.org/assignments/character-sets> :
> x-mac-roman does not exist, if you think that this is wrong
> register it (but maybe x-... is reserved for private use).


Exactly, x- names are reserved for encodings that do not exist in the 
IANA registry, so neither I nor anyone else can register them there 
unless we change their names ... There is nothing illegal about the 
x-mac encodings per se. And they do exist.

>> the validator adviced me to use 'macintosh' as charset name.
>
>
> Yes, that exists, why not use it ?


Do you need more reasons than those I have given? If any 'x-mac-' 
encoding could be treated as 'macintosh' purely for validation purposes, 
then I think that this is a task that should be given to the validator. 
For the Euro sign and some others, the validator would then perhaps 
have to demand a character reference (the Euro sign doesn't occupy the 
same place in all the x-mac encodings, if I remember correctly). The 
x-mac encodings are all 8-bit encodings, so I guess it would be possible.
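
Roughly what I have in mind, as a Python sketch under stated 
assumptions: Python's `mac_roman` codec stands in for both 'macintosh' 
and every 'x-mac-' label, and only the Euro sign is flagged as needing 
a character reference. The names `UNSTABLE` and `decode_for_validation` 
are hypothetical; this is not the validator's code.

```python
# Assumption: characters whose byte position differs between x-mac
# encodings. Only the Euro sign is listed; others would need checking.
UNSTABLE = {"\u20ac": "&#8364;"}


def decode_for_validation(data: bytes, label: str):
    """Decode bytes for validation, returning (text, warnings)."""
    name = label.strip().lower()
    warnings = []
    if name == "macintosh" or name.startswith("x-mac-"):
        codec = "mac_roman"  # assumption: one common fallback codec
        if name.startswith("x-"):
            warnings.append(
                f"'{label}' is a private name; treated as 'macintosh'"
            )
    else:
        codec = name  # let Python resolve any other label it knows
    text = data.decode(codec)
    if name.startswith("x-mac-"):
        for char, ref in UNSTABLE.items():
            if char in text:
                warnings.append(
                    f"use the character reference {ref} instead of a raw byte"
                )
    return text, warnings


if __name__ == "__main__":
    # Byte 133 (0x85) means something different in each encoding.
    for label in ("x-mac-roman", "windows-1252", "iso-8859-1"):
        text, notes = decode_for_validation(b"\x85", label)
        print(label, repr(text), notes)
```

With these assumptions, byte 133 comes out as «Ö» under x-mac-roman, as 
the ellipsis «…» under windows-1252, and as the control character 
U+0085 under iso-8859-1, which is the dependency Frank pointed at.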

>> We Mac users live in this very perfect world where all
>> encodings are named x-mac-something.
>
>
> validator.w3.org is for the WWW. Maybe you could patch the
> sources for a parallel universe of Mac users and a similar
> validator.mac.org ?

Thank you for your irony. Indeed, if the Validator will not validate all 
valid documents, why not have another validator as well? Besides, you 
are wrong if you interpreted me as saying that x-mac encodings are not 
used on the WWW. They are not used to a large degree, but they are used. 
And at least they are much more used than anything called 'macintosh', 
which you --blindly using the IANA registry as guide to the WWW-- 
recommend.
-- 
leif halvard

Received on Saturday, 23 April 2005 17:34:04 UTC