
Re: Charset policy or?

From: leif halvard silli <hyperlekken@lenk.no>
Date: Mon, 25 Apr 2005 19:25:01 +0200
Message-ID: <426D27ED.8020805@lenk.no>
To: www-validator@w3.org
CC: Frank Ellermann <nobody@xyzzy.claranet.de>

Frank Ellermann wrote:

>leif halvard silli wrote:
> 
>
>><http://www.w3.org/TR/charmod/#C023>
>>
>
>If you found #C023 you MUST have seen #C022:
>
>| Character encodings that are not in the IANA registry SHOULD
>| NOT be used, except by private agreement.
>
Probably I was too tired; after all, you left it to me to find out how
that document confirmed your statements. But anyway, you of course know
what SHOULD NOT means better than I do:

4. SHOULD NOT   This phrase, or the phrase "NOT RECOMMENDED" mean that
   there may exist valid reasons in particular circumstances when the
   particular behavior is acceptable or even useful, but the full
   implications should be understood and the case carefully weighed
   before implementing any behavior described with this label.


To say that a 'private agreement' defines the «particular circumstances»
in which the x- encodings would be "acceptable or even useful" is, at the
very least, meaningless.


>Now you need a private agreement with the validator.w3 team ;-)
>

W3 is free to make that agreement on the fly whenever I try to validate
such a document ...


>Really, what's the purpose of this discussion?
>

To have W3 offer us a tool that lets us, the users, decide when using a
character set that is not registered with IANA is "acceptable or even
useful". For me it started as a practical problem.

The second purpose is one I thought you agreed with me about: the
Validator isn't helpful enough, e.g. it does not suggest that one may use
the 'macintosh' encoding instead of the x-mac encodings.
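The kind of suggestion meant here could be as simple as a lookup from an
unregistered x-mac name to a registered equivalent. A minimal, hypothetical
sketch (the table and the function name are invented for illustration, not
taken from the W3C Validator); it only knows one pair, because 'macintosh'
is the IANA-registered name for Mac OS Roman, and it relies on Python's
`codecs.lookup` accepting 'macintosh' as an alias:

```python
import codecs
from typing import Optional

# Hypothetical table: unregistered x-mac names mapped to an IANA-registered
# equivalent that could be offered as a Character Encoding Override.
X_MAC_TO_REGISTERED = {
    "x-mac-roman": "macintosh",
}


def suggest_override(declared: str) -> Optional[str]:
    """Return a registered charset to suggest as an override, or None."""
    candidate = X_MAC_TO_REGISTERED.get(declared.strip().lower())
    if candidate is None:
        return None
    # Make sure the suggestion is something we can actually decode with;
    # codecs.lookup raises LookupError for unknown names.
    codecs.lookup(candidate)
    return candidate


print(suggest_override("X-Mac-Roman"))     # macintosh
print(suggest_override("x-mac-cyrillic"))  # None
```

Encodings without an obvious registered twin simply get no suggestion.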

>  The validator
>is a tool, it tries to find errors.  In some cases like <br />
>in HTML it's lost, in theory that's legal, in practice it does
>not work as expected.
>

Absolutely, it is a tool. I knew that character sets are a very dry
matter. So I was just irritated that I had to perform so many steps in
order to validate my document. (And I admit that I did not understand
that I could use Charset Overriding with 'macintosh' instead of the x-mac
encodings.)

>You have a choice, you can use the WDG validator where you get
>a warning for this issue.
>
>
>>It is not written anywhere that Validators, browsers or
>>whatever should _not_ accept x- encodings.
>>
>
>But it is written that validating XML processors are expected
>to support a _limited_ set of encodings, the minimum is UTF-8
>plus some others.  I'm lost with SGML (the validator.w3 is
>AFAIK based on SGML), but probably the SGML rules are similar.
>

I think that most users, in particular Mac users, do not expect x-mac
encodings to be understood by all programs. We expect very little
compatibility. But I thought that the W3C Validator should help with the
pure validation and leave the encoding issue to me, possibly with a
warning; they are usually very happy to issue warnings.

>>Ah, I was waiting for you to say that. Maybe I am the
>>authority on this issue?
>>
>
>Maybe.  Send them a patch for their code. [ ...]
>

Unfortunately, this is not something I would be able to do.

>  And tell me exactly
>how you've done it, I can't test perl on my box, but I'd like
>the validator to support "437" and "858", just because I think
>that it's bad to exclude minority platforms by ignorance.
>
>OTOH publishing XHTML pages in really obscure charsets without
>convincing reasons would be also bad.  "Best viewed with Lynx
>on OS/2 and codepage 850" isn't a good idea.  I really have
>texts in these charsets, but they are plain text.  Fortunately
>the validator is too stupid to complain about a link like:
>
><a href="xhtml.kex" type="text/plain" charset="PC-Multilingual-850+euro">
>

The long-in-the-making Mac web browser iCab (also a German product,
btw) automatically smiles if a document is valid. I read that it doesn't
smile at every W3 page ;-)
<http://mjtsai.com/blog/2004/12/24/icab-30-beta-222-screenshots/>

>>If I understand you correctly, we agree that they should 
>>offer a more enlightening text. Why not offer the option to
>>validate with Character Encoding Override with the click of a
>>button instead of this unhelpful text?
>>
>
>It doesn't help you that we agree.  IIRC the missing support
>for some (registered) charsets is also listed in the bugzilla,
>maybe I've even voted for it.
>

You know that the W3C Validator has this very optimistic message saying
that we should contact them if we stumble upon a charset that they do not
support ...

I want to mention Apple again. Clearly they have simply followed a
different policy than IBM and Microsoft. Yet as long as the Validator
does not support them, IBM might just as well not have registered all
those charsets ... I mean, it is very bad that I cannot claim that the
x-mac encodings are at least _formally_ correct. But other than that, it
seems that you, with your rare but registered IBM charset, and I, with
mine, are in the same boat.


>>It is customary to put [ Valid XHTML ] buttons on one's web
>>pages.
>>
>
>Not really.  If you have 50 (?) or less pages you can validate
>them with WDG, and then you don't need these funny buttons on
>every page.  
>

I don't use them myself yet. But these buttons are in fact used as
«medals» or «diplomas». If I have a valid document, but in the wrong
charset ... I don't get the diploma even if I deserve it ;-)

>>Is there a way to get those [Valid] buttons to automatically
>>use the extended interface?
>>
>
>Not with the referer trick, but you can specify all parameters
>you like with an absolute URL, here's a macintosh example:
>
>http://validator.w3.org/check?uri=http%3A%2F%2Fpurl.net%2Fxyzzy&charset=macintosh
>
>The best you can get in this case is a "tentatively valid".
>
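The absolute URL above just carries the page address and the override as
the query parameters `uri` and `charset`. A small sketch for generating
such links, e.g. for a [Valid] button, assuming those two parameters are
all that is needed (the helper name is hypothetical):

```python
from urllib.parse import urlencode

VALIDATOR = "http://validator.w3.org/check"


def validator_url(page_uri: str, charset: str) -> str:
    """Build a validator link that forces a charset override."""
    # urlencode percent-encodes ':' and '/' inside the page URI, so it
    # survives as a single query parameter value.
    query = urlencode({"uri": page_uri, "charset": charset})
    return f"{VALIDATOR}?{query}"


print(validator_url("http://purl.net/xyzzy", "macintosh"))
```

For the example page this prints the same macintosh link as quoted above.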

OK, but that is still a useful tip.

>>You can read about x-mac-roman, x-mac-cyrillic etc at 
>>Unicode.org, btw.
>>
>
>I'm more interested in what my browser supports, and it doesn't
>know x-mac-whatever or pc-multi-thingy.  It also doesn't know
>UTF-8, and no windows-1252.  But it's so stupid that it does
>the right thing for 1252 without knowing why.  I love stupid
>software.
>

:-)

>Can't you create and read windows-1252? It's almost the same
>as Latin-1, except in 128..159, where it offers the Euro at 128, etc.
>
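The difference between windows-1252 and Latin-1 can be checked with
Python's standard codecs. Note that `cp1252`, `latin-1` and `mac_roman` are
Python codec names, not necessarily the IANA names. A minimal sketch:

```python
# Byte 0x80 (decimal 128) means something different in each charset.
b = bytes([0x80])
print(repr(b.decode("cp1252")))     # '€'     windows-1252: Euro sign
print(repr(b.decode("latin-1")))    # '\x80'  ISO-8859-1: a C1 control
print(repr(b.decode("mac_roman")))  # 'Ä'     Mac OS Roman

# Outside 128..159, windows-1252 and Latin-1 decode every byte identically.
same_outside_c1 = all(
    bytes([i]).decode("cp1252") == bytes([i]).decode("latin-1")
    for i in list(range(0x00, 0x80)) + list(range(0xA0, 0x100))
)
print(same_outside_c1)  # True
```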

I can. I can create most things, e.g. using Mozilla software; frankly,
there is no better encoding converter than Mozilla's! But yesterday,
when I wrote about this, I found some (probably new) bugs in the x-mac
support ... Like I said, for me it was mostly a practical problem. I
could not, and still cannot, understand why supporting more charsets
should be so difficult. But I guess they want to use the Validator to
enforce their SHOULD NOT policy. And that irritates me, in fact.

>  For
>backwards compatible pages (in other words pages with an Euro)
>I use windows-1252, otherwise I use us-ascii plus some symbolic
>character references for Latin-1.
>
>Which I shouldn't do if I followed W3C charmod, because Latin-1 is
>"better" than us-ascii + character references.  But my pages
>and XHTML tool are older than W3C charmod, my local charset is
>normally pc-multilingual-850+euro, not windows-1252, and for
>English pages I really need only us-ascii with few exceptions.
>
>Visitors of my German pages will survive it when I "force" them
>to download some &Ouml; instead of simple Latin-1 Ös.
>
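The "us-ascii plus symbolic character references" approach can be
sketched as follows, assuming named references where HTML 4 defines one
and numeric references otherwise. Python's `html.entities.codepoint2name`
supplies the HTML 4 names; the function name is hypothetical:

```python
from html.entities import codepoint2name


def to_ascii_with_refs(text: str) -> str:
    """Keep ASCII as-is; write everything else as a character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            # Symbolic (named) reference from the HTML 4 set, e.g. &Ouml;
            out.append(f"&{codepoint2name[cp]};")
        else:
            # No HTML 4 name exists: fall back to a numeric reference.
            out.append(f"&#{cp};")
    return "".join(out)


print(to_ascii_with_refs("Öl kostet 5 €"))  # &Ouml;l kostet 5 &euro;
```

The result is pure us-ascii, so it survives any ASCII-compatible charset
declaration.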

You probably have a very good reason for sticking to OS/2 ;-)

>>Not Icelandic either, btw ;-)
>>
>
>The DOS and OS/2 Icelandic codepage 861 used to be my favourite
>test case, because I was sure that it's strange and irrelevant
>from my POV.  Maybe the Unicode conspira^H^H^H^Hortium should
>decorate the Euro as their most fearsome ally.  It certainly
>killed a lot of legacy charsets.
>

There you have a good point.

>>OS X uses UTF-8 for all practical purposes, except in the 
>>Carbon layer.
>>
>
>Whatever that is, do you really need it for (X)HTML ?  If the
>only reason to use x-mac-whatever is for fun, then I guess that
>the developers of the validator.w3 have their own priorities of
>what they consider as fun.
>

No, it wasn't for fun. I use a program that has its roots in Mac OS 9,
and which therefore isn't Unicode-savvy yet (there are many such
programs for OS X, even after all these years). It is unreliable with
any charset other than x-mac-roman/macintosh, so I use that encoding
and convert the output to UTF-8 with Tidy before putting it online. But
when something failed, I wanted to find out where the error was.
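The conversion step itself is a plain re-encoding. A minimal sketch of
the same step without Tidy, assuming the input really is Mac OS Roman
(the function names are hypothetical). Unlike Tidy, it does not rewrite
any charset declaration inside the document:

```python
from pathlib import Path


def mac_roman_bytes_to_utf8(data: bytes) -> bytes:
    """Decode Mac OS Roman bytes and re-encode them as UTF-8."""
    return data.decode("mac_roman").encode("utf-8")


def convert_file(src: Path, dst: Path) -> None:
    """Write a UTF-8 copy of a Mac OS Roman file."""
    dst.write_bytes(mac_roman_bytes_to_utf8(src.read_bytes()))


# 0x8E is 'é' in Mac OS Roman; in UTF-8 it becomes the two bytes C3 A9.
print(mac_roman_bytes_to_utf8(b"caf\x8e"))  # b'caf\xc3\xa9'
```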

If the W3C Validator would at least recognize the x-mac encodings and
suggest using Character Encoding Override, I would be happy.
-- 
leif halvard silli, oslo
Received on Monday, 25 April 2005 17:25:17 GMT
