
Re: Charset policy or?

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Mon, 25 Apr 2005 13:33:57 +0200
To: www-validator@w3.org
Message-ID: <426CD5A5.723@xyzzy.claranet.de>

leif halvard silli wrote:
 
> <http://www.w3.org/TR/charmod/#C023>

If you found #C023 you MUST have seen #C022:

| Character encodings that are not in the IANA registry SHOULD
| NOT be used, except by private agreement.

Now you need a private agreement with the validator.w3 team ;-)

Really, what's the purpose of this discussion?  The validator
is a tool, it tries to find errors.  In some cases, like <br />
in HTML, it's lost: in theory that's legal, in practice it does
not work as expected.

You have a choice, you can use the WDG validator where you get
a warning for this issue.

> It is not written anywhere that Validators, browsers or
> whatever should _not_ accept x- encodings.

But it is written that validating XML processors are expected
to support a _limited_ set of encodings, the minimum is UTF-8
plus some others.  I'm lost with SGML (the validator.w3 is
AFAIK based on SGML), but probably the SGML rules are similar.

>> If even you are not sure, the validator is lost.

> Ah, I was waiting for you to say that. May be I am the
> authority on this issue?

Maybe.  Send them a patch for their code.  And tell me exactly
how you've done it, I can't test perl on my box, but I'd like
the validator to support "437" and "858", just because I think
that it's bad to exclude minority platforms by ignorance.

OTOH publishing XHTML pages in really obscure charsets without
convincing reasons would also be bad.  "Best viewed with Lynx
on OS/2 and codepage 850" isn't a good idea.  I really have
texts in these charsets, but they are plain text.  Fortunately
the validator is too stupid to complain about a link like:

<a href="xhtml.kex" type="text/plain" charset="PC-Multilingual-850+euro">

> If I understand you correctly, we agree that they should 
> offer a more enlightenling text. Why not offer the option to
> validate with Character Encoding Override with the click of a
> button instead of this unhelpful text?

It doesn't help you that we agree.  IIRC the missing support
for some (registered) charsets is also listed in the bugzilla,
maybe I've even voted for it.

> It is customary to put [ Valid XHTML ] buttons on one's web
> pages.

Not really.  If you have 50 (?) or fewer pages you can validate
them with WDG, and then you don't need these funny buttons on
every page.  

> Are there a way to get thos [Valid] buttons to automatically
> use the extended interface?

Not with the referer trick, but you can specify all parameters
you like with an absolute URL, here's a macintosh example:

http://validator.w3.org/check?uri=http%3A%2F%2Fpurl.net%2Fxyzzy&charset=macintosh

The best you can get in this case is a "tentatively valid".
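
Such a link can also be built by a script instead of by hand.  A
minimal Python sketch, assuming only the "uri" and "charset"
parameters shown in the example URL above; the helper name
validator_link is hypothetical, not part of the validator:

```python
from urllib.parse import urlencode

# Base URL and parameter names are copied from the example link above.
VALIDATOR = "http://validator.w3.org/check"


def validator_link(page_url, charset=None):
    """Return an absolute validator URL, optionally with a charset override."""
    params = {"uri": page_url}
    if charset:
        params["charset"] = charset
    # urlencode percent-encodes ":" and "/" in the page URL.
    return VALIDATOR + "?" + urlencode(params)


print(validator_link("http://purl.net/xyzzy", "macintosh"))
```

The result can be used as the href of a [Valid] button.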

> You can read about x-mac-roman, x-mac-cyrillic etc at 
> Unicode.org, btw.

I'm more interested in what my browser supports, and it doesn't
know x-mac-whatever or pc-multi-thingy.  It also doesn't know
UTF-8, nor windows-1252.  But it's so stupid that it does
the right thing for 1252 without knowing why.  I love stupid
software.

Can't you create and read windows-1252?  It's almost the same
as Latin-1, except for 128..159, where it offers the Euro at 128
etc.  For backwards compatible pages (in other words pages with a Euro)
I use windows-1252, otherwise I use us-ascii plus some symbolic
character references for Latin-1.
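
The difference is easy to check with a few lines of Python.  This is
only a sketch, assuming that Python's "latin-1" and "cp1252" codecs
stand in for ISO-8859-1 and windows-1252; the helper same_in_both is
made up for the illustration:

```python
# Compare Python's "latin-1" and "cp1252" codecs byte by byte.

def same_in_both(byte_value):
    """True if the byte decodes to the same character in both codecs."""
    raw = bytes([byte_value])
    try:
        return raw.decode("latin-1") == raw.decode("cp1252")
    except UnicodeDecodeError:
        # A few positions in 128..159 are unassigned in Python's cp1252 table.
        return False


differing = [b for b in range(256) if not same_in_both(b)]

print(differing[0], differing[-1])        # first and last differing byte value
print(ascii(b"\x80".decode("cp1252")))    # the Euro sign, U+20AC
print(ascii(b"\x80".decode("latin-1")))   # a C1 control character, U+0080
```

All differing byte values fall into 128..159; everything else decodes
identically.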

Which I shouldn't if I'd follow W3C charmod, because Latin-1 is
"better" than us-ascii + character references.  But my pages
and XHTML tool are older than W3C charmod, my local charset is
normally pc-multilingual-850+euro, not windows-1252, and for
English pages I really need only us-ascii with few exceptions.

Visitors of my German pages will survive it when I "force" them
to download some &Ouml; instead of simple Latin-1 Ös.
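
For illustration, here is roughly what "us-ascii plus symbolic
character references" amounts to.  It is a Python sketch, not the
XHTML tool mentioned above; the function name ascii_with_entities is
hypothetical, and it deliberately leaves "&" and "<" alone:

```python
from html.entities import codepoint2name


def ascii_with_entities(text):
    """Replace non-ASCII characters with named (or numeric) references."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                              # plain us-ascii
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")  # e.g. &Ouml;
        else:
            out.append("&#%d;" % cp)                    # numeric fallback
    return "".join(out)


# "\u00d6" is a capital O with diaeresis.
print(ascii_with_entities("\u00d6sterreich"))
```

If numeric references are good enough, the built-in
text.encode("ascii", "xmlcharrefreplace") does the same job without
the lookup table.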

> Not Icelandic either, btw ;-)

The DOS and OS/2 Icelandic codepage 861 used to be my favourite
test case, because I was sure that it's strange and irrelevant
from my POV.  Maybe the Unicode conspira^H^H^H^Hortium should
decorate the Euro as their most fearsome ally.  It certainly
killed a lot of legacy charsets.

> OS X uses UTF-8 for all practical purposes, except in the 
> Carbon layer.

Whatever that is, do you really need it for (X)HTML ?  If the
only reason to use x-mac-whatever is for fun, then I guess that
the developers of the validator.w3 have their own priorities of
what they consider as fun.
                                Bye, Frank
Received on Monday, 25 April 2005 11:39:14 GMT
