Re: Codepage from Frank Ellermann on 2004-10-03 (www-validator@w3.org from October 2004)

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Mon, 04 Oct 2004 01:54:30 +0200
To: www-validator@w3.org
Message-ID: <41609136.22E@xyzzy.claranet.de>

Jukka K. Korpela wrote:

> there are many real problems that need to be fixed in the
> validator.

It can handle windows-1252, therefore it could also handle 437
or 858.

> authors should not be encouraged to use such rarely used
> and never needed proprietary encodings.

Supporting IANA-registered charsets does not "encourage" anyone
to use them where they aren't needed.  DOS and OS/2 systems
with these charsets simply exist, as do applications and text
documents using them, and authors might wish to include some
"text screen shots" in an HTML document.

I tried your method a few times (convert to Latin-1 and use
references for all remaining characters), but it doesn't really
work with box drawing characters or other forms of "ASCII art"
in PC charsets.
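For reference, a minimal sketch of that conversion in Python.  It
assumes the source bytes are in code page 437 and uses only the
codecs shipped with Python; the function name dos_text_to_html is
made up for illustration.  It only shows the conversion itself;
whether the resulting references line up in a browser font is a
separate matter.

```python
import html

def dos_text_to_html(raw: bytes, source_charset: str = "cp437") -> bytes:
    """Convert DOS text to Latin-1 bytes, with numeric character
    references for everything Latin-1 cannot represent."""
    text = raw.decode(source_charset)
    # Escape markup-significant characters (&, <, >) first.
    escaped = html.escape(text, quote=False)
    # Characters outside Latin-1 become decimal references like &#9556;
    return escaped.encode("latin-1", errors="xmlcharrefreplace")

# Top edge of a double-line box in code page 437.
frame_top = bytes([0xC9, 0xCD, 0xBB])
print(dos_text_to_html(frame_top).decode("ascii"))
```

Characters that do exist in Latin-1 (such as u-umlaut, 0x81 in 437)
come out as single Latin-1 bytes rather than references.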

>> Today it's either windows-1252 or Unicode for scripts
>> roughly covered by Latin-1.
> I wonder why you don't mention the most obvious alternative.

Not sure what you're talking about, but windows-1252 also has
all printable characters of Latin-9 (at different positions),
plus 21 other characters (32-5-6=21); please correct me if I
got this wrong.
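One way to check this count is sketched below.  It assumes that
Python's built-in cp1252 and iso8859_15 codecs match the registered
charsets, and it treats every assigned byte from 0x80 (respectively
0xA0) upwards as "printable", including the no-break space and the
soft hyphen.  The helper name assigned_chars is hypothetical.

```python
def assigned_chars(codec: str, first: int, last: int = 0xFF) -> set:
    """Return the characters a single-byte codec assigns to first..last."""
    chars = set()
    for byte in range(first, last + 1):
        try:
            chars.add(bytes([byte]).decode(codec))
        except UnicodeDecodeError:
            # Unassigned position (windows-1252 has a few in 0x80-0x9F).
            pass
    return chars

cp1252 = assigned_chars("cp1252", 0x80)
latin9 = assigned_chars("iso8859_15", 0xA0)

missing = latin9 - cp1252   # Latin-9 characters absent from windows-1252
extra = cp1252 - latin9     # windows-1252 characters absent from Latin-9
print(len(missing), len(extra))
print("".join(sorted(extra)))
```

The first number printed says whether windows-1252 really covers all
of Latin-9; the second is the count of additional characters.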

 [box drawing characters]
> I think very few people actually use them, and hardly anyone
> _needs_ them.

Some people have applications that use these characters in their
output.  If you meant that nobody creates _new_ texts with these
characters, you have a point as far as I'm concerned.  But there
have been questions about 437 and 858 here more than once, so
some users apparently still "need" or want this, for whatever
reasons.

BTW, a simple trick could be to override the charset with
windows-1252 and then ignore the warning in the validation result.

> Depending on what you imagine as the potential use of box
> drawing characters, they would better be replaced by the use
> of CSS (especially border properties)

Sure, for _new_ texts.  But if you want to insert some curses
output of a chess game in your blog "as is", that's not an option.

> or images with suitable alt texts

That's a possible workaround, but images of text screens often
don't work as expected.  In that case you could also create a
PDF document (or an INF document in the case of OS/2, that's a
proprietary format in the spirit of troff).

> Is this what you meant to present? Why?

The source is pc-multilingual-850+euro, and what you saw was
the result of applying xhtml.kex to itself.  That's only relevant
for systems where 858 is the native charset, so forget it.
Actually I should fix it to use windows-1252 and 0x80 instead of
&euro;.  OTOH I rarely need this character, and if I do it's easy
to patch the output.

> On IE and Firefox, I see the string
> %ent-isopub;  %ent-isonum;  %ent-isolat2;  %ent-isobox;
> %ent-isotech; %ent-isoamsa;  %ent-isoamso; ]>
> at the very beginning.

Probably telling you that they don't know what that's about.

> undefined entities like &blank; are shown literally

It's defined by %ent-isopub;.  If your browsers didn't process
that parameter entity, they don't know about &blank; either:
working as designed.

I wanted to use some symbolic names defined for MathML as far
as they could be used instead of box drawing characters and PC
graphics.  Of course no browser supports this, or at least not
yet.  Maybe never, the new policy is apparently that all these
symbolic names are a bad idea, and hex. references are the way
to go.
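A rough sketch of what "use hex. references instead" amounts to, in
Python.  It assumes that the HTML5 named character table in the
standard library (html.entities.html5) is an acceptable stand-in for
the ISO and MathML entity sets, which it is not guaranteed to match
name for name.  The function name named_to_hex is invented for this
example.

```python
import re
from html.entities import html5

# Matches simple named references such as &blank; or &euro;
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def named_to_hex(markup: str) -> str:
    """Replace known named references with hexadecimal references."""
    def replace(match):
        value = html5.get(match.group(1) + ";")
        if value is None:
            # Unknown name: leave the reference untouched.
            return match.group(0)
        # Some names expand to more than one code point.
        return "".join(f"&#x{ord(ch):X};" for ch in value)
    return _NAMED_REF.sub(replace, markup)

print(named_to_hex("a&blank;b costs 5&euro; &nosuchname;"))
```

Names the table doesn't know are passed through unchanged, so the
output can still be inspected for leftovers.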

> I would suggest upgrading from XHTML 1.0 to HTML 4.01.

I like XHTML transitional.  For HTML 4 I'd have to learn SGML,
only to find that it's not really supported by "any browser";
too confusing for me.  For you HTML is fine, because you know
all practically relevant SGML oddities.

> The XHTML 1.0 specification requires the use of one of
> specific DOCTYPE declarations, literally.

You can't add your own definitions?  I didn't know this, the
validator never told me. ;-)  But I'm used to updating this page
whenever the validator changes (3 days after 9-11 there was a
major change; within minutes my page turned from "valid" to
"broken").

> using XML with XHTML tags doesn't really work on the Web du
> jour.

Yes, I know.  It works with xml2rfc <http://xml.resource.org/>,
but that's XML, not one of the XHTML 1.0 variants.  The real
problem is that I can't use any XHTML+MathML, because that's
always strict, and writing my own DTD makes no sense (read:
I wouldn't know how to add the essential transitional sugar ;-)

                       Bye, Frank
Received on Monday, 4 October 2004 00:06:13 UTC