- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Mon, 4 Oct 2004 09:50:33 +0300 (EEST)
- To: Frank Ellermann <nobody@xyzzy.claranet.de>
- Cc: www-validator@w3.org
On Mon, 4 Oct 2004, Frank Ellermann wrote:

> It can handle windows-1252, therefore it could also handle 437
> or 858.

Handling a very common proprietary encoding doesn't mean you need to handle all.

> Supporting IANA registered charsets is not "encouraging" to
> use this stuff where it isn't needed.

It is. And "IANA registered" is fairly irrelevant. Windows-1252 was widely used on the Web before it was registered at IANA, and so was text/css, and text/javascript isn't registered even now. The IANA registrations are a formality; a useful one if you ask me, but not taken so seriously by most players in the field. And if someone or his brother registered a few hundred encodings just because it's possible, should the validator start supporting those encodings too?

> DOS and OS/2 systems
> with these charsets simply exist, plus applications using
> these charsets, plus text documents using these charsets,

For use on the Web or even in intranets, those files just need to be converted.

> and authors might wish to add some "text screen shots" in a
> HTML document.

Then we should not encourage them to use e.g. box drawing characters in them. Images work better in such cases. For an image, you can at least specify an alt attribute (and a validator will report if you forget to include any alt attribute); a "text screen shot", especially when containing characters like box drawing, is just mumbo-jumbo to a screen reader, for example.

> >> Today it's either windows-1252 or Unicode for scripts
> >> roughly covered by Latin-1.
> > I wonder why you don't mention the most obvious alternative.
>
> Not sure what you're talking about,

ISO-8859-1.

> [box drawing characters]
> > I think very few people actually use them, and hardly anyone
> > _needs_ them.
>
> They have applications using these characters in their output.

Such antique programs might be interesting for nostalgic reasons, but we are discussing documents using markup like HTML or XML.
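The conversion mentioned above ("those files just need to be converted") can be sketched roughly as follows. This is only an illustration, assuming Python 3 and its built-in "cp437" and "cp858" codecs; the function name reencode is hypothetical and not something from this thread.

```python
def reencode(data: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Decode bytes from a legacy code page and re-encode them (hypothetical helper)."""
    # Decoding raises UnicodeDecodeError if the bytes are invalid in the source encoding.
    text = data.decode(source_encoding)
    # Encoding raises UnicodeEncodeError if the target cannot represent a character.
    return text.encode(target_encoding)


if __name__ == "__main__":
    # 0xC9 0xCD 0xBB are box drawing characters in code page 437.
    converted = reencode(b"\xc9\xcd\xbb", "cp437")
    print(converted.decode("utf-8"))
```

With utf-8 as the target every character survives. With iso-8859-1 as the target, the encode step raises UnicodeEncodeError for characters that have no equivalent there, such as the box drawing characters.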
> If you wanted to say that nobody creates _new_ texts with these
> characters you have a point (as far as I'm concerned, but there
> were questions about 437 and 858 more than once here, so some
> users apparently still "need"/want this for whatever reasons).

Whatever the reasons are, the right answer is to convert to an internationally standardized encoding, such as iso-8859-1 or utf-8.

> > Depending on what you imagine as the potential use of box
> > drawing characters, they would better be replaced by the use
> > of CSS (especially border properties)
>
> Sure, for _new_ texts. But if you want to insert some curses
> output of a chess game in your blog "as is" that's no option.

A blog is generated and maintained by software. Get or write software that can handle the data you want to play with.

> > or images with suitable alt texts
>
> That's a possible workaround,

No, solution. The "text screen shot" you propose is a proposed workaround, which does not work around limitations but creates them.

> > Is this what you meant to present? Why?
>
> The source is pc-multilingual-850+euro, and what you saw was
> the result of applying xhtml.kex on itself. Only relevant for
> systems where 858 is the native charset, forget it.

OK. I just suspected there might have been some point.

> I wanted to use some symbolic names defined for MathML as far
> as they could be used instead of box drawing characters and PC
> graphics. Of course no browser supports this, or at least not
> yet.

Yes, I know that. I suspected you didn't, since you referred to such usage on the Web as an argument in our discussion.

> For HTML 4 I'd have to learn SGML,

Not really. Even the creators of HTML 4 didn't know SGML very well, and very few people using HTML 4 know SGML.

> For you HTML is fine, because you know
> all practically relevant SGML oddities.

It's not that hard to learn them, and the real oddities are in the (lack of) browser support for SGML - that is, actual browser behavior, and this is something we need to know anyway, as authors.

But do you know the practically relevant XML oddities? For example, the principle that parsers (and hence browsers) need not read external subsets and need not even tell they don't? That is, they may happily ignore your attempts to include entity references from an external file.

> > The XHTML 1.0 specification requires the use of one of
> > specific DOCTYPE declarations, literally.
>
> You can't add your own definitions ?

The spec says: "A Strictly Conforming XHTML Document is an XML document that requires only the facilities described as mandatory in this specification." The terminology is quite confused and confusing, as so often in W3C documents when normative conformance is described. There is no other conformance defined in the specification but strict conformance. And this has absolutely nothing to do with the issue of XHTML Strict vs. XHTML Transitional.

But my description was oversimplified. Unlike HTML 4.01, the XHTML 1.0 specification does not say that you must use one of the specific DOCTYPE declarations listed. Instead it says: "The public identifier included in the DOCTYPE declaration must reference one of the three DTDs found in DTDs using the respective Formal Public Identifier. The system identifier may be changed to reflect local system conventions."

So technically you can add something there. But since it is not mandatory for a parser to process an external subset, the document would not be a "Strictly Conforming XHTML document", i.e. not a conforming XHTML document, i.e. not an XHTML document (though it may well be a valid XML document and might actually be reported as "Valid XHTML 1.0!" by the validator, which is yet another indication of the inadequacy of such wordings in the reports).

> But I'm used to update this page
> whenever the validator changes

You are joking, right? Or don't you know that a validator only performs some trivial syntax checking, without checking _even_ the syntax except in some respects? (And as you use XHTML, the scope of these checks is more limited than when using HTML, simply because the metalanguage used for XHTML is much much more limited - this is the reason for replacing SGML by XML, remember?)

--
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Monday, 4 October 2004 06:51:06 UTC