- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Thu, 17 Nov 2011 02:29:32 +0100
- To: Ville Skyttä <ville.skytta@iki.fi>
- Cc: www-validator@w3.org
Ville Skyttä, Thu, 17 Nov 2011 00:31:43 +0200:
> On 11/16/2011 11:23 PM, Leif Halvard Silli wrote:
>> Whereas the correct thing is that it is the content model of the body
>> element that differs depending on STRICT and TRANSITIONAL doctype.
>
> True, and additionally it speaks too rigidly of HTML 4.01 also when the
> document validated was something else, for example HTML 3.2 or ISO-HTML.
> I've changed the explanation in validator's development version to this:

If the text is up for evaluation, then I'd like to rework it even more. But first, let me explain what the problems with your new text are - they are very similar to the problems with the old text:

> The sequence <FOO /> can be interpreted in at least two different
> ways, depending on the DOCTYPE of the document.

If it is 'body' whose content model differs depending on the DOCTYPE, then why keep repeating that <foo /> can be interpreted in *two* ways? What are those two ways? The SGML way? The HTML5 way? The XML way? Since it is an HTML4 document, it can, per the HTML4 syntax and the SGML rules, only be interpreted in a *single* way, I suppose, no?

> For example for
> HTML 4.01 and earlier, the '/' terminates the tag <FOO (with an
> implied '>').

Yes. But what is the other interpretation - where does it come from? Are you referring to how browsers interpret it? Or XML? If so, then please explain *who* interprets it, and how.

> However, since many browsers don't interpret it this
> way, even in the presence of a "strict" DOCTYPE,

Did we not agree that 'strict' is unrelated to how <FOO/> itself is interpreted? It is e.g. the body that differs.

Also: it is not a *problem* that browsers don't behave this way, is it? It would only have been a problem if we *wanted* them to behave that way. The real *problem* is that the SGML rules describe and allow things that are not 'real', so to speak. W.r.t. HTML4's 'esoteric' features, the most important thing, it seems to me, is to warn authors that the markup actually means something other than they think.

> it is best to
> avoid it completely in pure HTML documents and reserve its use
> solely for those written in XHTML.

Ah, here you touch on the subject of XHTML. However, for XHTML served as HTML, we have the same problem: <div/> is permitted in XHTML, but means something else 'on the Web'/'in HTML5'/'in Web browsers' etc. If you had changed 'written' to 'served as', then it would be more accurate, of course.

Here is an attempt at another text variant:

''
A sequence such as <HR /> may or may not be valid in an HTML4 document, but not for the reasons that most people would probably expect.

The SGML model: In HTML4 and below, a sequence of the form <FOO /> formally always represents two items: a start tag plus the character '>'. This is because, per HTML4's SGML rules, a start tag ends if a '/' is found inside the tag. The stray '>' character that these rules leave behind may in turn make the document invalid, e.g. if it occurs in a context where characters are not allowed. And as HTML4 Strict is more restrictive than HTML4 Transitional about where characters may occur, the validator may report more errors with the STRICT doctype than with the TRANSITIONAL doctype. The fundamental issue is, however, the same for both doctypes.

A better approximation of what standard Web browsers do with <FOO /> is found in XHTML 1 Appendix C as well as in the syntax rules for HTML5: both the SGML effect and the XML effect are ignored when the document is consumed as HTML. It is only when it is consumed as XHTML that standard Web browsers adhere to the XML interpretation, in which <FOO/> equals <FOO></FOO>.
''

And while I am at it: it would be appropriate to add a warning in the XHTML1 validator too. Currently, if you do <div/>, then it will validate.
However, as we know that most Web authors serve XHTML 1.0 as text/html, it does not really make sense that this is valid *without a warning*, does it? Just compare with what we are discussing here: <img/> does not create any 'real' problems, since everyone knows what they mean when they write it - it is just that it formally means something other than most people think. (In practical terms, I think that most authors do not think it means anything at all.) By contrast, in XHTML, authors may be more likely to know what <div/> formally means. But as we have seen in several W3C specifications, in practice they don't seem to know - or they forget - that <h1><a id='headingID' />Lorem</h1> is invalid, with funny effects as a result.
--
Leif Halvard Silli
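A minimal Python sketch, not part of the original thread, of the readings of <FOO /> that the message contrasts. The helper `readings` is hypothetical: it only encodes, as token lists, the three interpretations described above (the HTML4/SGML reading as the message describes it, the text/html reading of XHTML 1 Appendix C and HTML5, and the XML reading). It is a simplified model, not a parser. The second half uses the standard library's `xml.etree.ElementTree` to check the XML reading of the <h1><a id='headingID' />Lorem</h1> example.

```python
import xml.etree.ElementTree as ET


def readings(name):
    """Hypothetical helper: the token sequence that '<NAME />' stands for
    under each interpretation discussed in the message (a simplified model,
    not a real parser)."""
    return {
        # HTML4 per its SGML rules, as the message describes it: the '/'
        # ends the start tag, and the '>' is left over as character data.
        "sgml": [("start", name), ("char", ">")],
        # text/html in browsers (XHTML 1 Appendix C, HTML5): the slash is
        # ignored for ordinary HTML elements, leaving only a start tag.
        "html": [("start", name)],
        # XML, i.e. XHTML consumed as XHTML: an empty element, <NAME></NAME>.
        "xml": [("start", name), ("end", name)],
    }


for mode, tokens in readings("HR").items():
    print(mode, tokens)

# A real XML parser confirms the XML reading of the message's example:
# <a id='headingID' /> is empty, so 'Lorem' is not part of the link text.
h1 = ET.fromstring("<h1><a id='headingID' />Lorem</h1>")
a = h1.find("a")
print(repr(a.text))  # None: the <a> element has no content
print(repr(a.tail))  # 'Lorem': text after the empty <a>, still inside <h1>
```

Under the XML reading, 'Lorem' is the tail of the empty `a` element, i.e. outside the link. An HTML5 parser instead ignores the slash on a non-void element such as `a`, so the element stays open and 'Lorem' becomes link text, which is why the same markup behaves differently when served as text/html.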
Received on Thursday, 17 November 2011 01:30:11 UTC