Re: [VE][247] Add Subject Here from Leif Halvard Silli on 2011-11-17 (www-validator@w3.org from November 2011)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Thu, 17 Nov 2011 02:29:32 +0100
To: Ville Skyttä <ville.skytta@iki.fi>
Cc: www-validator@w3.org
Message-ID: <20111117022932917518.ef0c31ab@xn--mlform-iua.no>
Ville Skyttä, Thu, 17 Nov 2011 00:31:43 +0200:
> On 11/16/2011 11:23 PM, Leif Halvard Silli wrote:

>> Whereas the correct thing is that it is the content model of the body
>> element that differs depending on STRICT and TRANSITIONAL doctype.
> 
> True, and additionally it speaks too rigidly of HTML 4.01 also when the
> document validated was something else, for example HTML 3.2 or ISO-HTML.
>  I've changed the explanation in validator's development version to this:

If the text is up for evaluation, then I'd like to rework it even more. 
But first, let me explain what the problems with your new text are - the 
problems are very similar to the problems of the old text:

>     The sequence <FOO /> can be interpreted in at least two different
>     ways, depending on the DOCTYPE of the document.

If it is 'body' that differs in its content model, depending on the 
DOCTYPE, then why continue to repeat that <foo /> can be interpreted in 
*two* ways? What are those two ways? The SGML way? The HTML5 way? The 
XML way? Since it is an HTML4 document, then, per the HTML4 syntax and 
the SGML rules, it can only be interpreted in a *single* way, can it 
not?

>     For example for
>     HTML 4.01 and earlier, the '/' terminates the tag <FOO (with an
>     implied '>').

Yes. But what is the other interpretation, and where does it come from? 
Are you referring to how browsers interpret it? Or XML? If so, then 
please explain *who* interprets it that way. 

>     However, since many browsers don't interpret it this
>     way, even in the presence of a "strict" DOCTYPE,

Did we not agree that 'strict' is unrelated to how <FOO/> itself is 
interpreted? It is e.g. the content model of body that differs. Also: 
is it really a *problem* that browsers don't behave this way? It would 
have been a problem only if we *wanted* them to behave that way.

The real *problem* is that the SGML rules describe and allow things 
that are not 'real', so to speak. With regard to HTML4's 'esoteric' 
features, the most important thing seems to me to be to warn authors 
that the markup actually means something other than what they think.

>     it is best to
>     avoid it completely in pure HTML documents and reserve its use
>     solely for those written in XHTML.

Ah, here you touch the subject of XHTML. However, for XHTML served as 
HTML, we have the same problem: <div/> is permitted in XHTML, but means 
something else 'on the Web'/'in HTML5'/'in Web browsers' etc. If you 
had changed 'written' to 'served as', then it would be closer to the 
truth, of course.

      Here is an attempt at another text variant:

''
  A sequence such as <HR /> may or may not be valid in an HTML4
  document, but not for the reasons that most would probably expect.

  The SGML model: In HTML4 and below, a sequence of the form <FOO />
  formally always represents two items: a start tag plus the character
  '>'. This is because, per HTML4's SGML rules, a start tag ends when a
  '/' is found inside the tag. The stray '>' character that results
  may in turn make the document invalid, e.g. if it occurs in a
  context where characters are not allowed. And as HTML4 Strict is
  more restrictive than HTML4 Transitional about where characters may
  occur, the validator may report more errors with the STRICT doctype
  than with the TRANSITIONAL doctype. The fundamental issue is,
  however, the same for both doctypes.

  A better approximation of what standard Web browsers do with
  <FOO /> is found in XHTML 1 Appendix C as well as in the syntax
  rules for HTML5: both the SGML effect and the XML effect are ignored
  when the document is consumed as HTML. It is only when it is consumed
  as XHTML that standard Web browsers adhere to the XML interpretation,
  in which <FOO/> equals <FOO></FOO>.
''
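
To make the three readings concrete, here is a toy sketch in Python. It 
is not a parser and not how the validator works; the function name 
interpret() and the token tuples are made up for illustration only. It 
simply writes down what the sequence <NAME /> stands for under each set 
of rules described in the text above.

```python
# Toy model only (hypothetical names): what "<NAME />" stands for under
# the three rule sets discussed in this thread.

def interpret(name, rules):
    """Return a list of (kind, value) tokens for the sequence <name />."""
    name = name.lower()
    if rules == "sgml":
        # HTML4 per its SGML rules: the '/' ends the start tag,
        # so the following '>' is left over as character data.
        return [("start", name), ("text", ">")]
    if rules == "xml":
        # XML (XHTML consumed as XHTML): <name/> equals <name></name>.
        return [("start", name), ("end", name)]
    if rules == "html5":
        # HTML5 syntax: the '/' has no effect; only a start tag results.
        return [("start", name)]
    raise ValueError("unknown rule set: " + rules)


for rules in ("sgml", "xml", "html5"):
    print(rules, interpret("DIV", rules))
```

Under the "html5" reading, an element such as hr never needs an end tag 
anyway, which is why <hr /> is harmless there while <div /> leaves the 
div open.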

And while I am at it: It would be appropriate to add a warning in the 
XHTML1 validator too. Currently, if you write <div/>, it will 
validate. However, as we know that most Web authors serve XHTML 1.0 as 
text/html, does it really make sense that this is valid *without a 
warning*?
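
As a rough idea of what such a warning could key on, here is a small 
Python sketch. It is only an assumption about one possible check, not 
the validator's code: the function name risky_self_closing() is 
hypothetical, and the regular expression deliberately ignores comments, 
CDATA sections and '>' inside attribute values. It reports elements 
written as <name/> that the HTML 4.01 / XHTML 1.0 DTDs do not declare 
EMPTY.

```python
import re

# Elements declared EMPTY in the HTML 4.01 / XHTML 1.0 DTDs.
EMPTY_ELEMENTS = {"area", "base", "basefont", "br", "col", "frame", "hr",
                  "img", "input", "isindex", "link", "meta", "param"}

# Rough pattern for an empty-element tag such as <div/> or <a id="x" />.
SELF_CLOSING = re.compile(r"<([A-Za-z][A-Za-z0-9]*)(?:\s[^<>]*)?/>")

def risky_self_closing(source):
    """Return names of non-EMPTY elements written as <name/>."""
    return [m.group(1) for m in SELF_CLOSING.finditer(source)
            if m.group(1).lower() not in EMPTY_ELEMENTS]


sample = "<div/><br /><h1><a id='headingID' />Lorem</h1>"
print(risky_self_closing(sample))
```

A real check would of course work on the parsed document rather than on 
the raw text.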

Just compare with what we discuss here: <img/> does not create any 
'real' problems, since everyone knows what they mean when they write 
it; it is just that it formally means something other than most people 
think. (In practical terms, I think that most authors do not think it 
means anything at all.) 

By contrast, in XHTML, authors may be more likely to know what 
<div/> formally means. But as we have seen in several W3C 
specifications, in practice they don't seem to know, or they forget, 
that <h1><a id='headingID' />Lorem</h1> is invalid, with funny effects 
as a result.
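
For what it is worth, the XML side of that example is easy to check 
with Python's standard xml.etree.ElementTree module (the variable names 
are mine). In the XML reading, the a element is empty and 'Lorem' is a 
sibling text node that follows it. A text/html parser, by contrast, 
would treat the a element as still open, so 'Lorem' ends up inside it.

```python
import xml.etree.ElementTree as ET

# Parse the fragment with an XML parser, i.e. the XHTML-as-XML reading.
h1 = ET.fromstring("<h1><a id='headingID' />Lorem</h1>")
a = h1.find("a")

# The anchor has no children and no text of its own;
# 'Lorem' is stored as the text that follows it (its "tail").
print(len(a), repr(a.text), repr(a.tail))
```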
-- 
Leif Halvard Silli
Received on Thursday, 17 November 2011 01:30:11 UTC