Re: [VE][65] elements inducing context-sensitive constraints

2013-09-10 2:15, Roland wrote:

> Error [65]: "document type does not allow element X here; missing one
> of Y start-tag"
>
> This error message appears in situations like <body> <a name="top">
> Using <a> in this manner is out of date, but it is the simplest
> example of the issue.

The situation occurs whenever a direct child of body is an inline
element. The case <a name="top"> is no simpler than other inline
element start tags, like <b> or <span>.

The error message appears when validating against HTML 4.01 Strict or
XHTML 1.0 Strict (or HTML 2.0 Strict, for that matter!) or any other DTD
that does not allow inline content directly inside body, without an
intervening block-level container. It does not appear when validating
against, e.g., HTML 4.01 Transitional or HTML5.

> No place, on the W3C Web site nor outside literature, do I recall
> seeing a statement that the <body> element forces a context-sensitive
> requirement that all children be block-level elements.

The rule that the validator is applying is

<!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL)>

which is mentioned right at the start of 7.5.1 The BODY element of the 
HTML 4.01 specification:
http://www.w3.org/TR/REC-html40/struct/global.html#edef-BODY

Such formal syntax rules are what the validator applies when
performing markup validation in the SGML or XML sense. This is one
reason why validation is so overrated. You really need to understand
what validation is in order to benefit from it rather than get confused,
but it is often recommended as if it were simple and easy.
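
To make the rule concrete, here is a rough sketch in Python of the check
that the content model implies for direct children of body. It is only
an illustration, not how the validator works. The names ALLOWED_IN_BODY,
EMPTY, BodyChildCheck and body_problems are made up for this example.
The set of element names is %block; as expanded in the HTML 4.01 Strict
DTD, plus SCRIPT, INS and DEL. The sketch assumes that all end tags are
written out explicitly, and it also reports text placed directly in
body, since the content model has no #PCDATA.

```python
from html.parser import HTMLParser

# %block; from the HTML 4.01 Strict DTD, plus SCRIPT and the INS/DEL inclusions.
ALLOWED_IN_BODY = {
    "p", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "pre", "dl", "div",
    "noscript", "blockquote", "form", "hr", "table", "fieldset", "address",
    "script", "ins", "del",
}
# Elements declared EMPTY: they have no end tag, so they are never pushed.
EMPTY = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta",
         "param"}


class BodyChildCheck(HTMLParser):
    """Collect direct children of body that the Strict content model rejects."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements
        self.problems = []   # (name, line number) pairs

    def handle_starttag(self, tag, attrs):
        if self.stack and self.stack[-1] == "body" and tag not in ALLOWED_IN_BODY:
            self.problems.append((tag, self.getpos()[0]))
        if tag not in EMPTY:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open element, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Non-whitespace text directly inside body is not allowed either.
        if self.stack and self.stack[-1] == "body" and data.strip():
            self.problems.append(("#PCDATA", self.getpos()[0]))


def body_problems(html):
    checker = BodyChildCheck()
    checker.feed(html)
    checker.close()
    return checker.problems


print(body_problems('<body><a name="top"></a><p>Text</p></body>'))
# [('a', 1)]
print(body_problems('<body><div><a name="top"></a></div></body>'))
# []
```

The second call shows the usual fix: the same inline element is accepted
once it sits inside a block container such as div.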

It seems that the HTML 4.01 specification does not describe this rule in
prose. It is not even mentioned in the part that discusses the
differences between HTML 4.01 Strict and HTML 4.01 Transitional. I was
astonished at this; apparently I learned the rule from somewhere else,
like a textbook, years ago. But this is a problem with the
specification, not with the validator.

> This is
> especially insidious when the error occurs somewhere deep within the
> code (I tested it as the last child) and it is easy to be oblivious
> to the fact that you're now back up to the child level.

The syntax rule is useful if and only if you wish to stick to a coding
style where all content in body is wrapped in block containers. In HTML
4.0, this rule is bundled together with a rule that forbids most of the
so-called presentational markup. There is really no logical connection
between the two rules, except that you might call both of them
"Puristic" (or "Strict"). But if you wish to have only one of them
applied, you need a custom DTD.

> The explanatory note includes the statement "This might mean that you
> need a containing element, ...." This is, of course, true, but it
> fails to note the special character of the <body> element.

In terms of SGML validation, which is what this is about, no element has 
any special character. Different elements have different content models.

> The blockquote element has the same undocumented
> constraint.

It is not undocumented. It is just documented only formally, in the DTD.

> The paragraph element has the opposite constraint--it may not include
> a block-level element.

It's a completely different constraint.

> This is a nuisance when I want to include a
> <pre> within a paragraph.

The HTML concept of paragraph corresponds to a paragraph of text, which 
may contain images and other embedded inline objects, but otherwise it's 
just flow of text, possibly with text-level (phrase-level) markup.

The formal rule, unlike the rule forbidding direct inline content in
body, reflects browser reality: a browser implicitly closes an open p
element when it encounters <pre>. So it's more than just a rule "thou
shall not use pre within p": you *cannot* use pre within p. Any attempt
at doing so will fail, instead of just being formally wrong.
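
The implicit closing can be imitated with a few lines of Python. This is
a toy model, not a browser's parser: the names CLOSES_P, Outline and
outline are invented for the example, CLOSES_P lists only some of the
start tags that end an open p element, and empty elements such as br are
not treated specially.

```python
from html.parser import HTMLParser

# Some of the start tags that implicitly end an open p element.
CLOSES_P = {"p", "pre", "div", "ul", "ol", "dl", "table", "blockquote",
            "form", "address", "h1", "h2", "h3", "h4", "h5", "h6"}


class Outline(HTMLParser):
    """Record each element name, indented by its nesting depth."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements
        self.lines = []   # one indented line per start tag

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES_P and "p" in self.stack:
            # Close the open p, and anything still open inside it.
            while self.stack.pop() != "p":
                pass
        self.lines.append("  " * len(self.stack) + tag)
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # An end tag with no matching open element is simply ignored here.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def outline(html):
    parser = Outline()
    parser.feed(html)
    parser.close()
    return parser.lines


print("\n".join(outline("<body><p>Text<pre>code</pre></p></body>")))
# body
#   p
#   pre
```

In the outline, pre ends up as a sibling of p rather than as its child,
even though the markup was written with pre inside p. By then the p
element is already closed, so the trailing </p> has nothing to match.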

> These context sensitive constraints (and any others) deserve special
> mention somewhere--within the error message would be most useful;
> after all, the parser knows the current token--without it the parser
> couldn't generate the list of possible predecessors.

I'm afraid the validator uses an old SGML parser that has no provisions 
for indicating "the current token", due to the way the parser has been 
coded. And I'm afraid nobody will work on the SGML validator; all work 
is directed towards HTML5 validation, which is a completely different 
animal (and does not have this issue, because HTML5 rules allow direct 
inline content in body).

Yucca

Received on Tuesday, 10 September 2013 06:51:18 UTC