Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

Paul Grosso, Wed, 05 Feb 2014 11:19:49 -0600:
> [Some content of the original comment has been elided
> and/or rearranged below.]
> 
> On 2014-01-19 14:29, Leif Halvard Silli wrote:

  [ I deleted some text, for brevity ]

>> Question: But which constraints does a document type declaration
>> without an internal or external DTD express?

>> Therefore, my proposal is to extract rules or guidance for what
>> to do when the DOCTYPE declaration points to no markup declaration
>> and place this into the 6th edition of XML. (Or to put it differently:
>> define what to do when the DOCTYPE lacks an internal or external DTD.)

> At [1] we have:
> 
>  Definition: An XML document is valid if it has an associated
>  document type declaration and if the document complies with
>  the constraints expressed in it.
> 
> At [2] we have:
> 
>  validity constraint
> 
>  [Definition: A rule which applies to all valid XML documents.
>  Violations of validity constraints are errors; they MUST, at
>  user option, be reported by validating XML processors.]
>
> As indicated above, a document is not valid if it violates a
> validity constraint. Perhaps that could be made clearer in
> the definition of "valid" at [1]. But given that fact, and
> given the "Element Valid" validity constraint at [3], and the
> "Attribute Value Type" validity constraint at [4], a document
> containing any element or attribute for which there is no
> declaration in the associated DTD is not valid.

It sounds like you treat DTD and doctype declaration as one and the 
same thing. They are related. But a doctypedecl is not the DTD. The DTD 
is just a part of the doctypedecl production. 
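
The distinction can be made concrete with a minimal sketch. It assumes 
Python's standard xml.dom.minidom module, and classify_doctype is a 
hypothetical helper name, not something taken from XML 1.0. The sketch 
only reports what the doctypedecl contains or points to. It does not 
fetch or parse any external subset.

```python
from xml.dom import minidom


def classify_doctype(xml_text: str) -> str:
    """Sort a document into one of the three cases discussed in this thread."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        # Nothing in the document matches the doctypedecl production.
        return "no doctypedecl"
    # An external identifier means the declaration *points to* an external
    # subset; this sketch does not check whether that subset can be located.
    has_external = doctype.systemId is not None or doctype.publicId is not None
    # internalSubset is None when there is no [...] part; treat a blank
    # internal subset the same way (an assumption of this sketch).
    has_internal = bool((doctype.internalSubset or "").strip())
    if has_external or has_internal:
        return "doctypedecl with markup declarations"
    return "doctypedecl without markup declarations"


if __name__ == "__main__":
    for sample in (
        "<html/>",
        "<!DOCTYPE html><html/>",
        "<!DOCTYPE r [<!ELEMENT r EMPTY>]><r/>",
    ):
        print(sample, "->", classify_doctype(sample))
```

The middle case, <!DOCTYPE html>, is the one this thread is about: it 
matches the doctypedecl production but carries no DTD.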

What if I send a document without a doctypedecl construct to a 
*validating* processor? MUST the validating processor then, at user 
option, report that the validity constraints are broken? When I started 
this reply, I meant to say that it must report validity constraint 
violations even then. But my answer now is that validation has two 
parts: 1) Check whether the particular rules regarding element content 
etc defined in the DTD are fulfilled; 2) Check whether the validity 
constraints are fulfilled as well. Hence, if there is no DTD, there is 
nothing to report except ”not valid”. My claim remains, though, that 
*also* for documents *with* a construct that matches the doctypedecl 
production, the processor must locate a DTD before it can check for 
fulfillment of the validity constraints. 

> Put another way, one of the constraints a DTD puts on a
> document

The meaning of ”all valid XML documents” is crucial. In what way is 
that a reference to a class of documents? Does it mean ”all documents 
with a match for the doctypedecl production”? The answer is no: 
”[Definition: An XML document is valid if it has an associated document 
type declaration and if the document complies with the constraints 
expressed in it.]” So whether a document is valid per its DTD is one 
thing. And the additional validity constraints of XML 1.0 are another 
thing. However, the latter only apply if the document fulfills the 
former. So says XML 1.0.

So it is not the DTD that places XML 1.0’s validity constraints on the 
document. It is the *conformance* with the DTD that adds the 
requirement to *also* fulfill the validity constraints. 

That XML 1.0 says that a ”validity constraint” applies to ”all valid 
XML documents” may sound a little bit like a tautology. But I read this 
as follows: With ”all valid XML documents” XML 1.0 no doubt means every 
document that has been *successfully* checked by a validating XML 
processor against its DTD (which describes rules about what contents 
particular elements and attributes can have etc). For *that* class of 
documents, there is one set of *additional* things the documents must 
fulfill, namely the validity constraints.

So there are two parts of *valid*: There are those documents that are 
just valid. And there are those that are valid *and* fulfill the 
validity constraints.

> (for the document to be considered valid) is that
> the document must not contain any element or attribute that
> is not declared in the DTD. So a DTD that declares no
> elements or attributes constrains the document to have
> no elements or attributes to be considered valid (and
> such a document would not have a root element and would
> therefore not be valid).

To me, this turns things upside down. Even documents without a 
doctypedecl are ”constrained” to not have a DOCTYPE, a DTD or valid 
elements/attributes. A doctypedecl that does not point to or contain a 
DTD places no restrictions on the document. Such a document fails to 
have ”an associated document type declaration” and it can thus not 
comply ”with the constraints expressed in it” and therefore is *not* 
subject to ”validity constraint” any more than a document without a 
doctypedecl.

> As far as "documents with DOCTYPE but without markup
> declaration are not subject to validation", the XML spec has
> no concept of "subject to validation". That is a tool issue.
> Per section 5.1 Validating and Non-Validating Processors [5]:
> 
>  Conforming XML processors fall into two classes: validating
>  and non-validating.
> 
> No where does the spec say that anything in the document (e.g.,
> a doctype declaration) forces use of a validating processor.

Right. Nevertheless, the presence of a construct that matches the 
”doctypedecl” production is often used as a validation trigger - 
something that ”turns on” the validation mode. More below.

> HTML5 can make its own rules about how a tool should process
> documents. Admittedly, if a tool is using an XML processor
> to process an HTML5 document, it should probably not use
> validation mode, but that is not something for the XML spec
> to address.
> 
> The XML Core WG will consider issuing an erratum that augments
> the definition of valid at [1] to read something like:
> 
>  Definition: An XML document is valid if it has an associated
>  document type declaration and if the document complies with
>  the constraints expressed in it and the document violates no
>  validity constraints.
> 
> We might also add a sentence to the first paragraph of the
> Conformance section at [5] so that that paragraph would
> then read something like:
> 
>  Conforming XML processors fall into two classes: validating
>  and non-validating.  The determination of which kind of
>  processor to use for a given document is outside the scope
>  of this Recommendation.

May I suggest that you expand that to say that the presence of a 
construct that matches the ’doctypedecl’ production does not count as a 
”trigger” that requires XML parsers to enable validation mode?

Please consider that there can be two meanings of 
”trigger”/”subject to”. One is that the presence of the DOCTYPE causes 
the XML processor to jump into validator mode. We are in firm 
agreement, it seems, that the DOCTYPE is not such a trigger. And I 
welcome the proposed emphasis that it isn’t such a trigger.

The other meaning of trigger/subject to is where we disagree, 
presently. You have upheld the view that the very presence of a 
construct that matches the doctypedecl production allows a validating 
processor to check for and report validity constraint violations. If I 
understood correctly, your justification is that a doctypedecl without a DTD 
constrains the document from containing valid elements/attributes/etc. 
But how can a document that is clearly ”not valid” be subject to 
validity constraints? And what ”class of documents” does such a 
document make up? Let us keep in mind the *purpose* of issuing document 
type declarations, namely to contain or point ”to markup declarations 
that provide a grammar for a class of documents”!

I insist that what should trigger a validating processor to check for 
the validity constraints is that the doctypedecl points to or contains 
a non-empty DTD and that the document matches that non-empty DTD.

For URLs, we have the concept of an empty URL. For DTDs, we do not have 
the concept of an empty DTD. And I fail to see how an empty grammar is 
different from no grammar. Perhaps we can best compare it with 
true/false in programming languages: No grammar should always evaluate 
to false, and should thus prevent the validating processor from 
reporting validity constraint errors as it is impossible to comply with 
a grammar that always evaluates to false.
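
The true/false comparison and the trigger rule proposed above can be 
sketched as follows. This is an illustration under stated assumptions, 
not processor behaviour defined by XML 1.0: the DTD is reduced to a set 
of declared element names, None stands for "no DTD", and the function 
name subject_to_validity_constraints is hypothetical.

```python
from typing import Optional, Set


def subject_to_validity_constraints(
    declared: Optional[Set[str]], used: Set[str]
) -> bool:
    """Model of the proposed rule: only a non-empty DTD that the document
    matches makes the document subject to validity constraint checks."""
    # None (no DTD) and set() (empty DTD) are both falsy in Python, which
    # mirrors the claim that an empty grammar is no different from no grammar.
    if not declared:
        return False
    # "Matches the DTD" is simplified here to: every element name used in
    # the document is declared. Real DTD matching also covers content
    # models and attributes.
    return used <= declared


if __name__ == "__main__":
    print(subject_to_validity_constraints(None, {"html"}))
    print(subject_to_validity_constraints(set(), {"html"}))
    print(subject_to_validity_constraints({"html", "body"}, {"html"}))
```

Under the opposing reading in the quoted text, the first two cases 
would instead be reported as validity constraint violations, since 
"html" is used but not declared.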

> We realize this still leaves unanswered the issue of how
> to decide if a document should be "subject to validation".
> At the present time at least, that issue is not addressed
> by the XML Recommendation.
> 
> Paul Grosso
> for the XML Core WG
> 
> 
> [1] http://www.w3.org/TR/REC-xml/#dt-valid
> [2] http://www.w3.org/TR/REC-xml/#dt-vc
> [3] http://www.w3.org/TR/REC-xml/#elementvalid
> [4] http://www.w3.org/TR/REC-xml/#ValueType
> [5] http://www.w3.org/TR/REC-xml/#proc-types

-- 
leif halvard silli

Received on Thursday, 6 February 2014 06:56:41 UTC