
Re: Can root name in DOCTYPE be a XSD-validity thing?

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 14 Feb 2013 12:12:14 -0700
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, xmlschema-dev@w3.org
Message-Id: <3689AA6D-EE23-43F1-8890-3A5941B4B097@blackmesatech.com>
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>

On Feb 13, 2013, at 11:13 PM, Leif Halvard Silli wrote:

> Hello schema devs. 
> Can an XSD-based processor validate that an XHTML5 document contains the
> HTML5 DOCTYPE declaration? And can it do so without resorting to
> hacks? Can it be expressed in the XSD language itself, or must the
> processor perform an initial 'DTD-mode' check?

No.  Conforming XSD validators are not required to be sensitive to,
or aware of, the infoset's document type declaration information item,
and XSD provides no language for referring to properties of that
information item. 

> Note: The HTML5 DOCTYPE *declaration* doesn't reference a DOCTYPE
> *definition*, so I am not asking whether XSD can validate the 'ExternalID'
> or 'intSubset' part of the DOCTYPE declaration. I only wonder whether XSD
> allows checking that the 'name' part of the DOCTYPE matches the 'root
> element type'. Since it is well-formed to include a DOCTYPE declaration
> without a DTD, and since XSD moves validation from DTD to, well,
> XSD, it ought, it seems, to be possible to use XSD to verify that the
> DOCTYPE declaration contains the name of the root element (the same way
> that XSD allows checking many things that, per XML 1.0, do not fall
> under XML 1.0’s validity concept).

You are quite right that there would be no logical contradiction in 
a schema language spec that allowed the kind of check you have in
mind.  But no, it's not something XSD is designed to constrain, or to 
allow schema authors to constrain.

> Background for my question is that XML editors/processors nowadays tend
> to be schema-based, even when a DTD is referenced. Hence, if XSD
> doesn’t validate the DOCTYPE *declaration*, XSD could in fact label
> constructs like this as 'valid':
>  <!DOCTYPE NotTheHtmlRootElement>
>  <html xmlns="http://www.w3.org/1999/xhtml"><head><title></title>
>  </head><body><p/></body></html>
> And in fact, it seems most XSD processors do say that the above is
> valid. For example, Libxml2 in XSD mode does. Just ask xmllint to
> apply the XHTML 1.1 XSD[1] to an XHTML 1.1 doc with '<!DOCTYPE HTML'
> instead of '<!DOCTYPE html'[2]:
>  xmllint --schema http://tinyurl.com/a9lrvfq http://tinyurl.com/bkhk86p
> Even the well-known Oxygen editor, which (by default) uses Xerces,
> behaves like Libxml2 in the above case.

The agreement or disagreement of the DOCTYPE declaration with
the name on the outermost element is not a property that affects the
schema-validity of any document with respect to any XSD schema. 
So I'm happy to believe that these programs behave as you say, and
to say that their failure to object to the discrepancy between doctype
declaration and instance has no relation to their conformance or
non-conformance as XSD validators.
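
If the check is wanted anyway, it has to be done outside XSD validation. The following is a minimal sketch, assuming Python's standard-library `xml.dom.minidom`, whose expat-based parser reports the DOCTYPE name and by default does not load an external DTD. The function name `root_element_type_check` is hypothetical and not part of any tool mentioned in this thread. It compares the DOCTYPE name with the root element's name, which is what XML 1.0's "Root Element Type" validity constraint requires, and runs against the test document quoted above.

```python
from typing import Optional
from xml.dom import minidom


def root_element_type_check(xml_text: str) -> Optional[bool]:
    """Hypothetical stand-alone check, independent of any XSD schema.

    Returns None if there is no DOCTYPE declaration, True if the
    DOCTYPE name equals the root element's name, False otherwise.
    """
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:
        return None
    # XML names are case-sensitive, so "HTML" does not match "html".
    return doc.doctype.name == doc.documentElement.tagName


# The test document from the quoted message above.
sample = """<!DOCTYPE NotTheHtmlRootElement>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title></title>
</head><body><p/></body></html>"""

print(root_element_type_check(sample))  # False
```

An XSD validator given the same document would not run any such comparison; the check is a separate pass layered on top.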

> ...
> By the way: If we remove the XSD-specific attributes from the markup of
> the above test document, then Oxygen (Xerces) does in fact complain
> that the root element doesn't match the document type declaration. And
> it does so *without* complaining that the rest of the markup is
> invalid. Is this because oXygen uses an XSD schema that includes DOCTYPE
> validation? Or is it because it uses "DTD mode" for the DOCTYPE, and 
> then XSD mode for the rest of the document?

I think the answer is that Oxygen's algorithm for deciding how to validate 
a document, when validation is requested, will ignore a DOCTYPE
declaration if an xsi:schemaLocation hint is present.  (But I haven't
tested this hypothesis; I just know that Oxygen makes its best effort to
do something useful.)
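
To make that hypothesis concrete, here is a sketch of such a dispatch policy. It is purely illustrative: `choose_validation_mode` is a hypothetical name, and nothing here reflects Oxygen's or Xerces's actual code. It assumes Python's standard `xml.dom.minidom` and looks only at the root element for `xsi:schemaLocation` or `xsi:noNamespaceSchemaLocation`. Under this policy a document carrying an xsi hint is routed to XSD, so its DOCTYPE name is never compared with the root element.

```python
from xml.dom import minidom

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def choose_validation_mode(xml_text: str) -> str:
    """Hypothetical policy: prefer XSD whenever an xsi hint is present.

    Returns "xsd" if the root element carries xsi:schemaLocation or
    xsi:noNamespaceSchemaLocation, "dtd" if there is a DOCTYPE
    declaration but no such hint, and "none" otherwise.
    Only the root element is inspected in this sketch.
    """
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    if (root.hasAttributeNS(XSI_NS, "schemaLocation")
            or root.hasAttributeNS(XSI_NS, "noNamespaceSchemaLocation")):
        # Any DOCTYPE is ignored in this branch, so a mismatch between
        # the DOCTYPE name and the root element goes unreported.
        return "xsd"
    if doc.doctype is not None:
        return "dtd"
    return "none"


# "xhtml11.xsd" is a placeholder location; nothing is fetched.
hinted = (
    '<!DOCTYPE NotTheHtmlRootElement>'
    '<html xmlns="http://www.w3.org/1999/xhtml"'
    ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xsi:schemaLocation="http://www.w3.org/1999/xhtml xhtml11.xsd"/>'
)
print(choose_validation_mode(hinted))  # xsd
```

Removing the xsi attributes from `hinted` would switch the result to "dtd", which matches the behaviour described in the quoted message.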

> SOME "PHILOSOPHIC" QUESTIONS: According to XML 1.0, "validity
> constraints" apply to all valid documents. Does XML by this mean only
> "all DTD-valid documents"? This is relevant since XML says that it is
> a validity constraint that the DOCTYPE declaration matches the root.
> Thus, it seems to me that if this validity constraint does *not* apply
> to XSD, then XSD processors are, in fact, not validating processors.
> (I.e. "validating processor" per XML 1.0 then means "DTD-validating
> processor".) If so, then in one way, it is unfortunate that 'validity' is
> used for anything other than DTD-validity.

The only form of validity defined in the XML specification is DTD-based
validity.  So yes, the validity constraints of that spec relate to validation
against a DTD, and no, XSD validators are not 'validating processors'
within the meaning of the XML spec.

The prose of the XSD spec generally tries to use the term 'schema-validity' 
for the property that an XSD validator checks; there was some fear during 
the drafting of XSD 1.0 that using the unqualified term "validity" would 
confuse people.  The intervening years have shown that people are quite
happy to use the term "validity" in a broader sense and to distinguish
DTD-validity, RNG-validity, XSD-validity, etc. as needed.  So some readers
have felt that the XSD spec's consistent use of "schema-validity" instead
of just "validity" was an unnecessary affectation.  In future I shall refer 
such comments to your message; you are the first reader I have encountered
to support the XSD spec's choice of terminology.

> PS: I note with interest that some of the XSD schema documents for the
> XSD language itself include document type declarations.
> And so I wonder: What would happen if one altered the root names
> declared by the document type declarations in those XSD documents? Would
> XSD-based processors stop working … ?

Most XSD processors have hard-coded knowledge of the schema
for schema documents as specified in the XSD spec; it would be an
unusual processor that actually read the schema document for
schema documents on each startup.  (Not an impossible one, just
unusual.)  Processors with hard-coded knowledge of the schema for
schema documents would, I guess, be unaffected by a textual change
to the schema documents in the spec.  A processor that did read the
schema document for schema documents at startup might cease to work
if it checked the schema documents for DTD-validity and found that
the root element had the wrong name, but it is not (unless I am forgetting
something) a requirement of the XSD spec that XSD schema documents
be DTD-valid.

Thank you for your thoughtful and thought-provoking message.

I hope this helps.

* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
Received on Thursday, 14 February 2013 19:12:44 UTC
