Can root name in DOCTYPE be an XSD-validity thing?

Hello schema devs. 

Can an XSD-based processor validate that an XHTML5 document contains the 
HTML5 document type (DOCTYPE) declaration? And can it do so without 
resorting to hacks? Can it be expressed in the XSD language itself, or 
must the processor perform an initial 'DTD-mode' check?

Note: The HTML5 DOCTYPE *declaration* doesn't reference a DOCTYPE 
*definition*, so I do not ask if XSD can validate the 'ExternalID' or 
'intSubset' part of the DOCTYPE declaration. I only wonder if XSD 
allows checking that the 'name' part of the DOCTYPE matches the 'root 
element type'. Since it is well-formed to include a DOCTYPE declaration 
without a DTD, and since XSD moves the validation from DTD to, well, 
XSD, it ought, it seems, to be possible to use XSD to verify that the 
DOCTYPE declaration contains the name of the root element (the same way 
that XSD allows checking many things that, per XML 1.0, do not fall 
under XML 1.0’s validity concept).
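
To make concrete which check I mean, here is a minimal sketch in Python 
of the comparison itself, done outside of any schema language. It uses 
the standard library's expat bindings; the function name 
doctype_matches_root is my own invention, and the sketch only compares 
the two names. It says nothing about how, or whether, an XSD processor 
could do the same:

```python
import xml.parsers.expat


def doctype_matches_root(xml_text):
    """Return (doctype_name, root_name, matches).

    matches is None when the document has no DOCTYPE declaration,
    since there is then nothing to compare against.
    (Hypothetical helper, for illustration only.)
    """
    state = {"doctype": None, "root": None}

    def on_doctype(name, sysid, pubid, has_internal_subset):
        # 'name' is the Name part of the doctypedecl production.
        state["doctype"] = name

    def on_start(name, attrs):
        # Only the first start tag is the root element.
        if state["root"] is None:
            state["root"] = name

    # No namespace processing: element names are reported as written,
    # which is what XML 1.0 compares against the DOCTYPE name.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)

    if state["doctype"] is None:
        return None, state["root"], None
    return state["doctype"], state["root"], state["doctype"] == state["root"]


SAMPLE = (
    "<!DOCTYPE NotTheHtmlRootElement>\n"
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title></title>\n'
    "</head><body><p/></body></html>"
)

print(doctype_matches_root(SAMPLE))
# ('NotTheHtmlRootElement', 'html', False)
```

The comparison is case-sensitive, so 'HTML' and 'html' do not match 
either.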

Background for my question is that XML editors/processors nowadays tend 
to be schema-based, even when a DTD is referenced. And hence, if XSD 
doesn’t validate the DOCTYPE *declaration*, an XSD processor could in 
fact label constructs like this as 'validating':

  <!DOCTYPE NotTheHtmlRootElement>
  <html xmlns="http://www.w3.org/1999/xhtml"><head><title></title>
  </head><body><p/></body></html>

And in fact, it seems most XSD processors do say that the above is 
validating. For example, Libxml2 in XSD mode does. Just ask xmllint to 
apply the XHTML 1.1 XSD[1] to an XHTML 1.1 doc with '<!DOCTYPE HTML' 
instead of '<!DOCTYPE html'[2]:

  xmllint --schema http://tinyurl.com/a9lrvfq http://tinyurl.com/bkhk86p


Even the well-known Oxygen editor, which (by default) uses Xerces, 
behaves like Libxml2 in the above case.

Now, if we move focus to processors that only consider DTD-validity, 
then both the rxp processor and Libxml2 in DTD mode report, for the 
above document, that validation could not be performed due to the lack 
of a DTD:

           rxp -V http://tinyurl.com/bkhk86p

  xmllint --valid http://tinyurl.com/bkhk86p


However, in the presence of a (supported) SYSTEM ID/URI, some 
processors are triggered into DTD-validation mode even if the ID/URI 
leads nowhere, meaning that they check that the root element matches 
the root name in the document type declaration. Libxml2 (xmllint) does 
in fact behave that way (even if it also, in that case, reports that 
the root element hasn’t been declared).

By the way: If we remove the XSD-specific attributes from the markup of 
the above test document, then Oxygen (Xerces) does in fact complain 
that the root element doesn't match the document type declaration. And 
it does so *without* complaining that the rest of the markup is 
invalid. Is this because Oxygen uses an XSD schema that includes 
DOCTYPE validation? Or is it because it uses "DTD mode" for the 
DOCTYPE, and then XSD mode for the rest of the document?

SOME "PHILOSOPHIC" QUESTIONS: According to XML 1.0, "validity 
constraints" apply to all valid documents. Does XML by this mean "all 
DTD-valid documents" only? This is relevant since XML says that it is 
a validity constraint that the DOCTYPE declaration matches the root. 
Thus, it seems to me that if this validity constraint does *not* apply 
to XSD, then XSD processors are, in fact, not validating processors. 
(I.e. "validating processor", per XML 1.0, then means "DTD-validating 
processor".) If so, then in one way it is unfortunate that 'validity' 
is used for anything other than DTD-validity.

RELATED COMMENTS:

1) On one side, there seems to be confusion within the XML community 
about the HTML5 doctype declaration. For example, the XMLmind XML 
editor developers (who have developed an XSD for XHTML5) had let 
themselves be convinced that the HTML5 doctype was not well-formed. 
Only when I made them aware of Polyglot Markup did they realize that it 
is well-formed to include the HTML5 DOCTYPE.[3] 

2) And on the other - but related(sic!) - side, we have the fact that 
HTML5 doesn’t declare any "official" XHTML5 doctype declaration, which 
is a reflection of the general, sceptical attitude in the XML community 
these days towards document type declarations and document type 
definitions. More specifically, this is related to a focus on 
well-formedness as "good enough" (not to say "difficult enough") on one 
side, as well as a focus on "better" methods for validation - namely 
XML schemas - on the other. 

3) But perhaps HTML5 goes a little bit too far, right now: Surely HTML5 
could at least say that, if a DOCTYPE declaration is used, then the 
'name' part of the 'doctypedecl' should match the root element of the 
document?!?

FINALLY: My focus with this letter is XML processors’ ability to 
validate/create XHTML5 documents that, as far as the *DOCTYPE 
declaration* is concerned, are polyglot.

PS: I note with interest that some of the XSD schema files for the XSD 
language itself include document type declarations. And so I wonder: 
What would happen if one altered the root names declared by the 
document type declarations in those XSD documents? Would XSD-based 
processors stop working … ?

[1] http://tinyurl.com/as3k455

[2] http://tinyurl.com/a9lrvfq

[3] http://www.xmlmind.com/pipermail/xmleditor-support/2013-January/010268.html

-- 
leif halvard silli

Received on Thursday, 14 February 2013 06:14:09 UTC