Clarify that documents with a DOCTYPE but without markup declarations are not subject to validation

The advent of HTML5/XHTML5 has made documents that have a DOCTYPE but 
no DTD popular.

However, some XML tools report validity constraint errors for 
documents with the HTML5 doctype. This happens because the HTML5 
DOCTYPEs themselves apparently cause some tools to switch into DTD 
validation mode and subsequently report every element and attribute as 
an error, since none of them is declared in the (non-existent) DTD. 
This may happen even if the tool supports more useful means of 
conformance checking, such as XSD schemas. In those cases it would 
have been more fruitful to go into e.g. XSD conformance checking mode 
(or simply to check well-formedness).
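
To illustrate the well-formedness-only fallback, here is a minimal 
Python sketch. It assumes the standard library's ElementTree parser, 
which is non-validating; the names SHORT_DOCTYPE, LEGACY_DOCTYPE, BODY 
and is_well_formed are made up for this illustration and do not come 
from any particular tool.

```python
import xml.etree.ElementTree as ET

# The two HTML5 doctypes discussed in this message, each placed in
# front of a tiny XHTML5 document (the document itself is hypothetical).
SHORT_DOCTYPE = '<!DOCTYPE html>'
LEGACY_DOCTYPE = '<!DOCTYPE html SYSTEM "about:legacy-compat">'
BODY = ('<html xmlns="http://www.w3.org/1999/xhtml">'
        '<head><title>t</title></head><body><p>Hello</p></body></html>')


def is_well_formed(text):
    """Return True if a non-validating parse of `text` succeeds."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for doctype in (SHORT_DOCTYPE, LEGACY_DOCTYPE):
        print(doctype, "->", is_well_formed(doctype + BODY))
```

A tool that behaved like this for DTD-less doctypes would report only 
genuine well-formedness errors, rather than flagging every element and 
attribute as undeclared.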

When trying to discuss this behavior with XML tool developers, it 
would be helpful to have an authoritative statement to point to.

Therefore, my proposal is to formulate rules or guidance for what to 
do when the DOCTYPE declaration points to no markup declarations, and 
to place them in the 6th edition of XML. (Or to put it differently: 
define what to do when the DOCTYPE lacks an internal or external DTD.)

XML 1.0 fifth edition says:

“[Definition: An XML document is valid if it has an associated document 
type declaration and if the document complies with the constraints 
expressed in it.]”

Question: But which constraints does a document type declaration 
without an internal or external DTD express?  

Answer: "no restriction", because document type declarations are 
defined to contain markup declarations, something which neither of the 
two HTML5 doctypes (<!DOCTYPE html SYSTEM "about:legacy-compat"> and 
<!DOCTYPE html>) contains. Simply put, since the HTML5 doctypes contain 
no “element type declaration, an attribute-list declaration, an entity 
declaration, or a notation declaration”, they should not be seen as 
providing markup declarations, from a validating XML processor’s point 
of view:

“[Definition: The XML document type declaration contains or points to 
markup declarations that provide a grammar for a class of documents. 
This grammar is known as a document type definition, or DTD. The 
document type declaration can point to an external subset (a special 
kind of external entity) containing markup declarations, or can contain 
the markup declarations directly in an internal subset, or can do both. 
The DTD for a document consists of both subsets taken together.]”

“[Definition: A markup declaration is an element type declaration, an 
attribute-list declaration, an entity declaration, or a notation 
declaration.] These declarations may be contained in whole or in part 
within parameter entities, as described in the well-formedness and 
validity constraints below. For further information, see 4 Physical 
Structures.”
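
The distinction drawn by these two definitions can be checked 
mechanically. The following is only a sketch, assuming Python's 
standard xml.parsers.expat module; the function name 
count_markup_declarations is hypothetical. Note that expat does not 
retrieve an external subset by default, so the count covers only what 
the parser actually sees in the document itself.

```python
import xml.parsers.expat


def count_markup_declarations(text):
    """Return (has_doctype, n), where n is the number of element,
    attribute, entity and notation declaration callbacks that a
    non-validating expat parse reports for `text`.

    Assumption: the external subset (if any) is NOT fetched, so only
    declarations in the internal subset are counted."""
    state = {"has_doctype": False, "declarations": 0}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        state["has_doctype"] = True

    def on_declaration(*args):
        # Called once per reported declaration (for ATTLIST: once per
        # declared attribute).
        state["declarations"] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_declaration
    parser.AttlistDeclHandler = on_declaration
    parser.EntityDeclHandler = on_declaration
    parser.NotationDeclHandler = on_declaration
    parser.Parse(text, True)
    return state["has_doctype"], state["declarations"]


if __name__ == "__main__":
    samples = [
        '<!DOCTYPE html><html/>',
        '<!DOCTYPE html SYSTEM "about:legacy-compat"><html/>',
        '<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)>]>'
        '<greeting>Hello, world!</greeting>',
    ]
    for sample in samples:
        print(count_markup_declarations(sample), sample)
```

A tool could use a check of this kind to decide whether DTD validation 
is meaningful at all: a DOCTYPE with zero markup declarations expresses 
no constraints to validate against.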

By the way: the spec contains several examples of simple documents to 
which validity applies. It would be good to also include examples of 
documents where the doctype does not reference any markup declarations.

I may provide verbatim spec text change proposals, if this would be 
useful.
-- 
leif halvard silli

Received on Sunday, 19 January 2014 20:29:47 UTC