Re: Clarify that documents with DOCTYPE but without markup declaration are not subject to validation

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Tue, 21 Jan 2014 18:27:29 +0000
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Cc: Jirka Kosek <jirka@kosek.cz>, xml-editor@w3.org
Message-ID: <f5bd2jltcku.fsf@troutbeck.inf.ed.ac.uk>
Leif Halvard Silli writes:

> A document that lacks DTD is simply ”not valid”
> <http://www.w3.org/TR/REC-xml/#sec-prolog-dtd>. And, as not valid,
> whether it has validation errors is a question that is out of the
> question.

I presume you're referring here to these lines near the beginning:

  [Definition: XML documents SHOULD begin with an XML declaration
  which specifies the version of XML being used.] For example, the
  following is a complete XML document, _well-formed_ but not _valid_:

  <?xml version="1.0"?>
  <greeting>Hello, world!</greeting> 

  and so is this:

  <greeting>Hello, world!</greeting>

  [emphasis in original]

It's not *valid*, but it's not *invalid* either:

  XML provides a mechanism, the document type declaration, to define
  constraints on the logical structure and to support the use of
  predefined storage units. [Definition: An XML document is *valid* if
  it has an associated document type declaration and if the document
  complies with the constraints expressed in it.]

Each of your examples, i.e.

  <!DOCTYPE html>
  <html/>
and
  <!DOCTYPE html SYSTEM "about:legacy-compat">
  <html/>

clearly does have an "associated document type declaration", and equally
clearly contains "failures to fulfill the validity constraints given in
this specification" [1], so I conclude they are not only not valid,
but invalid (although that, interestingly, is not a term defined in
the spec.  What we find at [1] is an obligation on *validating
processors* to _report_ "failures to fulfill the validity constraints
given in this specification".)

The validity constraint they both fail to fulfill is VC: Element Valid [2],
which requires a declaration for every element in a document.

It's unfortunate that the definition of *valid* is less explicit than
the definition of conforming validating processor, but my guess is
that the way the Core WG is most likely to fix that is by making the
definition of *valid* stronger, not by making the Conformance section
weaker.

It would be possible to expand the definition of *validating
processors* to be clearer about their responsibilities in the absence
of a document type declaration, and that might be a good idea.

It would also probably be a good idea to clarify that as things stand

  <!DOCTYPE html>
  <html/>

is, using the usual convention, _invalid_, whereas

  <html/>

is neither valid _nor_ invalid, and to provide a definition of
'invalid' as "given a document type declaration, violating one or more
of the constraints expressed by the declarations in the DTD, and
failing to fulfill one or more of the validity constraints given in
this specification".
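
As an illustration only, here is a minimal Python sketch of the three-way
distinction above (valid, invalid, neither), using the standard-library
`xml.parsers.expat` parser. The function `classify` and its labels are
hypothetical. It checks just two validity constraints, the declaration
requirement of VC: Element Valid [2] and VC: Root Element Type, and it
does not read external DTD subsets, so a clean result means only "no
errors found by these checks", not "valid".

```python
import xml.parsers.expat


def classify(xml_text: str) -> tuple[str, list[str]]:
    """Hypothetical toy classifier: 'neither', 'invalid', or no errors found.

    Only two validity constraints are checked (VC: Root Element Type and the
    declaration requirement of VC: Element Valid); content models, attributes
    and external DTD subsets are ignored.
    """
    doctype_name = None  # Name from <!DOCTYPE Name ...>, if present
    declared = set()     # element types declared in the internal subset
    seen = []            # element types in document order (root first)
    parser = xml.parsers.expat.ParserCreate()

    def start_doctype(name, system_id, public_id, has_internal_subset):
        nonlocal doctype_name
        doctype_name = name

    def element_decl(name, model):
        declared.add(name)

    def start_element(name, attrs):
        seen.append(name)

    parser.StartDoctypeDeclHandler = start_doctype
    parser.ElementDeclHandler = element_decl
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)  # raises ExpatError if not well-formed

    if doctype_name is None:
        # No document type declaration: validity is simply not in question.
        return "neither valid nor invalid", []

    errors = []
    if seen and seen[0] != doctype_name:
        errors.append(f"VC: Root Element Type: {seen[0]!r} != {doctype_name!r}")
    for name in dict.fromkeys(seen):  # each element type once, in order
        if name not in declared:
            errors.append(f"VC: Element Valid: no declaration for {name!r}")
    return ("invalid" if errors else "no errors found by these checks"), errors
```

With this sketch, "<html/>" comes out as neither valid nor invalid, while
"<!DOCTYPE html>\n<html/>" comes out as invalid because 'html' is never
declared.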

But to take account of the behaviour you cite of xmllint, and likewise
of rxp (which treat the two cases above, and the even simpler

  <html/>

case, all as instances of an idiosyncratic validity error without
precedent in the XML spec), we would have to define what it meant to
have an _empty_ document type declaration, which would be rather more
difficult, and potentially backward incompatible.

Consider, for example

  <!DOCTYPE html []>
  <html/>

which causes both to report the 'ordinary' undeclared element error, but
also causes xmllint to complain of a missing DTD.
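
To make the difficulty concrete: at the parser level, "<!DOCTYPE html>"
and "<!DOCTYPE html []>" differ only in whether an internal subset is
present, although neither declares anything. The minimal Python sketch
below (the helper name `doctype_info` is hypothetical) uses the
`has_internal_subset` argument that expat passes to its
`StartDoctypeDeclHandler` to show that a parser can tell them apart, so
any rule about "empty" document type declarations would have to say which
of these forms counts as empty.

```python
import xml.parsers.expat


def doctype_info(xml_text: str):
    """Hypothetical helper: (name, system_id, has_internal_subset) or None."""
    info = None  # stays None when the document has no DOCTYPE at all
    parser = xml.parsers.expat.ParserCreate()

    def start_doctype(name, system_id, public_id, has_internal_subset):
        nonlocal info
        # expat passes a true value here whenever '[...]' is present,
        # even if the brackets enclose no declarations.
        info = (name, system_id, bool(has_internal_subset))

    parser.StartDoctypeDeclHandler = start_doctype
    parser.Parse(xml_text, True)
    return info
```

For example, doctype_info("<!DOCTYPE html []>\n<html/>") reports an
internal subset, while doctype_info("<!DOCTYPE html>\n<html/>") does not.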

Note also that

  <!DOCTYPE html>
  <hmtl/>

_is_ invalid, and we wouldn't want to lose that. . .

ht

[1] http://www.w3.org/TR/REC-xml/#sec-conformance
[2] http://www.w3.org/TR/REC-xml/#elementvalid
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]
Received on Tuesday, 21 January 2014 18:28:00 UTC
