- From: Henry S. Thompson <ht@inf.ed.ac.uk>
- Date: Tue, 21 Jan 2014 18:27:29 +0000
- To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Cc: Jirka Kosek <jirka@kosek.cz>, xml-editor@w3.org
Leif Halvard Silli writes:

> A document that lacks DTD is simply ”not valid”
> <http://www.w3.org/TR/REC-xml/#sec-prolog-dtd>. And, as not valid,
> whether it has validation errors is a question that is out of the
> question.

I presume you're referring here to these lines near the beginning:

  [Definition: XML documents SHOULD begin with an XML declaration
  which specifies the version of XML being used.] For example, the
  following is a complete XML document, _well-formed_ but not _valid_:

    <?xml version="1.0"?>
    <greeting>Hello, world!</greeting>

  and so is this:

    <greeting>Hello, world!</greeting>

  [emphasis in original]

It's not *valid*, but it's not *invalid* either:

  XML provides a mechanism, the document type declaration, to define
  constraints on the logical structure and to support the use of
  predefined storage units. [Definition: An XML document is *valid* if
  it has an associated document type declaration and if the document
  complies with the constraints expressed in it.]

Each of your examples, i.e.

  <!DOCTYPE html>
  <html/>

and

  <!DOCTYPE html SYSTEM "about:legacy-compat">
  <html/>

clearly does have an "associated document type declaration", and
equally clearly contain "failures to fulfill the validity constraints
given in this specification" [1], so I conclude they are not only not
valid, but invalid (although that, interestingly, is not a term
defined in the spec. What we find at [1] is an obligation on
*validating processors* to _report_ "failures to fulfill the validity
constraints given in this specification".)

The validity constraint they both fail to fulfill is VC: Element Valid
[2], which requires a declaration for every element in a document.

It's unfortunate that the definition of *valid* is less explicit than
the definition of conforming validating processor, but my guess is
that the way the Core WG is most likely to fix that is by making the
definition of *valid* stronger, not by making the Conformance section
weaker.

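To make the distinction above concrete, here is a minimal sketch (not a validating processor, and not how xmllint or rxp work) that sorts a document into "neither valid nor invalid" when it has no document type declaration, and "invalid" when it has one but uses an undeclared element. The function name `classify` and its three return labels are hypothetical, chosen only for illustration. It uses Python's standard `xml.parsers.expat` module, looks only at declarations in the internal subset, and checks only two things: that the root element name matches the DOCTYPE name, and that every element used has an element declaration. Content models and all other validity constraints are ignored.

```python
import xml.parsers.expat


def classify(document: str) -> str:
    """Roughly classify a well-formed document (illustrative sketch only).

    Returns "neither" if there is no document type declaration,
    "invalid" if a declared-element or root-name check fails,
    and "no violation found" otherwise (which is NOT a proof of validity).
    """
    doctype_name = None   # name given in <!DOCTYPE name ...>, if any
    declared = set()      # element names declared in the internal subset
    used = []             # element names in document order

    def start_doctype(name, system_id, public_id, has_internal_subset):
        nonlocal doctype_name
        doctype_name = name

    def element_decl(name, model):
        declared.add(name)

    def start_element(name, attrs):
        used.append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = start_doctype
    parser.ElementDeclHandler = element_decl
    parser.StartElementHandler = start_element
    # No external entity handler is set, so any external subset is not read.
    parser.Parse(document, True)

    if doctype_name is None:
        # No document type declaration: not valid, but not invalid either.
        return "neither"

    root_matches = used[0] == doctype_name                  # cf. VC: Root Element Type
    all_declared = all(name in declared for name in used)   # part of VC: Element Valid
    if root_matches and all_declared:
        return "no violation found"
    return "invalid"
```

Under these assumptions, `classify('<greeting>Hello, world!</greeting>')` gives "neither", while both `<!DOCTYPE html>` examples above give "invalid", because `html` is never declared.
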
It would be possible to expand the definition of *validating
processors* to be clearer about their responsibilities in the absence
of a document type declaration, and that might be a good idea.

It would also probably be a good idea to clarify that as things stand

  <!DOCTYPE html>
  <html/>

is, using the usual convention, _invalid_, where

  <html/>

is neither valid _nor_ invalid, and to provide a definition of
'invalid' as "given a document type declaration, violating one or more
of the constraints expressed by the declarations in the DTD, and
failing to fulfill one or more of the validity constraints given in
this specification".

But to take account of the behaviour you cite of xmllint, likewise of
rxp, (which treat the two cases above, and the even simpler <html/>
case, all as instances of an idiosyncratic validity error w/o
precedent in the XML spec.), we would have to define what it meant to
have an _empty_ document type declaration, which would be rather more
difficult, and potentially backward incompatible. Consider, for
example

  <!DOCTYPE html []>
  <html/>

which causes both to report the 'ordinary' undeclared element error,
but xmllint to complain of a missing DTD.

Note also that

  <!DOCTYPE html>
  <hmtl/>

_is_ invalid, and we wouldn't want to lose that. . .

ht

[1] http://www.w3.org/TR/REC-xml/#sec-conformance
[2] http://www.w3.org/TR/REC-xml/#elementvalid
--
Henry S. Thompson, School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail from me _always_ has a .sig like this -- mail without it is forged spam]
Received on Tuesday, 21 January 2014 18:28:00 UTC