[Bug 20993] New: XHTML5 syntax should require valid DOCTYPE declaration (<!DOCTYPE root> should match root element)

https://www.w3.org/Bugs/Public/show_bug.cgi?id=20993

            Bug ID: 20993
           Summary: XHTML5 syntax should require valid DOCTYPE declaration
                    (<!DOCTYPE root> should match root element)
    Classification: Unclassified
           Product: HTML WG
           Version: unspecified
          Hardware: PC
               URL: http://www.w3.org/html/wg/drafts/html/master/the-xhtml-syntax.html#writing-xhtml-documents
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML5 spec
          Assignee: dave.null@w3.org
          Reporter: xn--mlform-iua@xn--mlform-iua.no
        QA Contact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-admin@w3.org,
                    public-html-wg-issue-tracking@w3.org

The XHTML5 section says this about the DOCTYPE declaration:

]] XML documents may contain a DOCTYPE if desired, but this is not required to
conform to this specification. This specification does not define a public or
system identifier, nor provide a formal DTD. [[

PROBLEM: While this spec does not define identifiers, XML in fact says that the
name section of the DOCTYPE declaration (<!DOCTYPE name>) should match the root
element type (that is: the element that is defined in the syntax rules as the
root element). 

         The word 'DOCTYPE' can cover both DOCTYPE declarations (such as
<!DOCTYPE html> or <!DOCTYPE html SYSTEM "URL">) on the one side, and DTDs
(document type definitions, which are referenced via the public or system
identifier inside a DOCTYPE declaration) on the other side. Hence it is
possible to read the above as saying that e.g. <!DOCTYPE IAmCool> (which is a
DOCTYPE) is fully conforming XHTML5.

         And, in fact the NU validator currently blesses <!DOCTYPE IAmCool>
when used inside XHTML.

Therefore, in addition to the above, the spec should add that *if* a DOCTYPE
declaration is used for an XHTML document (that is: for an XML document whose
root element is the html element in the XHTML namespace), then authors are
required to make sure that the root name inside the DOCTYPE declaration matches
the name of the root element. In other words, authors must verify that the
DOCTYPE begins '<!DOCTYPE html', if that is (literally) how the root element
begins, or, if the root element is prefixed with myprefix, then the DOCTYPE
must match that (which means that there must be a DTD somewhere which declares
the xmlns:myprefix "attribute"):
<!DOCTYPE myprefix:html [<!ATTLIST myprefix:html xmlns:myprefix CDATA
"http://www.w3.org/1999/xhtml"><!--Yes, the xmlns:myprefix must be
declared-->]>.

A consequence of this rule is that XHTML5 validators must check that the
DOCTYPE declaration, if present, names the root element (normally
<!DOCTYPE html>).

Some XHTML5 validators already behave this way. For instance, the XHTML5
validator that is built into the oXygen XML editor (which in turn, FWIW, uses
Xerces) complains if the DOCTYPE name and the root element aren't in sync.
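
The proposed check can be illustrated with a minimal sketch. It assumes
Python's standard xml.parsers.expat module, used without namespace processing
so that names are compared as raw qualified names (prefix included), which is
how XML 1.0 compares them. The function name doctype_matches_root is
hypothetical, and this is not how any existing validator is implemented.

```python
import xml.parsers.expat


def doctype_matches_root(xml_text):
    """Compare the DOCTYPE name with the root element's qualified name.

    Returns None when the document has no DOCTYPE declaration (nothing to
    check), otherwise True or False. Hypothetical helper for illustration.
    """
    seen = {"doctype": None, "root": None}

    def on_doctype(name, sysid, pubid, has_internal_subset):
        # Called once for <!DOCTYPE name ...>
        seen["doctype"] = name

    def on_start(name, attrs):
        # Only the first start tag is the root element.
        if seen["root"] is None:
            seen["root"] = name

    parser = xml.parsers.expat.ParserCreate()  # no namespace processing
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)

    if seen["doctype"] is None:
        return None
    return seen["doctype"] == seen["root"]
```

Under these assumptions, <!DOCTYPE IAmCool> followed by an <html> root is
reported as a mismatch, while a document without any DOCTYPE is left alone.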


JUSTIFICATIONS:

(1) That the root name of the DOCTYPE has to match the root element (including
the namespace prefix, if there is one) is something that follows more or less
literally from XML 1.0 - it is implied when using a DTD. As such, this is in
line with the preceding paragraph of HTML5, which says:

]] This specification does not define any syntax-level
   requirements beyond those defined for XML proper.[[

(2) By adding this, we discourage authors from writing <!DOCTYPE ILoveXHTML>
and other pointless "demonstrations/distractions" with the DOCTYPE.

(3) We send a positive signal with regard to compatibility with the text/html
serialization, since *if* the DOCTYPE is used, then it will be HTML
compatible. (This is not 100% true if the DOCTYPE triggers quirks mode.
However, amongst the XHTML doctypes, none seem to trigger quirks. Almost
standards mode is the furthest we deviate from no-quirks.)

(4) Yes, DTD-less DOCTYPE declarations are not subject to the XML 1.0
DTD-validity concept. However, since well-formed documents can also be checked
via XML schemas etc, it makes sense to restrict DTD-less DOCTYPEs to what XML
1.0 restricts them to, namely declaring the root element. (Note that XML 1.0
also has a few rules that fall under neither validity nor well-formedness.)

(5) There are already many validity things that are checked when the NU
validator performs XHTML5 checking: it checks that the root element is <html>
(yes, it could be <h:html xmlns:h="http://www.w3.org/1999/xhtml">, but in an
XHTML document, the root has to be the 'html' element!). And that the root
must be <html> is a validity concept - it is not a well-formedness concept.
(And there are many, many XHTML5 conformance checks that are validity issues
and not well-formedness issues.) Thus, since the validity concept is involved
in this (and other) aspects of _XHTML5_ conformance checking, and since many of
those rules are there in order to assure interoperability between HTML5 and
XHTML5, it seems logical to also include DOCTYPE validity checking as part of
XHTML5.

(6) These rules make it difficult to fake and difficult to be "advanced". But
they keep it simple to be simple - to use simple DOCTYPE declarations.
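
To illustrate point (5), here is a rough sketch of a root element check that
goes beyond well-formedness: the root must be the html element in the XHTML
namespace, whatever prefix is used. It assumes Python's xml.parsers.expat with
namespace processing enabled; the function name root_is_xhtml_html is
hypothetical, and this is not a description of the NU validator's actual code.

```python
import xml.parsers.expat

XHTML_NS = "http://www.w3.org/1999/xhtml"


def root_is_xhtml_html(xml_text):
    """Return True if the root is 'html' in the XHTML namespace.

    Hypothetical helper for illustration only.
    """
    root = []

    def on_start(name, attrs):
        # Record only the first start tag, i.e. the root element.
        if not root:
            root.append(name)

    # With a namespace separator, expat reports names as "URI localname",
    # so the prefix chosen by the author does not matter here.
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return root[0] == XHTML_NS + " html"
```

Note that this check is namespace-based, while the DOCTYPE name comparison
proposed in this bug is purely lexical and includes the prefix.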

NOTE 1: This bug does not say that anything should change with regard to
parsing of XHTML. An invalid DOCTYPE declaration will continue to bother no
one except DTD-validating processors (such as e.g. XML editors).
NOTE 2: This bug does not propose to *require* the use of the DOCTYPE
declaration in XHTML - it only defines how it should be used when or if it is
used.


Received on Thursday, 14 February 2013 09:53:15 UTC