
[Bug 9958] New: The DOCTYPE paragraph must explain and define the DOCTYPE rules better and more generally

From: <bugzilla@jessica.w3.org>
Date: Sun, 20 Jun 2010 04:16:14 +0000
To: public-html-bugzilla@w3.org
Message-ID: <bug-9958-2486@http.www.w3.org/Bugs/Public/>

           Summary: The DOCTYPE paragraph must explain and define the
                    DOCTYPE rules better and more generally
           Product: HTML WG
           Version: unspecified
          Platform: All
               URL: http://dev.w3.org/html5/html-xhtml-author-guide/html-x
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML/XHTML Compatibility Authoring Guide (ed: Eliot
        AssignedTo: eliotgra@microsoft.com
        ReportedBy: xn--mlform-iua@xn--mlform-iua.no
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html@w3.org,

Current definition:

A polyglot document uses the <!DOCTYPE html> doctype. Note that for a polyglot
document the string, html, must be lower case. For a pure HTML document, the
string is defined as case-insensitive.

        New, proposed replacement text (justification follows below):

In polyglot markup, a doctype that ensures that the browser makes a best-effort
attempt at following the relevant specifications is REQUIRED for
HTML compatibility. The doctype MUST also be XML-compatible, which means that
it has to follow XML’s casing rules. Thus, in contrast to pure HTML
documents, for an HTML-compatible XHTML document it is REQUIRED:

* that the string <code>DOCTYPE</code> is in uppercase;
* that the string <code>html</code> is in lowercase (because it represents the
root element);
* that the string <code>SYSTEM</code>, if present, is in uppercase;
* that the string <code>PUBLIC</code>, if present, is in uppercase;
* that an FPI (formal public identifier), if present, is a case-sensitive
match of the registered FPI that is meant.
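The casing rules in the list above are mechanical enough to check with a few lines of code. The following Python sketch is only an illustration, not part of the proposal: `check_doctype_casing` and `DOCTYPE_RE` are hypothetical names, the doctype is assumed to be already extracted as a single string, and the FPI itself is not validated against any registry; only the keyword casing is checked.

```python
import re
from typing import List

# Hypothetical helper for illustration: extracts the keyword tokens of a
# doctype (matched case-insensitively) so their actual casing can be checked.
DOCTYPE_RE = re.compile(
    r"^<!(?P<doctype>doctype)\s+(?P<root>html)(?=[\s>])"
    r"(?:\s+(?P<kind>public|system)(?=\s))?",
    re.IGNORECASE,
)


def check_doctype_casing(doctype: str) -> List[str]:
    """Return a list of casing problems; an empty list means none were found."""
    m = DOCTYPE_RE.match(doctype.strip())
    if m is None:
        return ["not a recognizable <!DOCTYPE html ...> declaration"]
    problems = []
    if m.group("doctype") != "DOCTYPE":
        problems.append("'DOCTYPE' must be uppercase")
    if m.group("root") != "html":
        problems.append("'html' must be lowercase (it names the root element)")
    kind = m.group("kind")  # None when there is no PUBLIC/SYSTEM keyword
    if kind is not None and kind not in ("PUBLIC", "SYSTEM"):
        problems.append(f"'{kind}' must be uppercase")
    return problems


# Reports the DOCTYPE, html and system casing problems for this input.
print(check_doctype_casing('<!doctype HTML system "about:legacy-compat">'))
```

An HTML-only parser would accept every one of these casing variants; the checks exist only because the polyglot document must also be well-formed XML.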

In addition, a URI, if present in the doctype, must point to the intended
resource. Altering the case of the URI could make it point to a different
resource. The requirement that the URI be correct is the same in HTML and XML,
even though the effect of an incorrect URI on parsing may differ between HTML
and XML:

* If the URI is the string <code>about:legacy-compat</code>, the string MUST be
in lowercase, as required by HTML5.
* If the URI is an http URL, the URI must point to the correct resource.
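To illustrate why changing the case of a URI can change what it points to: under the generic URI syntax (RFC 3986) the scheme and host are case-insensitive, but the path, query and fragment generally are not. This Python sketch uses a hypothetical helper, `same_resource_modulo_safe_case`, that folds case only in the scheme and host before comparing. It is an illustration under those assumptions, not a full URI normalizer (it ignores user info, percent-encoding and default ports).

```python
from urllib.parse import urlsplit


def same_resource_modulo_safe_case(a: str, b: str) -> bool:
    """Compare two URIs, folding case only in the scheme and host.

    Path, query and fragment are compared exactly, because changing
    their case may identify a different resource.
    """
    pa, pb = urlsplit(a), urlsplit(b)
    return (
        pa.scheme.lower() == pb.scheme.lower()
        and pa.hostname == pb.hostname  # .hostname is already lowercased
        and pa.port == pb.port
        and pa.path == pb.path
        and pa.query == pb.query
        and pa.fragment == pb.fragment
    )


dtd = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
print(same_resource_modulo_safe_case("HTTP://WWW.W3.ORG/TR/xhtml1/DTD/xhtml1-strict.dtd", dtd))  # True
print(same_resource_modulo_safe_case("http://www.w3.org/TR/XHTML1/DTD/xhtml1-strict.dtd", dtd))  # False
```

The same comparison also treats `about:Legacy-Compat` and `about:legacy-compat` as different, which agrees with the lowercase requirement for that string.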

So if an HTML polyglot contains the HTML5 doctype, it must appear in the
form <!DOCTYPE html>, case-sensitively. If an HTML polyglot contains the
alternative HTML5 <code>about:legacy-compat</code> doctype, it must be
<!DOCTYPE html SYSTEM "about:legacy-compat"> or <!DOCTYPE html SYSTEM
'about:legacy-compat'>, case-sensitively.
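Because the accepted HTML5 forms are so few, a polyglot check can simply compare against them exactly. The Python sketch below is illustrative only: `POLYGLOT_HTML5_DOCTYPES` and `is_polyglot_html5_doctype` are hypothetical names, and it assumes exactly one space between tokens, which is a simplification of this sketch rather than a rule stated in the text.

```python
# The HTML5 doctype forms accepted in polyglot markup, compared
# case-sensitively. Single spaces between tokens are an assumption
# of this sketch.
POLYGLOT_HTML5_DOCTYPES = frozenset({
    "<!DOCTYPE html>",
    '<!DOCTYPE html SYSTEM "about:legacy-compat">',
    "<!DOCTYPE html SYSTEM 'about:legacy-compat'>",
})


def is_polyglot_html5_doctype(doctype: str) -> bool:
    """True if `doctype` is exactly one of the accepted HTML5 doctype forms."""
    return doctype.strip() in POLYGLOT_HTML5_DOCTYPES


print(is_polyglot_html5_doctype("<!DOCTYPE html>"))  # True
print(is_polyglot_html5_doctype("<!doctype html>"))  # False
```

The exact-match comparison mirrors the "case-sensitively" wording: an HTML-only parser would also accept `<!doctype html>`, but an HTML-compatible XHTML document may not use it.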

If an HTML polyglot contains one of the XHTML doctypes that HTML5 describes as
obsolete but still HTML5-compatible (currently XHTML 1.0 Strict and
XHTML 1.1), it MUST be used in an XML-compatible way, as described above.
An HTML polyglot may use any other XHTML doctype that references a DTD,
provided it has the same best-effort effect on HTML5 parsers as
<!DOCTYPE html> (in particular, it must trigger strict mode). Note, however,
that a DOCTYPE which references a DTD subjects the document to the rules of
that DTD, and those rules may or may not be compatible with HTML5-based
polyglot markup.

Note that doctypes for HTML4, HTML3 or HTML2 are forbidden in HTML-compatible
XHTML documents, regardless of whether they contain a URI and regardless
of their effect in HTML5 parsers, as they are not XHTML-compatible.
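The rules for PUBLIC doctypes can be summarized as a small classifier. This Python sketch is a hypothetical illustration, not part of the proposal: `classify_public_doctype` and the two tables are invented names, the system identifiers are the well-known W3C DTD URLs for XHTML 1.0 Strict and XHTML 1.1, and the forbidden-prefix list is a rough heuristic for HTML 2, 3 and 4 formal public identifiers (FPIs), not an exhaustive registry. Anything else is reported as "other" and would need review against the rules in the text.

```python
from typing import Optional

# FPI -> system identifier for the two XHTML doctypes named as obsolete
# but permitted. Both parts are compared case-sensitively.
PERMITTED_XHTML_DOCTYPES = {
    "-//W3C//DTD XHTML 1.0 Strict//EN":
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    "-//W3C//DTD XHTML 1.1//EN":
        "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
}

# Heuristic prefixes for HTML 2, 3 and 4 FPIs (assumption of this sketch).
FORBIDDEN_FPI_PREFIXES = (
    "-//IETF//DTD HTML 2",
    "-//W3C//DTD HTML 3",
    "-//W3C//DTD HTML 4",
)


def classify_public_doctype(fpi: str, system_id: Optional[str]) -> str:
    """Return 'permitted', 'forbidden' or 'other' for a PUBLIC doctype."""
    # HTML 2/3/4 doctypes are forbidden whatever their case or system id.
    if fpi.upper().startswith(FORBIDDEN_FPI_PREFIXES):
        return "forbidden"
    expected_system = PERMITTED_XHTML_DOCTYPES.get(fpi)  # case-sensitive lookup
    if expected_system is not None and system_id == expected_system:
        return "permitted"
    return "other"  # needs manual review against the rules in the text


print(classify_public_doctype("-//W3C//DTD HTML 4.01//EN",
                              "http://www.w3.org/TR/html4/strict.dtd"))  # forbidden
```

A lowercased FPI such as `-//w3c//dtd xhtml 1.0 strict//en` falls through to "other", reflecting the requirement that an FPI match the registered one case-sensitively.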

The suggested replacement text solves the following problems:

1) HTML5 actually operates with *two* doctypes: <!DOCTYPE html> and <!DOCTYPE
html SYSTEM "about:legacy-compat">, whereas the current text in the polyglot
draft appears to say that only <!DOCTYPE html> is valid.
2) The polyglot spec should define more general rules, as HTML5 itself
does (within its limits). That way, it can also permit more doctypes than
HTML5 mentions, as the new text does.
3) The old text does not describe all the requirements of the DOCTYPE. For
example, it omits, among other things, that the string 'DOCTYPE' must be
uppercase. And it doesn't explain *why* the 'html' string must be lowercase.
4) The last sentence ("For a pure HTML document ...") feels unnecessary. It
would be just as natural to mention that pure XHTML does not need a doctype.
The polyglot spec in fact defines an HTML-compatible *XHTML format*, so it is
more natural to explain why there must be a doctype. The new text explains
this, as briefly as possible.
5) The effect of HTML4 doctypes once came up in the HTMLWG, and since HTML5
says that some of them are compatible, the polyglot spec should say that
they are not compatible with polyglot markup.

Note that the first sentence in the new text is a direct quote from HTML5:
"ensures that the browser makes a best-effort attempt at following the relevant

Received on Sunday, 20 June 2010 04:16:16 GMT
