Re: Doctypes and the dialects of HTML 5 from Henri Sivonen on 2007-03-25 (public-html@w3.org from March 2007)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Sun, 25 Mar 2007 12:16:51 +0300
To: public-html@w3.org
Message-Id: <A9925841-9449-4E5C-B149-EF07E1598735@iki.fi>
On Mar 23, 2007, at 22:55, Daniel Schattenkirchner wrote:

> from an authors point of view I was wondering how HTML5 will handle  
> doctypes

In text/html, you use <!DOCTYPE html>, which activates the standards  
mode in relevant browsers.

In application/xhtml+xml, no doctype is needed, but the spec cannot  
forbid the author from using a doctype, because forbidding it would  
inappropriately tamper with the realm of another layer (XML) in the  
language layer cake.

> (I hope we all know why they are important).

Indeed. They are important for activating the standards mode in  
browsers in the case of text/html.
http://hsivonen.iki.fi/doctype/

> Even if Web Applications 1.0 becomes HTML5 I don't think it can  
> keep "<!DOCTYPE html>" because it probably needs versioning in it.  
> The public "-//W3C//DTD HTML 5.0//EN" comes to my mind.

The doctype <!DOCTYPE html> in the WHATWG spec is not an uninformed  
accident. It is deliberate.

For argumentation against using the public ID as a version  
information switch in XML, please see
http://hsivonen.iki.fi/doctype/#xml

> However, I was actually wondering wether there'll be one doctype  
> for the SGML and XML dialects of HTML5, or one for each dialect,  
> which could result from different naming (XHTML5?).

There is no SGML dialect of HTML5. There's an HTML dialect and an XML  
dialect. (Even the Charter for this group says: "The Group will  
define conformance and parsing requirements for 'classic HTML',  
taking into account legacy implementations; the Group will not assume  
that an SGML parser is used for 'classic HTML'.")

On Mar 24, 2007, at 21:18, Jirka Kosek wrote:

> I hope that HTML5 (or whatever else name it will have) will made
> !DOCTYPE optional (at least for XML serialization).

It's optional for the XML serialization. It cannot be made optional
in the text/html serialization without making it conforming to
trigger the quirks mode, which is not what is wanted.
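
To illustrate the asymmetry between the two serializations, here is a minimal sketch in Python. It is not a browser's actual doctype sniffing: the function name and the returned labels are made up for illustration, legacy doctypes with public IDs are deliberately left unmodeled, and comments before the doctype are not handled.

```python
import re

# Matches the HTML5 doctype at the start of the document, case-insensitively.
_HTML5_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)
# Matches any doctype at all, so legacy doctypes can be reported separately.
_ANY_DOCTYPE = re.compile(r"\s*<!DOCTYPE\b", re.IGNORECASE)


def rendering_mode(content_type: str, source: str) -> str:
    """Simplified guess at the mode a browser would pick (illustrative only)."""
    if content_type == "application/xhtml+xml":
        # Assumption taken from the message: no doctype is needed for XML.
        return "standards"
    if _HTML5_DOCTYPE.match(source):
        return "standards"
    if _ANY_DOCTYPE.match(source):
        # Real browsers inspect the public ID here; this sketch does not.
        return "depends on legacy doctype (not modeled)"
    # text/html without any doctype triggers the quirks mode.
    return "quirks"
```

For example, `rendering_mode("text/html", "<p>hi")` yields `"quirks"`, while the same markup served as application/xhtml+xml yields `"standards"` in this sketch.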

> HTML already offers different way of specifying version used --  
> profile attribute on head element.

It's for versioning stuff like metadata profiles, and it has failed  
miserably in the marketplace. The attribute is obsolete as of today's  
WHATWG draft.

> What will be more suitable is version attribute allowed on root  
> element
> (html) and also on other elements which can act as roots of HTML
> fragments (e.g. div). So for specifying that you are using HTML 5.0  
> you
> could write:
>
> <html version="5.0">
>  ...
> </html>

I am opposed to requiring authors to include an incantation like that.

First, we need to consider use cases for versioning. I'll go over the  
usual straw men:

1) Versioning is needed so that browsers can switch to a mode needed  
for a particular version.

No. If the quirks mode has taught us anything about this issue with  
HTML, the conclusion should not be that more versioned modes are the  
solution. The conclusion should be that future versions of HTML and
CSS must not make changes that are incompatible with real legacy  
content. Modes are bad for browser development and quality assurance.  
We shouldn't want to have more. HTML5, including the parsing algorithm,
has been carefully designed so that HTML5 can be implemented in the
standards mode without breaking existing standards mode content.

(However, if a browser vendor doesn't want to change HTML 4 parsing  
despite the by-design compatibility of the HTML5 parsing algorithm,  
the vendor could use the HTML5 doctype as a parser selection switch.  
But keeping the old parser around is not something that the spec  
should encourage.)

2) Versioning is needed for mobile profiles.

No. HTML5 doesn't and shouldn't have a mobile profile. The concept of  
a mobile profile implies a walled garden world-view. If a browser  
only supports a mobile profile, the browser isn't suitable for the  
Web because the Web will use full (X)HTML5. On the other hand, Opera  
Mini is proof by implementation that the need to profile HTML under  
the pretext of mobile limitations is bogus.

See also http://www.w3.org/2004/04/webapps-cdf-ws/papers/opera.html

3) Versioning is needed to prepare for HTML6.

No. If HTML6 is designed well, no new processing mode is needed and  
HTML5 documents will work in browsers that implement HTML6. If,  
however, whoever designs HTML6 decides to do so badly, HTML6 can add  
a version incantation. HTML5 doesn't need to.

4) Versioning is needed for online conformance checking.

No. First, we need to consider what online conformance checkers are  
for. Do they exist so that third parties can go "Haha! He used the  
target attribute and specified the Strict doctype. What a bozo.  
Clearly, he should have known better and specified the Transitional  
doctype."? I don't think so.

Online conformance checkers are tools for helping with markup  
authoring. Therefore, it is critical to consider their use in the
time frame when authoring according to a particular version is taking
place. When HTML6 is ready to be deployed, it won't be critical for
authors to be able to specify in the document if they meant HTML5 or  
HTML6. They should write HTML6 and conformance checker *defaults*  
should be updated accordingly.

If HTML6 is a superset of HTML5, writing HTML5 and checking with an  
HTML6 conformance checker won't be a problem. If HTML6 deprecates or  
obsoletes parts of HTML5, then we won't want to make it too easy for  
people to keep using the bad stuff without mentioning it to them,  
will we?

If someone wants to keep checking against the definitions of HTML5 in  
the era of HTML6, I think it is reasonable to put the burden of choosing
a different version from a pop-up menu in the conformance checker UI  
on the person who wants to do legacy checking.
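
The "defaults plus pop-up menu" idea can be sketched as follows. This is only an illustration in Python: the version list (including the hypothetical HTML6) and the function name are assumptions, not part of any real conformance checker.

```python
from typing import Optional

# Hypothetical list of versions a checker supports, oldest to newest.
SUPPORTED_VERSIONS = ["HTML5", "HTML6"]


def version_to_check(ui_choice: Optional[str] = None) -> str:
    """Pick the version to check against.

    The document itself carries no version marker. The checker defaults
    to the newest version it knows, and only a user who explicitly wants
    legacy checking has to choose something else in the UI.
    """
    if ui_choice is None:
        return SUPPORTED_VERSIONS[-1]
    if ui_choice not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version: {ui_choice!r}")
    return ui_choice
```

With this shape, updating the checker for a new version means appending to the list; existing documents need no edits.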

Compare with CSS.

5) A CMS uses an implementation-specific subset (e.g. no scripting  
and no forms permitted). You want to configure a general-purpose  
authoring tool to limit auto-completion to this subset.

This use case actually has merit. However, it doesn't have merit as a  
reason for requiring all authors to include a version='5' incantation.

Discussing this issue pretty much reduces to the discussion about the  
bogosity of xsi:schemaLocation and about the merits of a PI for  
declaring the location of a RELAX NG schema in a document instance.  
Note that I am not saying that authoring tool auto-completion has to  
be RELAX NG-based. I am just saying that the relevant argumentation  
is the same as with the arguments about how a RELAX NG-aware editor  
decides which RELAX NG schema to use with a particular document.

I think XHTML5 should neither require nor forbid PIs for configuring  
authoring tools. This is between the author and his/her editor and  
leaving the artifact in a file that gets served on the Web is mostly  
harmless.

I am less sympathetic to an attribute on the root element for the  
same purpose, but I'd be willing to concede to an optional attribute  
with user-defined contents for the purpose of use as a hook in  
private authoring workflows. E.g.
profile='acme-cms-scriptless-and-formless'. However, I am slightly
uncomfortable about this, because
it is like giving the little finger to xsi:schemaLocation.

The contents being a user-defined hook for private workflows is an
important point. Normatively prescribing how you can subset XHTML  
doesn't work. Consider XHTML Mobile Profile. Modularization of XHTML  
was prepared to cater exactly to things like XHTML Mobile Profile and  
then the MP spec went and did not follow the prescribed module  
boundaries anyway.

With the schema project for (X)HTML5, fantasai and I have built in  
some options in the schema for dealing with HTML5 vs. XHTML5  
differences and for catering to subsetting in ways that we foresee as  
reasonable. However, this is entirely non-normative and not endorsed  
by Hixie. If someone is not happy with the options that fantasai and  
I were able to foresee, the schema is editable and forkable. It would  
be pointless to pretend that it weren't.

Since subsetters are going to do their own thing anyway, naming the  
subsets should be user-defined and it would be pointless to try to  
come up with a closed list of de jure subset names.
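
As a rough sketch of how an authoring tool might use such a user-defined hook, consider the Python below. Everything in it is hypothetical: the element sets are tiny stand-ins for a real schema, the profile registry is private to the tool, and reading a `profile` attribute from the root element follows the example name used above rather than any specification.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the full element set an editor could offer.
FULL_SET = {"p", "a", "div", "form", "input", "script"}

# Private, user-defined registry: subset names are not standardized.
LOCAL_PROFILES = {
    "acme-cms-scriptless-and-formless": FULL_SET - {"form", "input", "script"},
}


def completion_candidates(document: str) -> set:
    """Return the element names the editor should offer for completion."""
    root = ET.fromstring(document)
    profile = root.get("profile")
    # Unknown or missing profile names fall back to the full set,
    # since the hook is optional and only meaningful locally.
    return LOCAL_PROFILES.get(profile, FULL_SET)
```

Because the registry lives in the tool's own configuration, a different CMS can define a different subset under a different name without anyone maintaining a central list.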

> Its quite common misconception that for each namespace there is a  
> single
> schema defined somewhere.

Indeed. Online conformance checkers should probably default to the  
broadest feature set they support. For example, allowing embedded SVG  
and MathML by default. (The reason why mine doesn't yet is that I
haven't had time to review the SVG and MathML stuff properly.)

> Several different approaches for recognizing document types in a  
> single
> namespace are in a common use. One of the easiest is usage of  
> dedicated
> attribute for holding version information. This is case for example  
> of XSLT.
>
> Example 4. Version information inside XSLT 2.0 stylesheet
> <xsl:stylesheet
>   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>   version="2.0">
>   ...
> </xsl:stylesheet>

I think XSLT is an example of bad design with versioning.  
(Disclaimer: I am not an XSLT expert. I try to avoid XSLT when I can.)

If you feed an old transformation sheet to SAXON 8, it will just warn  
you that differences between old versions of XSLT and XSLT 2.0 are  
your problem and figuring out if the warning applies to your  
particular transformation sheet is your problem as well. If you are
unsure, you should use SAXON 6.

So the version attribute doesn't give you old behavior. Downgrading  
the implementation version does. OTOH, the versions are incompatible  
enough for the new version of the engine to issue a warning. If you  
consider XSLT a programming language that you run in your own  
environment, this might be acceptable. However, what works in such an  
environment doesn't work for Web stuff.

> Strictly speaking document type declaration is not version  
> indication it
> is just reference to DTD which can be used for validation and  
> definition
> of entities used.

Indeed.

> This for example means that you can not embeded XHTML page into  
> SOAP message and identify version of XHTML used.

Considering what I said above, versioning XHTML inside SOAP messages  
should not be necessary. Interchange with loosely affiliated or  
unaffiliated parties is similar to the browser use case. And chances  
are you'll hit SOAP versioning incompatibilities first when you try  
to upgrade a SOAP interface. :-)

Personally, I am not particularly keen to design for SOAP, XSD or
XSL-FO.

> Moreover request for download of private
> copy of DTD could be misused as attack against Web agent—this DTD  
> could
> be very long or it could use a big amount of entity declarations to
> congest XML parser.

I hope that whatever this WG does, it doesn't pretend that DTDs work on
the Web.
http://hsivonen.iki.fi/no-dtd/

> Example 6. More robust way of labeling document as XHTML Print

FWIW, I think XHTML Print has remarkably little relevance to Web  
content or even authoring in editors.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Sunday, 25 March 2007 09:17:14 UTC