Re: Doctypes and the dialects of HTML 5

Daniel Schattenkirchner wrote:

> from an authors point of view I was wondering how HTML5 will handle
> doctypes (I hope we all know why they are important).

They were important in the SGML era, when there were no namespaces and
DTD validation was the only kind of validation available.

I hope that HTML5 (or whatever name it ends up with) will make
!DOCTYPE optional (at least for the XML serialization).

> Even if Web Applications 1.0 becomes HTML5 I don't think it can keep
> "<!DOCTYPE html>" because it probably needs versioning in it. The public

Yep, versioning is necessary, but !DOCTYPE is completely insufficient
for versioning purposes. HTML already offers a different way of
specifying the version used: the profile attribute on the head element.
But this feature is not well known and is rarely used.

A more suitable solution would be a version attribute allowed on the
root element (html) and also on other elements which can act as roots
of HTML fragments (e.g. div). So to specify that you are using HTML 5.0
you could write:

<html version="5.0">
 ...
</html>

Below I'm attaching a snippet of an article which will be presented at
XTech (http://2007.xtech.org/public/schedule/detail/48) and which
describes some problems of using !DOCTYPEs for versioning purposes. I
hope that it will make sense even when pasted without context, but as
this list is public I can't put the whole article here before it is
published in the conference proceedings.

===========

5.1. Namespace is not a document type

It is a quite common misconception that for each namespace there is a
single schema defined somewhere. This assumption might hold for some
simpler specialized XML-based languages, but for many languages used on
the Web a namespace works just as a basic semantic identification.

Very often there are multiple variants of a vocabulary in a particular
namespace. These vocabularies can be subsets of the “base” language;
this is the case, for example, for XHTML 1.0 Transitional and its
derivatives like XHTML 1.0 Strict, XHTML Basic or XHTML Print. The
second case is a newer version of a vocabulary which does not change the
meaning of the original elements, so there is no need to change the
namespace. Both XSLT 1.0 and XSLT 2.0 share the same namespace, but
XSLT 2.0 defines a dozen or so new elements and attributes and even
changes the content model of some elements. A similar situation holds
for XHTML: XHTML 1.1 defines several new elements for Ruby annotations.

5.2. Versioning namespaces

Several different approaches for recognizing document types within a
single namespace are in common use. One of the easiest is to use a
dedicated attribute for holding version information. This is the case,
for example, with XSLT.

Example 4. Version information inside XSLT 2.0 stylesheet
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">
  ...
</xsl:stylesheet>


This is an almost ideal way of conveying version information. The
attribute value can be easily accessed in almost all processing tools.
Even more importantly, you can embed XSLT into another XML vocabulary
and still identify the version of XSLT used by looking at the version
attribute.
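To illustrate how easily the attribute can be reached, here is a minimal
sketch in Python using only the standard library. The wrapper vocabulary
(urn:example:package) and the function name are made up for the
illustration; only the XSLT namespace and the version attribute come
from the example above.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical wrapper vocabulary (urn:example:package), used only to
# show a stylesheet embedded inside some other XML document.
DOC = """\
<pkg:package xmlns:pkg="urn:example:package">
  <pkg:payload>
    <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="2.0">
      <xsl:template match="/"/>
    </xsl:stylesheet>
  </pkg:payload>
</pkg:package>
"""

def embedded_xslt_versions(xml_text):
    """Return the version attribute of every xsl:stylesheet or
    xsl:transform element found anywhere in the document."""
    root = ET.fromstring(xml_text)
    versions = []
    for name in ("stylesheet", "transform"):
        for el in root.iter("{%s}%s" % (XSL_NS, name)):
            versions.append(el.get("version"))
    return versions

print(embedded_xslt_versions(DOC))
```

No DTD and no document-level declaration is needed; the information
travels with the stylesheet element itself.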

The only problem is that XSLT allows the version attribute only on the
top element of a stylesheet. So you are unable to extract, for example,
one template from a stylesheet and add version information to this
template.

XHTML uses a legacy way of specifying version information which
depends on the presence of a document type declaration (!DOCTYPE) at
the start of the document.

Example 5. Version information in XHTML document
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML-Print 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-print10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  ...
</html>


Strictly speaking, a document type declaration is not a version
indication; it is just a reference to a DTD which can be used for
validation and for the definition of entities. But the public or system
identifier can be used as a version identifier, albeit a quite long and
verbose one.

Unfortunately, a document type declaration can appear only at the start
of an XML document. It cannot be embedded in the middle of an XML
document, and this disqualifies it from being used in the Web of
compound documents. This means, for example, that you cannot embed an
XHTML page into a SOAP message and identify the version of XHTML used.
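The restriction is easy to observe with any XML parser. The following
is a small sketch in Python (standard library only); the wrapper element
in the urn:example:envelope namespace is a stand-in invented for this
illustration, not a real SOAP envelope.

```python
import xml.etree.ElementTree as ET

DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML-Print 1.0//EN" '
           '"http://www.w3.org/MarkUp/DTD/xhtml-print10.dtd">')
XHTML = ('<html xmlns="http://www.w3.org/1999/xhtml">'
         '<head><title>t</title></head><body/></html>')

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# !DOCTYPE at the very start of the document: accepted.
standalone = DOCTYPE + XHTML

# The same page wrapped in a hypothetical envelope element: the
# declaration now sits in the middle of the document and is rejected.
embedded = ('<wrapper xmlns="urn:example:envelope">'
            + DOCTYPE + XHTML + '</wrapper>')

print(is_well_formed(standalone))
print(is_well_formed(embedded))
```

The first document parses and the second does not, so the only way to
embed the page is to drop the declaration, and the version information
with it.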

Moreover, the current specifications of several XHTML flavours (for
example XHTML Basic and XHTML Print) make the public identifier optional
and allow a private system identifier to be specified as long as it
points to a copy of the original DTD. This means that in order to
reliably detect the version of XHTML used, you have to download the DTD,
normalize the line-end characters inside it and then compare it to each
of the original DTDs provided by W3C as part of the respective
specifications. Such a process is evidently overkill. Moreover, a
request to download a private copy of a DTD could be misused as an
attack against a Web agent: the DTD could be very long, or it could use
a large number of entity declarations to overwhelm the XML parser.
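To show how much machinery this detection needs, here is a rough sketch
of the comparison step in Python. Everything in it is an assumption made
for the illustration: the function names, the size cap and the tiny
sample DTD texts (which are placeholders, not the real W3C DTDs).
Downloading the private copy is left out entirely.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical cap to avoid processing an absurdly long DTD.
MAX_DTD_BYTES = 1_000_000

def normalize_line_ends(data: bytes) -> bytes:
    """Turn CRLF and bare CR line ends into LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def fingerprint(data: bytes) -> str:
    """Hash of the DTD text after line-end normalization."""
    return hashlib.sha256(normalize_line_ends(data)).hexdigest()

def identify_dtd(candidate: bytes, known: Dict[str, bytes]) -> Optional[str]:
    """Return the label of the known DTD that matches the candidate,
    or None if no known DTD matches."""
    if len(candidate) > MAX_DTD_BYTES:
        raise ValueError("DTD too large, refusing to process")
    wanted = fingerprint(candidate)
    for label, reference in known.items():
        if fingerprint(reference) == wanted:
            return label
    return None

# Placeholder DTD text standing in for an original DTD.
KNOWN = {
    "demo-flavour": b"<!ELEMENT html (head, body)>\n<!ELEMENT head (title)>\n",
}

# A private copy that differs only in its line ends.
private_copy = b"<!ELEMENT html (head, body)>\r\n<!ELEMENT head (title)>\r\n"

print(identify_dtd(private_copy, KNOWN))
```

Even this simplified version needs a network fetch, a size limit and a
table of reference DTDs, just to learn what a single attribute could
state directly.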

There is also a not very well-known feature of XHTML that could be used
for specifying version information instead of a document type
declaration. It is possible to use the profile attribute on the head
element. The profile identifies a particular profile (version, subset)
of the language used and it has the form of a URI.

Example 6. More robust way of labeling document as XHTML Print
<html xmlns="http://www.w3.org/1999/xhtml">
  <head
    profile="http://www.w3.org/Markup/Profile/Print">
  ...
  </head>
  ...
</html>


Again, the profile attribute is not a perfect solution: it can be
specified only on the head element and thus cannot be used for
specifying the flavour of XHTML used for just a small fragment of XHTML
code.

The previous examples lead to the very sad conclusion that current XML
vocabularies and their specifications were not designed to make it
possible to fully exploit the possibilities of compound documents. We
think that W3C should extend the current Web architecture [WEBARCH] and
update older specifications to support a robust and flexible way of
attaching version information to arbitrary fragments of XML
vocabularies. It seems that allowing a version or similar attribute on
all elements which can be used as root elements of XML fragments is a
simple and sufficient solution. At the same time, a document type
declaration should be made an optional part of a conforming document.
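As a sketch of how a consumer might use such an attribute, the following
Python code reports the version that applies to each XHTML element in a
compound document. Note the assumptions: a version attribute on div is
only the proposal made here and not part of any current XHTML
specification, the rule that a fragment root's version applies to its
descendants is my own simplification, and the urn:example:message
wrapper is invented for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical compound document: two XHTML fragments inside a made-up
# wrapper vocabulary. Only the first fragment declares a version.
DOC = """\
<msg:message xmlns:msg="urn:example:message">
  <div xmlns="http://www.w3.org/1999/xhtml" version="5.0">
    <p>Hello</p>
  </div>
  <div xmlns="http://www.w3.org/1999/xhtml">
    <p>No version here</p>
  </div>
</msg:message>
"""

def effective_versions(element, inherited=None):
    """List (local name, version) for every XHTML element, taking the
    version from the element itself or from its nearest XHTML ancestor
    (assumed inheritance rule). Non-XHTML elements reset the version."""
    in_xhtml = element.tag.startswith("{%s}" % XHTML_NS)
    version = element.get("version", inherited) if in_xhtml else None
    result = []
    if in_xhtml:
        result.append((element.tag.split("}", 1)[1], version))
    for child in element:
        result.extend(effective_versions(child, version))
    return result

for name, version in effective_versions(ET.fromstring(DOC)):
    print(name, version)
```

The first fragment is labelled 5.0 throughout, while the second carries
no version at all, which a consumer would have to treat as unknown.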

======

Dan, if you are reading this, could you please add the problem of
versioning to the list of issues? Thanks.

   Jirka

-- 
------------------------------------------------------------------
  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
------------------------------------------------------------------
       Professional XML consulting and training services
  DocBook customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
 OASIS DocBook TC member, W3C Invited Expert, ISO/JTC1/SC34 member
------------------------------------------------------------------
 Want to speak at XML Prague 2007 => http://xmlprague.cz/cfp.html

Received on Saturday, 24 March 2007 19:18:47 UTC