Re: Doctypes and the dialects of HTML 5

Henri Sivonen wrote:

>> What will be more suitable is version attribute allowed on root element
>> (html) and also on other elements which can act as roots of HTML
>> fragments (e.g. div). So for specifying that you are using HTML 5.0 you
>> could write:
>> <html version="5.0">
>>  ...
>> </html>
> I am opposed to requiring authors to include an incantation like that.

I was thinking about optional version information, not mandatory.

> 4) Versioning is needed for online conformance checking.
> No. First, we need to consider what online conformance checkers are for.
> Do they exist so that third parties can go "Haha! He used the target
> attribute and specified the Strict doctype. What a bozo. Clearly, he
> should have known better and specified the Transitional doctype."? I
> don't think so.
> Online conformance checkers are tools for helping with markup authoring.
> Therefore, it is critical to consider their use in the time frame of the
> authoring according a particular version taking place. When HTML6 is
> ready to be deployed, it won't be critical for authors to be able to
> specify in the document if they meant HTML5 or HTML6. They should write
> HTML6 and conformance checker *defaults* should be updated accordingly.
> If HTML6 is a superset of HTML5, writing HTML5 and checking with an
> HTML6 conformance checker won't be a problem. If HTML6 deprecates or
> obsoletes parts of HTML5, then we won't want to make it too easy for
> people to keep using the bad stuff without mentioning it to them, will we?
> If someone wants to keep checking against the definitions of HTML5 in
> the era of HTML6, I think it is reasonable put the burden of choosing a
> different version from a pop-up menu in the conformance checker UI on
> the person who wants to do legacy checking.

I think you are assuming that in the future there will be only
one version of HTML (HTML5), which will then be extended to become
HTML6, and so on. But even now there are several versions of HTML in use:
Strict, Transitional, Frameset, 1.1, Print, Basic, MP. The future may show
that this was a big mistake and that there should be only one version; who
knows. But we (or at least I) do not have a time machine. So if a user
wants to use a specific version of HTML for whatever reason, he or she
should be able to test very easily whether the document conforms to
that version.

In my opinion a conformance checker should offer at least the following
validation options:

- check the document against the rules for the version specified in
the document (or use the latest version if none is specified)

- check the document against the latest version

- check the document against any arbitrarily chosen version
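
To make the three options concrete, here is a minimal sketch of how a
checker might pick the version to validate against. Everything in it is
an assumption made for illustration: the names pick_version,
KNOWN_VERSIONS and LATEST_VERSION, the mode strings, and the version
labels themselves do not come from any existing checker.

```python
from typing import Optional

# Hypothetical set of versions this imaginary checker knows about.
KNOWN_VERSIONS = {"4.01", "5.0"}
# Assumption: the checker treats this one as "the latest".
LATEST_VERSION = "5.0"


def pick_version(declared: Optional[str],
                 mode: str = "declared",
                 chosen: Optional[str] = None) -> str:
    """Return the version label to validate against.

    declared -- version found in the document, or None if unspecified
    mode     -- "declared", "latest" or "chosen" (the three options)
    chosen   -- version picked by the user, used only in "chosen" mode
    """
    if mode == "latest":
        return LATEST_VERSION
    if mode == "chosen":
        if chosen not in KNOWN_VERSIONS:
            raise ValueError(f"unsupported version: {chosen!r}")
        return chosen
    if mode == "declared":
        if declared is None:
            # No version in the document: fall back to the latest one.
            return LATEST_VERSION
        if declared not in KNOWN_VERSIONS:
            raise ValueError(f"unsupported version: {declared!r}")
        return declared
    raise ValueError(f"unknown mode: {mode!r}")
```

With this sketch, a document that declares nothing is checked against
the latest version, while a declared version wins unless the user
explicitly asks for something else.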

> 5) A CMS uses an implementation-specific subset (e.g. no scripting and
> no forms permitted). You want to configure a general-purpose authoring
> tool to limit auto-completion to this subset.
> This use case actually has merit. However, it doesn't have merit as a
> reason for requiring all authors to include a version='5' incantation.

Version information could simply be optional.

> Discussing this issue pretty much reduces to the discussion about the
> bogosity of xsi:schemaLocation and about the merits of a PI for
> declaring the location of a RELAX NG schema in a document instance.

I must strongly disagree with this point. Both the schema association PI and
xsi:schemaLocation point to a concrete schema written in a particular
schema language. This is very different from specifying just the version of
HTML used. Compare:

<html xmlns:xsi=""


<html xmlns:xsi=""



<html version="5.0">


<html version="5.0-subset my-cms-2.0">

The first example specifies a concrete schema written in W3C XML Schema,
which is neither very flexible (you might want to use a completely
different schema language) nor very clever (for various performance and
security reasons you shouldn't fetch schemas from a location provided in
the document). The latter example, however, gives you much more flexibility
because it offers one additional level of indirection. It is up to you or
your application to pick the correct schema (or processing component, or
whatever) based on the content of the version attribute.
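
The indirection can be sketched as a lookup in a registry that the
application itself controls. This is only an illustration under stated
assumptions: the registry contents, the schema file names and the
function schema_for are all hypothetical, the version value is treated
as one opaque label, and the document is parsed as XML, so it covers
only the XHTML-style serialization.

```python
import xml.etree.ElementTree as ET

# Hypothetical local registry: version label -> schema the application trusts.
# Nothing is ever fetched from a location named inside the document.
SCHEMA_REGISTRY = {
    "5.0": "schemas/html5.rnc",
    "5.0-subset my-cms-2.0": "schemas/my-cms-2.0.rnc",
}
# Assumption: documents without a version label get the full HTML5 schema.
DEFAULT_SCHEMA = "schemas/html5.rnc"


def schema_for(document: str) -> str:
    """Map the root element's version attribute to a locally chosen schema."""
    root = ET.fromstring(document)
    label = root.get("version")
    if label is None:
        return DEFAULT_SCHEMA
    # Collapse runs of whitespace so the label is compared in one form.
    normalized = " ".join(label.split())
    try:
        return SCHEMA_REGISTRY[normalized]
    except KeyError:
        raise LookupError(f"no local schema registered for {label!r}")
```

The point of the design is that the document only names what it claims
to be; which schema language is used, and where the schema lives, stays
a decision of the receiving application.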

> I think XHTML5 should neither require nor forbid PIs for configuring
> authoring tools. This is between the author and his/her editor and
> leaving the artifact in a file that gets served on the Web is mostly
> harmless.

Editing is only one issue. Many companies use a subset of XHTML as the
basic format for creating universal text content. They pass this subset
between various components for editing, proofreading, approval,
down-conversion, publishing, ... They need to carry information about the
version of the subset used, and without some HTML provision for version
labeling they have to extend HTML with their own element or attribute, which
of course causes problems in the many tools that recognize only HTML markup.

> I am less sympathetic to an attribute on the root element for the same
> purpose, but I'd be willing to concede to an optional attribute with
> user-defined contents for the purpose of use as a hook in private
> authoring workflows. E.g. profile='acme-cms-scriptless-and-formless'.

Such an attribute should be allowed not only on the root (html) but on any
element which can act as the root of a reasonable HTML fragment. Ideally
this should include all HTML elements, but having it at least on block
elements is a must for CMSs that compose final documents from small pieces.
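
As an illustration of why fragment-level labels matter, here is a small
sketch that computes which version label applies to each element of a
composed document. The rule it uses (an element without its own label
inherits the label of its nearest labelled ancestor) is my assumption
for the sake of the example, not something any specification defines,
and the function name effective_versions is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


def effective_versions(element: ET.Element,
                       inherited: Optional[str] = None
                       ) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (tag, version) pairs in document order.

    Assumed rule: an element's own version attribute wins; otherwise the
    label of the nearest labelled ancestor applies (None if there is none).
    """
    own = element.get("version", inherited)
    yield element.tag, own
    for child in element:
        yield from effective_versions(child, own)


# A document composed by a CMS from a separately labelled fragment.
page = (
    '<html version="5.0"><body>'
    '<div version="my-cms-2.0"><p/></div>'
    '<p/>'
    '</body></html>'
)

for tag, version in effective_versions(ET.fromstring(page)):
    print(tag, version)
```

Here the div and the paragraph inside it would be checked against the
CMS subset, while the rest of the page is checked against the label on
the root.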

> However, I am slightly uncomfortable about this, because it is like
> giving the little finger to xsi:schemaLocation.

Once again, comparing a version label like "5.0" with xsi:schemaLocation
doesn't make sense.

> With the schema project for (X)HTML5, fantasai and I have built in some
> options in the schema for dealing with HTML5 vs. XHTML5 differences and
> for catering to subsetting in ways that we foresee as reasonable.

Could you please post a pointer to this schema here? I would like to see
how extensibility is handled.

> Since subsetters are going to do their own thing anyway, naming the
> subsets should be user-defined and it would be pointless to try to come
> up with a closed list of de jure subset names.

But there could/should be some basic naming policy. Remember that people
need some guidance (at least the majority of people ;-).

> Indeed. Online conformance checkers should probably default to the
> broadest feature set they support. For example, allowing embedded SVG
> and MathML by default. (The reason why mine doesn't, yet, is that I
> haven't had time to review the SVG and MathML stuff properly, yet.)

With NVDL you don't have to review and study schemas for new languages. ;-D
You just say that this namespace is handled by this separate schema; there
is no need to integrate schemas using their extensibility hooks.

> I think XSLT is an example of bad design with versioning. (Disclaimer: I
> am not an XSLT expert. I try to avoid XSLT when I can.)

Your fault ;-)

> If you feed an old transformation sheet to SAXON 8, it will just warn
> you that differences between old versions of XSLT and XSLT 2.0 are your
> problem and figuring out if the warning applies to your particular
> transformations sheet is your problem as well. If you are unsure, you
> should use SAXON 6.
> So the version attribute doesn't give you old behavior. Downgrading the
> implementation version does. OTOH, the versions are incompatible enough
> for the new version of the engine to issue a warning.

I don't think that we should discuss XSLT here. But Saxon is just being
overly cautious here (just in case); incompatibilities arise only in very
rare edge cases. I have not triggered any of them in the several tens of
thousands of lines of XSLT code that I have written over the past years. So
the versioning of XSLT is indeed very successful; you can even mix and match
1.0 and 2.0 code without any problems. But XSLT is a programming language,
not a markup language as HTML is.

>> This for example means that you can not embeded XHTML page into SOAP
>> message and identify version of XHTML used.
> Considering what I said above, versioning XHTML inside SOAP messages
> should not be necessary. Interchange with loosely affiliated or
> unaffiliated parties is similar to the browser use case. 

In practice many companies use web services in pretty tightly coupled
setups with very strange requirements. You say that specifying the HTML
version should not be necessary. I say I have seen real requirements for
using a controlled subset of XHTML inside the payload.

>> Moreover request for download of private
>> copy of DTD could be misused as attack against Web agent—this DTD could
>> be very long or it could use a big amount of entity declarations to
>> congest XML parser.
> I hope that whatever this WG does, it doesn't pretend DTDs to work on
> the Web.

Yes, DTDs are dead and they never should have been part of the XML core
specification. But it is too late to change history. And DTDs can work
very well on the Web when you use XML catalogs (but I'm not saying that HTML
should use them).

>> Example 6. More robust way of labeling document as XHTML Print
> FWIW, I think XHTML Print has remarkably little relevance to Web content
> or even authoring in editors.

Why do you think so? HTML is simply ubiquitous and it would be very
short-sighted to assume that HTML is used only for rendering content in
Web browsers.

In my personal opinion, if you want to ensure that HTML will not fork,
you have to provide a language that is complete and flexible enough to
allow subsetting for more restricted environments like mail, low-cost
printers, Joe's 10-tag super-simple HTML, ...

Without such a facility the respective subsets will fork, and this
will lead to great confusion for developers and to incompatibilities for
content producers.

  Jirka Kosek
       Professional XML consulting and training services
  DocBook customization, custom XSLT/XSL-FO document processing
 OASIS DocBook TC member, W3C Invited Expert, ISO/JTC1/SC34 member

Received on Saturday, 31 March 2007 19:44:43 UTC