Re: Formal Objection in Questions 1 and 3 on the Ballot from Lachlan Hunt on 2007-05-05 (public-html@w3.org from May 2007)

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Sat, 05 May 2007 14:03:09 +1000
To: Terje Bless <link@pobox.com>
CC: Chris Wilson <Chris.Wilson@microsoft.com>, Dan Connolly <connolly@w3.org>, W3C HTML WG <public-html@w3.org>
Message-ID: <463C01FD.5080007@lachy.id.au>
Terje Bless wrote:
> 3) The “HTML5” submission appears to be actively incompatible with
>    previous versions of HTML (W3C and ISO specifications). While the
>    Charter admonishes that the WG should not «…assume that an SGML
>    parser is used…», neither does it (nor, indeed, could it) say that
>    it should be incompatible with an SGML parser. Regardless of what the
>    general desktop browser vendors have implemented, currently specified
>    variants of HTML are based on SGML (defined largely in terms of it)
>    and SGML parsers do have a need to consume web content (the content
>    predating the Recommendation of the “HTML5” submission, if nothing
>    else).

In practice, the only user agents that use SGML parsers to process HTML 
on the web are validators.  Beyond those, only a few authors choose to 
use other SGML processors in their authoring tool chains.

There are significantly more user agents and tools that do not make use 
of SGML processing, and therefore it does not make sense to try to 
optimise the specification for the few that do.

The spec defines HTML in terms of the DOM and additionally defines two 
serialisations, HTML and XHTML, somewhat independently.  Although some 
processing requirements depend on which serialisation is used, and each 
has some limitations in what it can faithfully represent, in the general 
case either serialisation can be used to represent the same document.

The spec does not define an SGML serialisation itself, but it also does 
not prevent one from being defined and implemented.

Because the HTML serialisation is distinct from both the XML 
serialisation and a hypothetical SGML serialisation, there is no reason 
to maintain full syntactic compatibility between them.  Indeed, there 
are many cases where such compatibility is not possible due to the 
processing requirements of each.

If there were enough interest in having an SGML serialisation of HTML5 
available, I would have no objection to the interested parties defining 
one in a separate specification.  I do, however, believe that the 
existing HTML and XHTML serialisations should remain in the 
specification because they are far more common in reality.

If an SGML serialisation were to be defined, it would need to define how 
to construct an HTML DOM, including adding the elements to the DOM in 
the HTML namespace and dealing with the interaction of scripts (e.g. 
document.write() and .innerHTML) and stylesheets (e.g. case sensitivity 
of selectors).

It would also need to deal with things like the processing requirements 
for the <noscript> element or, as in XHTML, forbid its use in conforming 
documents.  (In the HTML serialisation, the way it is parsed depends on 
whether script is enabled.)

It would also need to define its own DOCTYPE, such as

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 5.0//EN">

and, if desired, write a DTD.  The SGML serialisation should not be 
required to use the same DOCTYPE as the HTML serialisation.  Note that 
the XHTML serialisation doesn't require the same DOCTYPE.  It doesn't 
even require a DOCTYPE, though authors are free to use one if they wish.

There are already the beginnings of a DTD for HTML5 [1], although the 
project is currently abandoned due to lack of interest.

>  c) Some reasonable measure to ensure compatibility with extant consumers
>     of web content, specifically that SGML parsers can be used to process
>     content that by definition is SGML based.

I'm assuming you are referring to a desire to continue to process 
HTML <= 4.01 as SGML for the purpose of validation, specifically on 
validator.w3.org and similar tools.

>     That is, some measure must
>     be put in place to ensure that the result of accepting the “HTML5”
>     submission does not prevent an SGML parser from consuming existing
>     content (by, e.g., redefining the meaning of apparent SGML content
>     served under the text/html media type or making itself
>     indistinguishable from existing content).

There is a note in the HTML5 spec which states [2]:

| [...] documents without DOCTYPEs or with DOCTYPEs that do not conform
| to the syntax allowed by this specification are considered to be out
| of scope of this specification.

Although the specification defines the processing for content served as 
text/html, it leaves open the possibility (though this is generally not 
advisable) that UAs may use alternative processing if they explicitly 
choose to do so based on the DOCTYPE or, presumably, a user option.  
Although that note is there as a way to recognise, yet not explicitly 
deal with, the use of quirks mode, it seems reasonable to recognise that 
some consumers (primarily validators) may wish to process HTML <= 4.01, 
or SGML serialisations of HTML documents, as SGML.

>     One possible way to achieve this is to require “HTML5” documents to
>     conform with SGML rules up until the end of the prolog, and to identify
>     itself under SGML rules as a particular FPI, such that an SGML parser
>     may discover that the document is one it cannot handle (and possibly
>     hand it over to a “HTML5” parser).

HTML5 defines the DOCTYPE to be <!DOCTYPE html>.  Although that is a 
syntactically correct SGML DOCTYPE, it differs enough from other HTML 
DOCTYPEs for a consumer to make the switch.  Indeed, this is the method 
currently employed by the validator to determine whether or not to use 
XML processing for XHTML documents served as text/html.  Although I 
personally don't agree with the validator doing so silently, it is 
evidence that this method is feasible.
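
As a rough sketch of how such a DOCTYPE-based switch might look in a 
consumer like a validator: the function name and the returned labels are 
hypothetical, and the regular expression is a deliberate simplification 
(it ignores comments before the DOCTYPE and SYSTEM-only declarations), 
not the parsing algorithm from the spec or from any existing validator.

```python
import re

# Matches a DOCTYPE for the "html" root at the start of a document and
# captures the optional public identifier (FPI). Simplified on purpose.
_DOCTYPE_RE = re.compile(
    r'\s*<!DOCTYPE\s+html(?:\s+PUBLIC\s+"([^"]*)"[^>]*)?\s*>',
    re.IGNORECASE,
)


def choose_parser(document: str) -> str:
    """Return a hypothetical parser label chosen from the DOCTYPE."""
    match = _DOCTYPE_RE.match(document)
    if match is None:
        # No DOCTYPE this sketch recognises; leave the decision to the caller.
        return "unknown"
    fpi = match.group(1)
    if fpi is None:
        # Bare <!DOCTYPE html>: hand over to an HTML5 parser.
        return "html5"
    if "XHTML" in fpi.upper():
        # XHTML FPI served as text/html: a validator may pick XML processing.
        return "xml"
    # Any other FPI (e.g. HTML 4.01): process as SGML.
    return "sgml"
```

With this sketch, <!DOCTYPE html> selects the HTML5 parser, while an 
HTML 4.01 FPI still selects SGML processing.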

Additionally, any authors wishing to have their documents explicitly 
processed as SGML are free to deliver their content using the SGML MIME 
types text/sgml or application/sgml [RFC 1874].  This is similar to the 
way authors need to request XML processing by using an XML MIME type. 
In this case, it doesn't matter that typical browsers don't recognise 
those types, as they don't possess SGML parsers anyway (DocZilla is one 
exception).
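
To illustrate, selecting the processing mode from the MIME type could be 
sketched roughly as follows.  The function name, the returned labels and 
the short list of XML types are my own assumptions for illustration; 
only text/sgml, application/sgml and text/html come from the discussion 
above.

```python
def processing_mode(content_type: str) -> str:
    """Map a Content-Type header value to a hypothetical processing mode."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("text/sgml", "application/sgml"):
        # SGML MIME types: the author explicitly asked for SGML processing.
        return "sgml"
    if media_type in ("application/xhtml+xml", "application/xml", "text/xml"):
        # Illustrative (not exhaustive) list of XML MIME types.
        return "xml"
    if media_type == "text/html":
        return "html"
    # Anything else is outside the scope of this sketch.
    return "unsupported"
```

A consumer without an SGML parser would simply treat the "sgml" result 
as a type it cannot handle.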

Does this address your concerns sufficiently to remove this point from 
your formal objection?

[1] http://syntax.whatwg.org/
[2] http://www.whatwg.org/specs/web-apps/current-work/#the-initial

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Saturday, 5 May 2007 04:03:26 UTC