- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Sat, 05 May 2007 14:03:09 +1000
- To: Terje Bless <link@pobox.com>
- CC: Chris Wilson <Chris.Wilson@microsoft.com>, Dan Connolly <connolly@w3.org>, W3C HTML WG <public-html@w3.org>
Terje Bless wrote:
> 3) The “HTML5” submission appears to be actively incompatible with
> previous versions of HTML (W3C and ISO specifications). While the
> Charter admonishes that the WG should not «…assume that an SGML
> parser is used…», neither does it (nor, indeed, could it) say that
> it should be incompatible with an SGML parser. Regardless of what the
> general desktop browser vendors have implemented, currently specified
> variants of HTML are based on SGML (defined largely in terms of it)
> and SGML parsers do have a need to consume web content (the content
> predating the Recommendation of the “HTML5” submission, if nothing
> else).

In practice, the only user agents that use SGML parsers for processing HTML on the web are validators, and only a few authors choose to use other SGML processors in their authoring tool chains. There are significantly more user agents and tools that do not make use of SGML processing, and therefore it does not make sense to try to optimise the specification for the few that do.

The spec defines HTML in terms of the DOM and additionally defines two serialisations, HTML and XHTML, somewhat independently. Although there are some processing requirements that depend on which serialisation was used, and some limitations in what can be faithfully represented in each, in the general case either serialisation can be used to represent the same document.

The spec does not define an SGML serialisation itself, but it also does not prevent one from being defined and implemented. Because the HTML serialisation is distinct from both the XML serialisation and a hypothetical SGML serialisation, there is no reason to maintain full syntactic compatibility between them. Indeed, there are many cases where such compatibility is not possible due to the processing requirements of each.

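As a rough illustration of one tree being written out in two syntaxes, the sketch below uses Python's standard xml.etree.ElementTree module. This is only an analogy: ElementTree is not an HTML DOM, its "html" output method is not the HTML5 serialisation algorithm, and the element names are arbitrary examples.

```python
import xml.etree.ElementTree as ET

# Build one small in-memory tree: a paragraph containing a line break.
# (Illustrative only; ElementTree stands in for the DOM here.)
p = ET.Element("p")
p.text = "Hello"
br = ET.SubElement(p, "br")
br.tail = "world"

# Serialise the same tree twice, once per syntax.
html_text = ET.tostring(p, encoding="unicode", method="html")
xml_text = ET.tostring(p, encoding="unicode", method="xml")

print(html_text)  # HTML-style output: the empty "br" element has no end tag
print(xml_text)   # XML-style output: the empty "br" element is self-closed
```

The tree is identical in both cases; only the syntax of the output differs, which is why the two serialisations need not be syntactically compatible with each other.
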
If there were enough interest in having an SGML serialisation of HTML5 available, I would have no objection to the interested parties defining one in a separate specification. I do, however, believe that the existing HTML and XHTML serialisations should remain in the specification because they are far more common in reality.

If an SGML serialisation were to be defined, it would need to define how to construct an HTML DOM, including adding the elements to the DOM in the HTML namespace, and dealing with the interaction of scripts (e.g. document.write() and .innerHTML) and stylesheets (e.g. case sensitivity of selectors). It would also need to deal with things like the processing requirements for the <noscript> element or, as in XHTML, forbid its use in conforming documents. (In the HTML serialisation, the way it is parsed depends on whether script is enabled.)

It would also need to define its own DOCTYPE, such as <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 5.0//EN"> and, if desired, write a DTD. The SGML serialisation should not be required to use the same DOCTYPE as the HTML serialisation. Note that the XHTML serialisation doesn't require the same DOCTYPE. It doesn't even require a DOCTYPE, though authors are free to use one if they wish. There are already the beginnings of a DTD for HTML5 [1], although the project is currently abandoned due to lack of interest.

> c) Some reasonable measure to ensure compatibility with extant consumers
> of web content, specifically that SGML parsers can be used to process
> content that by definition is SGML based.

I'm assuming you are referring to a desire to continue to process HTML <= 4.01 as SGML for the purpose of validation, specifically on validator.w3.org and similar tools.

> That is, some measure must
> be put in place to ensure that the result of accepting the “HTML5”
> submission does not prevent an SGML parser from consuming existing
> content (by, e.g., redefining the meaning of apparent SGML content
> served under the text/html media type or making itself
> indistinguishable from existing content).

There is a note in the HTML5 spec which states [2]:

| [...] documents without DOCTYPEs or with DOCTYPEs that do not conform
| to the syntax allowed by this specification are considered to be out
| of scope of this specification.

Although the specification is defining the processing for content served as text/html, it leaves open the possibility (though, generally not advisable) that alternative processing may be used by UAs that explicitly choose to do so based on the DOCTYPE or, presumably, user option. Although that note is in there as a way to recognise, yet not explicitly deal with, the use of quirks mode, it seems reasonable to recognise that some consumers (primarily validators) may wish to process HTML <= 4.01, or SGML serialisations of HTML documents, as SGML.

> One possible way to achieve this is to require “HTML5” documents to
> conform with SGML rules up until the end of the prolog, and to identify
> itself under SGML rules as a particular FPI, such that an SGML parser
> may discover that the document is one it cannot handle (and possibly
> hand it over to a “HTML5” parser).

HTML 5 defines the DOCTYPE to be <!DOCTYPE html>. Although that is a syntactically correct SGML DOCTYPE, it differs enough from other HTML DOCTYPEs in order to make the switch. Indeed, this is the method currently employed by the validator to determine whether or not to use XML processing for XHTML documents served as text/html. Although I personally don't agree with the validator doing so silently, it is evidence that this method is feasible.

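The DOCTYPE switch described above could be sketched roughly as follows. This is a minimal illustration, not the validator's actual logic and not anything defined by the HTML5 draft: the function name choose_parser, the returned labels, and the simplified regular expressions are all assumptions made for the example.

```python
import re

# Hypothetical patterns; real DOCTYPE parsing has more cases than these.
BARE_DOCTYPE = re.compile(r'<!DOCTYPE\s+html\s*>', re.IGNORECASE)
PUBLIC_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE
)

def choose_parser(document: str) -> str:
    """Guess which parser a validator-like tool might pick for text/html."""
    start = document.lstrip()
    if BARE_DOCTYPE.match(start):
        return "html5"        # <!DOCTYPE html> with no public identifier
    match = PUBLIC_DOCTYPE.match(start)
    if match is None:
        return "unknown"      # no usable DOCTYPE: out of scope per the note
    fpi = match.group(1)
    if "XHTML" in fpi.upper():
        return "xml"          # XHTML FPI: tool may opt for XML processing
    return "sgml"             # any other FPI, e.g. HTML 4.01: SGML processing

print(choose_parser("<!DOCTYPE html><title>t</title>"))
print(choose_parser('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'))
```

The point is only that the FPI, or its absence, carries enough information for a tool to decide which parser to hand the document to.
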
Additionally, any authors wishing to have their documents explicitly processed as SGML are free to deliver their content using the SGML MIME types text/sgml or application/sgml [RFC 1874]. This is similar to the way authors need to request XML processing by using an XML MIME type. In this case, it doesn't matter that typical browsers don't recognise those types, as they don't possess SGML parsers anyway (DocZilla is one exception).

Does this address your concerns sufficiently to remove this point from your formal objection?

[1] http://syntax.whatwg.org/
[2] http://www.whatwg.org/specs/web-apps/current-work/#the-initial

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Saturday, 5 May 2007 04:03:26 UTC