- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Thu, 20 Nov 2008 23:51:59 +0000
- To: Mark Baker <distobj@acm.org>
- CC: public-html@w3.org
Mark Baker wrote:
> [...] The parser and much of the language is
> defined in DOM terms. I haven't had a detailed enough look at the
> parser to know if the DOM gets in the way though, or if it can simply
> be used as an abstract model as the spec says ("Implementations that
> do not support scripting do not have to actually create a DOM Document
> object, but the DOM tree in such cases is still used as the model for
> the rest of the specification."). As somebody pointed out, html5lib
> doesn't have a DOM, so that's an argument that it's possible. But I'm
> still wary of using an implemented model as an abstract one, lest
> nuances of the various implementations result in differing
> interpretations of the specification.
The parsing algorithm says:
"The tree construction stage is associated with a DOM Document object
when a parser is created. The "output" of this stage consists of
dynamically modifying or extending that document's DOM tree."
and then defines terms like "create an element for a token" in terms of
DOM concepts like the HTMLAnchorElement interface. Then it uses phrases
like:
"Append a Comment node to the current node with the 'data' attribute
set to the data given in the comment token."
and
"Insert 'last node' into 'node', first removing it from its previous
parent node if any."
So it seems to me that the spec is already using quite an abstract view of
the DOM. It uses the DOM interface names to identify the different types
of node that can be generated, and to refer to the fields of each node
(like 'data'), but otherwise it uses generic tree terminology. In
particular it doesn't say anything like "Execute
node.appendChild(lastNode)", which would be much more
DOM-implementation-specific.
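To illustrate how little of the DOM those two phrases actually need, here is a minimal Python sketch that implements them over a plain tree with parent pointers. The `Node` type and the function names are hypothetical; they are not taken from the spec or from html5lib. Treating "insert into" as "append as the last child" is also my assumption for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity equality, so list.remove() removes this exact node
class Node:
    """A plain tree node with no DOM interfaces (hypothetical type)."""
    kind: str                      # e.g. "element" or "comment"
    name: str = ""                 # element name; empty for comments
    data: str = ""                 # comment text, mirroring the spec's 'data' field
    parent: Optional[Node] = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list)


def append_comment(current: Node, text: str) -> Node:
    """'Append a Comment node to the current node with the data attribute set...'"""
    comment = Node(kind="comment", data=text, parent=current)
    current.children.append(comment)
    return comment


def insert_into(node: Node, last_node: Node) -> None:
    """'Insert last node into node, first removing it from its previous parent if any.'
    Assumption: 'insert into' means 'append as the last child'."""
    if last_node.parent is not None:
        last_node.parent.children.remove(last_node)
    last_node.parent = node
    node.children.append(last_node)
```

Nothing here goes beyond generic parent/children bookkeeping: calling `insert_into(body, p)` on a `p` that currently lives under a `div` simply detaches it from the `div` and appends it to `body`.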
People who have implemented the parsing algorithm have used a variety of
non-DOM output structures (ElementTree and BeautifulSoup in html5lib,
XOM and SAX in validator.nu, some purely functional tree structure in my
OCaml implementation, etc.), and (as far as I'm aware) they have never
expressed concerns that the spec makes it unnecessarily difficult to use
a non-DOM output format. (There are some necessary difficulties when the output
format can't represent all HTML documents, e.g. if it requires
XML-compatible element names or unbuffered streaming, but those issues
will occur regardless of how the spec is written.)
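As a rough sketch of the same two operations over one of those non-DOM structures, here is a hypothetical adapter over Python's standard `xml.etree.ElementTree`. The class and method names are made up, and this is not how html5lib's ElementTree treebuilder is actually written. ElementTree elements don't record their parent, so the adapter keeps its own parent map in order to support "removing it from its previous parent":

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


class EtreeBuilder:
    """Hypothetical adapter: the spec's tree operations over ElementTree."""

    def __init__(self) -> None:
        # ElementTree elements don't know their parent, so track it here.
        self.parent_of: dict[ET.Element, ET.Element] = {}

    def create_element(self, name: str) -> ET.Element:
        return ET.Element(name)

    def append_comment(self, current: ET.Element, text: str) -> ET.Element:
        comment = ET.Comment(text)
        current.append(comment)
        self.parent_of[comment] = current
        return comment

    def insert_into(self, node: ET.Element, last_node: ET.Element) -> None:
        # Detach from the previous parent (if any), then append as last child.
        old_parent = self.parent_of.get(last_node)
        if old_parent is not None:
            old_parent.remove(last_node)
        node.append(last_node)
        self.parent_of[last_node] = node


b = EtreeBuilder()
body, div, p = (b.create_element(n) for n in ("body", "div", "p"))
b.insert_into(div, p)
b.insert_into(body, p)          # moves p out of div
b.append_comment(body, " note ")
print(ET.tostring(body, encoding="unicode"))  # <body><p /><!-- note --></body>
```

The extra parent map is the kind of small adaptation a particular output format may need; it does not come from anything DOM-specific in the spec's wording.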
Does this increase or assuage your wariness at all?
--
Philip Taylor
pjt47@cam.ac.uk
Received on Thursday, 20 November 2008 23:53:01 UTC