- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Thu, 20 Nov 2008 23:51:59 +0000
- To: Mark Baker <distobj@acm.org>
- CC: public-html@w3.org
Mark Baker wrote:
> [...] The parser and much of the language is
> defined in DOM terms. I haven't had a detailed enough look at the
> parser to know if the DOM gets in the way though, or if it can simply
> be used as an abstract model as the spec says ("Implementations that
> do not support scripting do not have to actually create a DOM Document
> object, but the DOM tree in such cases is still used as the model for
> the rest of the specification."). As somebody pointed out, html5lib
> doesn't have a DOM, so that's an argument that it's possible. But I'm
> still wary of using an implemented model as an abstract one, lest
> nuances of the various implementations result in differing
> interpretations of the specification.

The parsing algorithm says:

"The tree construction stage is associated with a DOM Document object when a parser is created. The "output" of this stage consists of dynamically modifying or extending that document's DOM tree."

and then defines terms like "create an element for a token" in terms of DOM concepts like the HTMLAnchorElement interface. Then it uses phrases like:

"Append a Comment node to the current node with the 'data' attribute set to the data given in the comment token."

and

"Insert 'last node' into 'node', first removing it from its previous parent node if any."

So it seems to me that the spec is already using a quite abstract view of the DOM. It uses the DOM interface names to identify the different types of node that can be generated, and to refer to the fields of each node (like 'data'), but otherwise it uses generic tree terminology. In particular it doesn't say anything like "Execute node.appendChild(lastNode)", which would be much more DOM-implementation-specific.
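To illustrate how little of the DOM those phrases actually require, here is a minimal Python sketch of the two quoted steps on a plain tree. Everything in it (the Node class, insert, append_comment) is hypothetical and made up for this example; it is not taken from the spec, html5lib, or any other implementation mentioned here.

```python
class Node:
    """A plain tree node. 'kind' stands in for the DOM interface name
    (e.g. "Element", "Comment"); 'data' mirrors the field the spec names."""

    def __init__(self, kind, name=None, data=None):
        self.kind = kind
        self.name = name
        self.data = data
        self.parent = None
        self.children = []


def insert(child, new_parent):
    """'Insert child into new_parent, first removing it from its
    previous parent node if any.'"""
    if child.parent is not None:
        child.parent.children.remove(child)
    new_parent.children.append(child)
    child.parent = new_parent


def append_comment(current_node, comment_token_data):
    """'Append a Comment node to the current node with the data
    attribute set to the data given in the comment token.'"""
    comment = Node("Comment", data=comment_token_data)
    insert(comment, current_node)
    return comment


# Example: build body > p > b, attach a comment to b, then reparent b.
body = Node("Element", "body")
p = Node("Element", "p")
b = Node("Element", "b")
insert(p, body)
insert(b, p)
append_comment(b, " note ")
insert(b, body)  # b is removed from p before being appended to body
```

The only things borrowed from the DOM here are the node type names and the 'data' field; the tree operations themselves are ordinary list manipulation.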
People who have implemented the parsing algorithm with a variety of non-DOM output structures (ElementTree and BeautifulSoup in html5lib, XOM and SAX in validator.nu, some purely functional tree structure in my OCaml implementation, etc) have never (as far as I'm aware) expressed concerns that the spec makes it unnecessarily difficult to use a non-DOM output format. (There are some necessary difficulties when the output format can't represent all HTML documents, e.g. if it requires XML-compatible element names or unbuffered streaming, but those issues will occur regardless of how the spec is written.)

Does this increase or assuage your wariness at all?

-- 
Philip Taylor
pjt47@cam.ac.uk
Received on Thursday, 20 November 2008 23:53:01 UTC