- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Tue, 18 Nov 2008 23:33:29 +0200
- To: elharo@metalab.unc.edu
- Cc: Boris Zbarsky <bzbarsky@MIT.EDU>, Ian Hickson <ian@hixie.ch>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, public-html <public-html@w3.org>, www-tag@w3.org
On Nov 18, 2008, at 15:24, Elliotte Harold wrote:

> Henri Sivonen wrote:
>
>> This means that agents that do not support scripting may use a
>> different object model. For example, it's conforming to implement a
>> no-scripting agent with XOM as the internal object model. The
>> Validator.nu HTML Parser even supports XOM out-of-the-box.
>
> As you point out XOM instead of DOM is not a big leap. They're both
> tree model after all. I'm more concerned about more radical changes
> like SAX or other streaming APIs or document specific data bound
> models or even stranger things. Is it plausible to extend the HTML 5
> parsing model to cover this?

Yes, and I've got proof by implementation. :-)

The Validator.nu HTML Parser supports SAX in two different modes:
streaming and tree-buffered.

In the streaming mode, the parser emits SAX events as it proceeds
through the input stream. However, there are some types of authoring
errors for which the error recovery is not streamable. In this mode,
these errors are treated like XML well-formedness errors. I'd like to
emphasize that this behavior is conforming per spec:
http://www.whatwg.org/specs/web-apps/current-work/#parse-error

In the tree-buffered mode, the parser builds a tree using a
purpose-optimized tree model (which is neither DOM nor XOM and
outperforms Xerces2 DOM and XOM for this use case) and, after the
input stream has been exhausted, fires SAX events corresponding to
the tree.

It is unfortunate that there are classes of errors for which
spec-compliant recovery is non-streamable. The legacy restricts us
here. :-( Note that implementing streamable ad hoc error recovery for
these cases is *not* conforming per spec.

> I also strongly question the wisdom of locking in one of the
> absolute worst APIs we have. If there's one thing that needs
> replacing in the HTML ecosystem, it's DOM. Sooner or later DOM will
> be replaced, and if HTML 5 is standing in the way when that day
> comes, then HTML 5 is going to come up the loser. Were the object
> model separable from the syntax and semantics, then the sensible
> parts of HTML 5 would have a better chance of surviving the
> transition.

It's extremely unlikely that the DOM would go away in browsers. It's
semi-plausible that a better API will be introduced for the same data
model (E4X has been failing so far...), but it isn't feasible to
remove the DOM API, since there's so much existing content depending
on it.

As for the DOM going away in non-browser agents that don't run
scripts, the SAX and XOM modes of the Validator.nu HTML Parser and the
ElementTree (etc.) APIs for html5lib-created trees are proof that it
works quite well already with the kind of spec we have.
(html5.validator.nu doesn't operate on a DOM or in fact on any kind of
in-memory tree model, BTW.)

--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
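
To illustrate the tree-buffered mode described in the message above, here is a minimal Python sketch of the general idea: the tree is built first, then replayed as SAX-style events. The `Element` and `RecordingHandler` classes and the `fire_events` function are hypothetical stand-ins invented for this example. They are not the Validator.nu HTML Parser's tree model or API, and the sketch does not attempt any HTML parsing or error recovery.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """Hypothetical minimal tree node; not the parser's real tree model."""
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


class RecordingHandler:
    """Stand-in for a SAX content handler that records the events it gets."""

    def __init__(self):
        self.events = []

    def startElement(self, name):
        self.events.append(("start", name))

    def characters(self, text):
        self.events.append(("chars", text))

    def endElement(self, name):
        self.events.append(("end", name))


def fire_events(node, handler):
    """Walk a finished tree depth-first and replay it as SAX-style events."""
    handler.startElement(node.name)
    for child in node.children:
        if isinstance(child, str):
            handler.characters(child)
        else:
            fire_events(child, handler)
    handler.endElement(node.name)


# Assumed state after the whole input has been consumed: any recovery that
# needed to rearrange nodes has already happened on the tree, so the event
# stream below never has to be "taken back".
tree = Element("html", [Element("body", [Element("p", ["Hello"])])])
handler = RecordingHandler()
fire_events(tree, handler)
```

Because events are fired only after the tree is complete, the consumer sees a single consistent event stream, at the cost of buffering the whole document in memory.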
Received on Tuesday, 18 November 2008 21:34:15 UTC