- From: Ian Hickson <ian@hixie.ch>
- Date: Thu, 26 Jan 2006 23:17:06 +0000 (UTC)
On Wed, 25 Jan 2006, Henri Sivonen wrote:
>
> Anyway, here's what I thought they were doing:
>
> There's low-level parser [that] is kind of like a tag-level lexer and
> emits a (non-well-formed) sequence of SAX-like events like startTag,
> characters, endTag and comment (in my parser* HtmlParser.java).

That's the Tokenisation Stage in the spec now.

> These events don't go to the DOM builder / content sink directly.
> Instead, there's a filter layer that takes care of tag inference and
> emits a well-formed event stream (TagInferenceFilter.java and
> EmptyElementFilter.java in my parser). Additionally, there's a filter
> (not present in my parser, which is designed for conformance checking;
> this may need to be integrated into the tag inference filter) that
> performs the "residual style" fixups.

That wouldn't work. You can't know whether something is well-formed or not
til you get to the end of it. Consider these examples in light of what
Mozilla and Safari do with them:

   <em> <strong> ...2GB... </em> </strong>

Or:

   <em> ...2GB... <p> ...2GB... </em> </p>

Incremental rendering means you have to be adding stuff to the DOM as you
get it, you can't wait to be sure. (Mozilla does a "pre-parse" with what
it has, sort of like what you are suggesting, but it only does it with
what it has, which means that the DOM you get is dependent on packet
boundaries and such. This results in non-deterministic parsing, which
isn't really acceptable.)

> Perhaps this model is a simple enough model to be deterministically
> specified but still good enough an approximation of Gecko's and
> WebCore's behavior. All decisions are local to the parse event being
> observed and do not involve reshuffling the parts of the DOM that have
> already been built.

If it doesn't handle the examples in this thread like IE (in the
rendering) then it isn't good enough.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 26 January 2006 15:17:06 UTC
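
To make the point about well-formedness concrete, here is a minimal Python sketch. It is not the spec's tokeniser and not what any browser does; `tokenize` and `first_misnesting` are hypothetical names invented for illustration, and the regex handles only bare tags without attributes. It shows that, for the first example in the message, the misnesting becomes visible only when the `</em>` end tag arrives, which is after all of the intervening content has already been seen.

```python
import re

# Toy pattern: a bare start/end tag, or a run of text. Attributes, comments
# and stray "<" characters are deliberately out of scope for this sketch.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\s*>|([^<]+)")


def tokenize(markup):
    """Yield (kind, value) events, a stand-in for a tokenisation stage."""
    for match in TOKEN.finditer(markup):
        closing, name, text = match.groups()
        if text is not None:
            yield ("characters", text)
        elif closing:
            yield ("endTag", name.lower())
        else:
            yield ("startTag", name.lower())


def first_misnesting(events):
    """Return (index, event) for the first end tag that does not match the
    innermost open element, or None if the stream nests cleanly."""
    open_elements = []
    for index, (kind, value) in enumerate(events):
        if kind == "startTag":
            open_elements.append(value)
        elif kind == "endTag":
            if not open_elements or open_elements[-1] != value:
                return index, (kind, value)
            open_elements.pop()
    return None


# The "...2GB..." in the message is shortened to 1000 characters here.
doc = "<em> <strong> " + "x" * 1000 + " </em> </strong>"
print(first_misnesting(tokenize(doc)))  # → (4, ('endTag', 'em'))
```

Under these assumptions, a filter that wanted to hand only a corrected stream to the DOM builder would have to hold back everything from the `<em>` start tag until event 4, which is exactly the buffering that incremental rendering rules out.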