Re: Input on the agenda from Henri Sivonen on 2009-03-10 (public-html@w3.org from March 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 10 Mar 2009 11:14:21 +0200
To: Jonas Sicking <jonas@sicking.cc>
Cc: Maciej Stachowiak <mjs@apple.com>, Doug Schepers <schepers@w3.org>, Ian Hickson <ian@hixie.ch>, public-html@w3.org, www-svg <www-svg@w3.org>
Message-Id: <010968E3-4774-442D-8F66-C113B3C1224F@iki.fi>

On Mar 10, 2009, at 02:00, Jonas Sicking wrote:

> I don't like at all that we have to use a different tokenizer in  
> "HTML mode" and in "foreign content mode". This is both confusing to  
> web developers and painful for end users (as performance and code  
> complexity suffer).

So far, it seems that the difference (certain state transitions in  
the tokenizer are taken differently; I wouldn't call it a different  
tokenizer) mainly affects the ability to do cheap speculative  
tokenization past <svg> or <math>. I don't think the different state  
transitions are a perf issue for normal parsing.

Now that the tree builder outputs a stream of operations and doesn't  
read back from the DOM, it would be feasible to raise the stakes upon  
speculation failure and run even the tree builder speculatively. So  
one could run speculative tokenization until <svg> or <math>. During  
speculative tokenization the failure condition would be getting a  
document.write that doesn't finish on a token boundary. When <svg> or  
<math> is seen, one could switch to speculative tree building. At that  
point, the failure condition would be getting any document.write at  
all. Thus, speculation would fail, and perf would suffer, on pages  
that do document.write before <svg> or <math> (e.g. pages that have  
ads inserted with document.write above the content). However, pages  
that do sane document.write at the bottom of the page for ad or  
tracking purposes would be fine.
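
To make the two failure conditions concrete, here is a rough sketch of  
the policy described above. It is only an illustration, not code from  
any parser: the names (Mode, Speculation, on_start_tag,  
on_document_write) and the ends_on_token_boundary flag are all made up  
for this example.

```python
from enum import Enum


class Mode(Enum):
    TOKENIZING = "speculative tokenization"
    TREE_BUILDING = "speculative tree building"


# Start tags that would switch speculation from tokenizing to tree building.
FOREIGN_ROOTS = {"svg", "math"}


class Speculation:
    """Hypothetical model of the speculation policy sketched in the text."""

    def __init__(self):
        self.mode = Mode.TOKENIZING
        self.failed = False

    def on_start_tag(self, name):
        # Seeing <svg> or <math> raises the stakes: from here on the
        # tree builder runs speculatively as well.
        if name.lower() in FOREIGN_ROOTS:
            self.mode = Mode.TREE_BUILDING

    def on_document_write(self, ends_on_token_boundary):
        # While only tokenizing, a write is tolerated if it finishes on a
        # token boundary. While tree building, any write at all is a failure.
        if self.mode is Mode.TREE_BUILDING or not ends_on_token_boundary:
            self.failed = True
        # True means the speculation is still usable.
        return not self.failed


# A write before <svg> that ends on a token boundary is tolerated...
early = Speculation()
early_ok = early.on_document_write(ends_on_token_boundary=True)

# ...but any write after <svg> has been seen throws the speculation away.
late = Speculation()
late.on_start_tag("svg")
late_ok = late.on_document_write(ends_on_token_boundary=True)
```

In this model early_ok ends up True and late_ok ends up False, which  
matches the two cases above: a well-formed write is harmless during  
speculative tokenization but fatal once speculation has gone past  
<svg> or <math>.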

And yeah, this would mean code complexity.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Tuesday, 10 March 2009 09:15:35 UTC