Re: Speculative tokenization and foreign content from Henri Sivonen on 2008-12-11 (public-html@w3.org from December 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 11 Dec 2008 10:53:00 -0800
To: Sam Ruby <rubys@us.ibm.com>
Cc: HTML WG <public-html@w3.org>
Message-Id: <629EF342-83D3-474A-92A8-BCE0EF6C5A39@iki.fi>

On Dec 11, 2008, at 02:39, Sam Ruby wrote:

> Speculative evaluation of instruction streams on a modern CPU given  
> the presence of conditional branch instructions doesn't mean  
> determining with certainty the correct path every time, it simply  
> means getting it right enough of the time to make a difference.
>
> Even if you can't reliably determine if you "would have bailed out",  
> you might be able to do better than the rather pessimistic approach  
> mentioned above.  Considerably better.  The design of HTML 5 is  
> focused on robustness, even in the face of errors, and even if those  
> errors are relatively infrequent.  A simple approximation: <svg> or  
> <math> starts foreign content, </svg> and </math> stops foreign  
> content may be right enough of the time to make a difference.  You  
> still would have to decide what to do with nesting, and how to  
> detect whether the prediction was incorrect (i.e., any time after  
> the tree builder bails even once, it must stop trusting the token  
> stream at the point it encounters a <style> tag).

It would indeed be possible to make a guess, but the amount of  
bookkeeping would go up considerably.

The speculative token collector could maintain a stack of "in foreign"
flags. <math> and <svg> would push true onto the stack,
<foreignObject> would push false, and </math>, </svg> and
</foreignObject> would pop. Something as simple as this would guess
right most of the time. The guessing itself is not a big deal.
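
A rough sketch of such a flag stack, in Python purely for illustration.
The names ForeignGuesser and feed, and the (kind, name) tag
representation, are made up for this example and are not taken from any
real parser. It also assumes tag names are compared case-insensitively
and that self-closing tags are ignored:

```python
FOREIGN_ROOTS = {"math", "svg"}      # start tags that push True
HTML_ISLANDS = {"foreignobject"}     # start tags that push False


class ForeignGuesser:
    """Hypothetical helper: guesses the "in foreign" flag from tags alone."""

    def __init__(self):
        # One boolean per open <math>, <svg> or <foreignObject>.
        self.stack = []

    @property
    def in_foreign(self):
        # An empty stack means ordinary HTML content.
        return self.stack[-1] if self.stack else False

    def feed(self, kind, name):
        """Update the guess for one tag; kind is "start" or "end".

        Returns the guess in effect after the tag has been seen.
        """
        name = name.lower()  # assumption: case-insensitive comparison
        tracked = name in FOREIGN_ROOTS or name in HTML_ISLANDS
        if kind == "start" and tracked:
            self.stack.append(name in FOREIGN_ROOTS)
        elif kind == "end" and tracked and self.stack:
            # A stray end tag on an empty stack is simply ignored.
            self.stack.pop()
        return self.in_foreign


guesser = ForeignGuesser()
tags = [("start", "svg"), ("start", "foreignObject"), ("start", "p"),
        ("end", "foreignObject"), ("end", "svg")]
guesses = [guesser.feed(kind, name) for kind, name in tags]
# guesses == [True, False, False, True, False]
```

The pop does not check that the end tag matches whatever pushed the top
entry, which is one of the ways the guess can drift away from what the
tree builder actually does.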

Without foreign content, the only rewind points would be immediately
after </script>. Setting up a rewind point would not be a
performance-critical operation, since </script> tokens are still
relatively rare.

However, if the token collector makes a simple guess about the state
of the "in foreign" flag, the guess needs to be stored on each token.
At token playback time, the stored guess can then be compared against
the actual tree builder state, and as soon as there's a mismatch, the
rest of the speculative tokens can be thrown away. The problem now is
that rewind points are no longer rare. Instead, every token is a
potential rewind point, and the data needed for rewinding needs to be
recorded on a per-token basis.
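
To make the playback check concrete, here is a small Python sketch.
Everything in it is hypothetical: SpeculativeToken, play_back and
ToyTreeBuilder are invented names, and the toy tree builder is only a
stand-in that leaves foreign content when it sees <p> or <div> there,
which is enough to make the collector's guess go wrong. The stored
guess is taken to be the flag in effect when the token was produced:

```python
from dataclasses import dataclass


@dataclass
class SpeculativeToken:
    kind: str                 # "start" or "end"
    name: str                 # tag name
    guessed_in_foreign: bool  # collector's guess when the token was made


class ToyTreeBuilder:
    """Stand-in for a real tree builder; tracks only the foreign flag."""

    BREAKOUT = {"p", "div"}  # toy subset of tags that leave foreign content

    def __init__(self):
        self.stack = []
        self.seen = []

    def in_foreign(self):
        return bool(self.stack) and self.stack[-1]

    def process(self, token):
        self.seen.append(token.name)
        if token.kind == "start":
            if self.in_foreign() and token.name in self.BREAKOUT:
                self.stack.clear()  # bail out of foreign content
            elif token.name in ("svg", "math"):
                self.stack.append(True)
        elif token.name in ("svg", "math") and self.stack:
            self.stack.pop()


def play_back(tokens, tree_builder):
    """Feed tokens until a stored guess disagrees with the tree builder.

    Returns how many tokens were consumed; the caller would discard the
    rest and retokenize from the corresponding input position.
    """
    for index, token in enumerate(tokens):
        if token.guessed_in_foreign != tree_builder.in_foreign():
            return index
        tree_builder.process(token)
    return len(tokens)


tokens = [
    SpeculativeToken("start", "svg", False),
    SpeculativeToken("start", "p", True),
    SpeculativeToken("start", "circle", True),  # guess is stale here
    SpeculativeToken("end", "svg", True),
]
builder = ToyTreeBuilder()
consumed = play_back(tokens, builder)
# consumed == 2: the <p> made the toy builder leave foreign content
```

Note that the sketch leaves out the expensive part: to actually rewind
at index 2, each token would also have to carry enough information
(input position, tokenizer state) to restart tokenization from there.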

It's hard to tell without trying at which point the bookkeeping wastes  
more cycles than it saves.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 11 December 2008 19:02:07 UTC