Re-entrant invocation of the tree builder from Henri Sivonen on 2008-06-17 (public-html@w3.org from June 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 17 Jun 2008 22:40:04 +0300
To: "public-html@w3.org WG" <public-html@w3.org>
Message-Id: <9304699F-1604-4FA3-A60B-875EB3F9DD76@iki.fi>

I'm not sure if I understand the implications of this:
> There is only one set of state for the tokeniser stage and the tree  
> construction stage, but the tree construction stage is reentrant,  
> meaning that while the tree construction stage is handling one  
> token, the tokeniser might be resumed, causing further tokens to be  
> emitted and processed before the first token's processing is complete.

I'd expect a design like this for a parser that is suitable for  
running in a single thread with UI:

Driver: Responsible for pushing buffers of data to the tokenizer
(possibly the remainder of a buffer that the tokenizer didn't fully
consume).
Tokenizer: Responsible for pushing tokens to the tree builder. Can be
asked to return control to the driver after any character, leaving the
buffer partially consumed and the remainder pending, to be pushed to
the tokenizer again later.
Tree builder: Builds the tree. Can ask the tokenizer to return control  
to the driver.

In a design like this, spins through the event loop can happen upon
return to the driver, and the event loop can 'breathe' both between
buffers, whose size can be regulated by the driver, and whenever the
tree builder thinks it's appropriate to ask the tokenizer to return.

Now, when a script element is inserted, the tree builder could request
a return to the driver, leaving both the tokenizer and the tree builder
in a coherent state. Script execution could then be kicked off from the
driver, outside both the tokenizer and the tree builder, so that
neither of them needs to be re-entrant.
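
To make the proposal concrete, here is a minimal sketch of the
driver / tokenizer / tree builder split in Python. Everything in it is
an assumption made for illustration: the class and function names
(TreeBuilder, Tokenizer, drive, run_script), the toy token syntax where
anything between "<" and ">" is one tag token and every other character
is its own token, and the flat list standing in for a DOM. It is not
the spec's algorithm, and it does not handle scripts that call
document.write().

```python
from collections import deque


class TreeBuilder:
    """Toy tree builder: records tokens in a flat list instead of a DOM."""

    def __init__(self):
        self.tree = []
        self.in_script = False
        self.script_text = []
        self.pending_script = None  # set when a script is ready to run

    def process_token(self, token):
        """Handle one token; return True to ask the tokenizer to yield."""
        if token == "<script>":
            self.in_script, self.script_text = True, []
            return False
        if token == "</script>":
            self.in_script = False
            source = "".join(self.script_text)
            self.tree.append(("script", source))
            self.pending_script = source
            return True  # state is coherent here, so request a return
        if self.in_script:
            self.script_text.append(token)
        else:
            self.tree.append(("token", token))
        return False


class Tokenizer:
    def __init__(self, builder):
        self.builder = builder
        self.partial_tag = ""  # a tag split across two buffers
        self.busy = False      # True only while feed() is on the stack

    def feed(self, buf):
        """Consume buf and return whatever was left unconsumed."""
        assert not self.busy, "tokenizer re-entered"
        self.busy = True
        try:
            i = 0
            while i < len(buf):
                ch = buf[i]
                i += 1
                if self.partial_tag or ch == "<":
                    self.partial_tag += ch
                    if ch != ">":
                        continue
                    token, self.partial_tag = self.partial_tag, ""
                else:
                    token = ch
                if self.builder.process_token(token):
                    break  # the tree builder asked for control to return
            return buf[i:]
        finally:
            self.busy = False


def drive(buffers, run_script):
    """Driver: pushes buffers, repushes remainders, runs scripts itself."""
    builder = TreeBuilder()
    tokenizer = Tokenizer(builder)
    queue = deque(buffers)
    while queue:
        rest = tokenizer.feed(queue.popleft())
        if rest:
            queue.appendleft(rest)  # repush the unconsumed remainder
        if builder.pending_script is not None:
            source, builder.pending_script = builder.pending_script, None
            assert not tokenizer.busy  # neither stage is on the call stack
            run_script(source)
        # An event loop could spin here before the next buffer is pushed.
    return builder.tree
```

Calling drive(["a<scr", "ipt>xy</script>b"], print) feeds a tag that is
split across two buffers. The script source is handed to the callback
only after feed() has returned, and the leftover "b" is pushed back to
the tokenizer afterwards.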

Would doing this break something crucial compared to actually
re-entering the tree builder before it has returned from handling the
script-related tokens?

Aside: I find the concept of "insertion point" in a stream to be  
harder to track than a concept of a stack of pending streams where  
each document.write() pushes a new stream onto the pending stack.
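
A rough sketch of the stack-of-pending-streams idea, again in Python
and again with hypothetical names (PendingStreams, write, next_char,
drain). It assumes the tokenizer pulls one character at a time from the
topmost stream and falls back to the stream below once the top one is
exhausted. One limitation of this naive version: if write() is called
twice before the first written stream has been drained, the second
text is read first, so a real design would need to account for that.

```python
from collections import deque


class PendingStreams:
    """Stack of character streams; the network stream sits at the bottom."""

    def __init__(self, network_data):
        self.stack = [deque(network_data)]

    def write(self, text):
        """Model document.write(): push a new stream on top of the stack."""
        self.stack.append(deque(text))

    def next_char(self):
        """Return the next character from the topmost non-empty stream."""
        while self.stack and not self.stack[-1]:
            self.stack.pop()  # discard exhausted streams
        if not self.stack:
            return None  # all streams are exhausted
        return self.stack[-1].popleft()


def drain(streams):
    """Read every remaining character, topmost stream first."""
    out = []
    while (ch := streams.next_char()) is not None:
        out.append(ch)
    return "".join(out)
```

For example, after reading "a" from PendingStreams("ab"), a write("XY")
makes the following reads come from "XY" before the remaining "b". A
nested write() issued while "XY" is only partly consumed is read before
the rest of "XY".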

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Tuesday, 17 June 2008 19:40:53 UTC