Re: Conditional branch in tree builder based on DOM state from Maciej Stachowiak on 2009-01-03 (public-html@w3.org from January 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sat, 03 Jan 2009 09:13:43 -0800
To: Jonas Sicking <jonas@sicking.cc>
Cc: Ian Hickson <ian@hixie.ch>, Henri Sivonen <hsivonen@iki.fi>, HTMLWG WG <public-html@w3.org>
Message-id: <84A05DC2-1C9F-4430-8426-338BBC657D2D@apple.com>

On Jan 2, 2009, at 6:31 PM, Jonas Sicking wrote:

> On Fri, Jan 2, 2009 at 6:21 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>>
>> On Jan 2, 2009, at 5:09 AM, Jonas Sicking wrote:
>>
>>>
>>> I agree that this would complicate a parser implementation that
>>> doesn't do speculative parsing while blocked for <script>s. However
>>> the extra complication is very small compared to the complication
>>> added to speculative parsers. Additionally, it does simplify the
>>> interface between the parser and the DOM since with this and the
>>> other change that has been proposed there is no need to read data
>>> back from the DOM. This interface simplification is likely about the
>>> same as adding an extra flag to each node in the tag stack.
>>>
>>> Also, any competitive browser is going to have to do speculative
>>> parsing. The performance gains from doing so are so substantial that
>>> not doing it is not really an option.
>>
>> WebKit trunk does speculative parsing but we don't require anything
>> like this (since we don't care about reusing the tokens - tokenizing
>> is cheap - and we haven't cared so far about multithreaded parsing).
>> In fact it seems to me it would make things more complicated if we
>> were required to distinguish between parser-added and script-added
>> children at parse time.
>
> Sorry, I got mixed up regarding why we made this request.
>
> Do note though that you don't need to distinguish between parser-added
> children and script-added children in the node though. All you need to
> do is add information to the stack of open elements regarding if the
> parser has added children to the node or not. It's likely that the
> stack already contains information other than the node itself, such as
> if the node is a formatting node or not, so adding an extra bit should
> be no work.

I'll tentatively agree that it doesn't sound like a big complication
(we do indeed have info besides the node in the stack of open
elements). I would have to ask one of our parsing experts to be sure.
One thing I wonder about: would this require some new behavior to be
implemented for the case where a node has children, but not
parser-added children?
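
To make the case I am asking about concrete, here is a rough sketch of
the bookkeeping as I understand the proposal. It is only an
illustration, not WebKit or Gecko code: Node, StackEntry, TreeBuilder
and the parser_added_child flag are all made-up names, and the "DOM"
is a toy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy DOM node; children may be appended by the parser or by script."""
    name: str
    children: list = field(default_factory=list)


@dataclass
class StackEntry:
    """One entry in the stack of open elements, with parser-side info."""
    node: Node
    is_formatting: bool = False
    parser_added_child: bool = False  # hypothetical extra bit


class TreeBuilder:
    def __init__(self):
        self.open_elements = []

    def append_child(self, child):
        """Parser-side append: updates the DOM and sets the extra bit."""
        current = self.open_elements[-1]
        current.node.children.append(child)
        current.parser_added_child = True

    def push(self, name, is_formatting=False):
        """Create an element, attach it to the current node, and open it."""
        node = Node(name)
        if self.open_elements:
            self.append_child(node)
        self.open_elements.append(StackEntry(node, is_formatting))
        return node

    def current_has_parser_children(self):
        """Answer from the stack alone, without reading the DOM back."""
        return self.open_elements[-1].parser_added_child


builder = TreeBuilder()
builder.push("html")
body = builder.push("body")

# A script appends directly to the DOM node, bypassing the tree builder.
body.children.append(Node("div"))
script_only = (len(body.children), builder.current_has_parser_children())

# Only a parser-side append flips the bit.
builder.append_child(Node("p"))
after_parser = (len(body.children), builder.current_has_parser_children())
```

After the script-side append, body has one child but the bit is still
false; that is the state I am wondering about. Whatever the tree
builder is supposed to do there would be decided by the flag rather
than by looking at the node's actual children.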

> There is a small amount of work required to set this extra bit to true
> though any time the parser adds a child to the node. However this
> amount of work hardly seems enough to sacrifice the ability to do
> off-main-thread parsing.

It does sound like a good goal to support doing significant pieces of
work on a separate thread, if the changes required are minor. In this
case though, it sounds like off-the-main-thread parsing can at least
in theory be done without any changes to the algorithm, though in a
slightly roundabout way. One thing I am uncertain of here is the
actual benefit of parsing on a separate thread. Do you have any
performance results from your prototype work so far?

Regards,
Maciej

Received on Saturday, 3 January 2009 17:14:25 UTC