Re: Should MutationObservers be able to observe work done by the HTML parser? from Jonas Sicking on 2012-08-30 (public-webapps@w3.org from July to September 2012)

From: Jonas Sicking <jonas@sicking.cc>
Date: Thu, 30 Aug 2012 09:18:48 -0300
To: Rafael Weinstein <rafaelw@chromium.org>
Cc: olli@pettay.fi, Ian Hickson <ian@hixie.ch>, Adam Klein <adamk@chromium.org>, Mihai Parparita <mihaip@chromium.org>, Ryosuke Niwa <rniwa@webkit.org>, WebApps WG <public-webapps@w3.org>, Anne van Kesteren <annevk@annevk.nl>
Message-ID: <CA+c2ei8DKhkVk-xzgj3oa2JzeUqspGkfG4CwSricoOZu6eWM=g@mail.gmail.com>
On Thu, Aug 30, 2012 at 7:35 AM, Rafael Weinstein <rafaelw@chromium.org> wrote:
> On Wed, Aug 29, 2012 at 9:47 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> On Thu, Aug 30, 2012 at 12:57 AM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:
>>> On 08/30/2012 02:05 AM, Ian Hickson wrote:
>>>>>
>>>>> On Fri, Jun 15, 2012 at 4:35 PM, Adam Klein <adamk@google.com> wrote:
>>>>>>
>>>>>>
>>>>>> This code alerts in Firefox but not in Chrome:
>>>>>>
>>>>>> <!DOCTYPE html>
>>>>>> <body>
>>>>>>    <script>
>>>>>>      var observer = new MutationObserver(function(r) {
>>>>>>        alert(r);
>>>>>>      });
>>>>>>      observer.observe(document.body, {childList: true, subtree: true});
>>>>>>    </script>
>>>>>>    <p>Hello, World</p>
>>>>>> </body>
>>>>>>
>>>>>> In WebKit's implementation, we had assumed that MutationObservers were
>>>>>> meant to observe changes after page load (and I personally thought
>>>>>> that we'd specced it that way, by putting it in DOM4, not HTML). But
>>>>>> it seems the Mozilla implementors made a different assumption. But
>>>>>> what should happen?
>>>>>>
>>>>>> IMHO, the gain may not be worth the possible
>>>>>> performance degradation. If script wants to find out what the parser
>>>>>> put on the page, it should wait for DOMContentLoaded. But I can
>>>>>> imagine a use case where script might want to find out about the
>>>>>> parser's work during load.
>>>>>>
>>>>>> In any case, we should try to come to a decision about this, since
>>>>>> this seems to be the one major divergence between the existent
>>>>>> implementations of MutationObservers.
>>>>
>>>>
>>>> The spec used to say "DOM mutation events must not fire for changes caused
>>>> by the UA parsing the document". I've updated this to also mention
>>>> mutation observers.
>>>
>>>
>>> Why? Getting MutationObserver notifications during parsing is what I'd
>>> expect
>>> the API to provide (and that is implemented in Gecko).
>>>
>>>
>>>>
>>>>
>>>> On Fri, 15 Jun 2012, Ryosuke Niwa wrote:
>>>>>
>>>>> On Fri, Jun 15, 2012 at 6:15 PM, Mihai Parparita wrote:
>>>>>>
>>>>>>
>>>>>> I used MutationObservers for the first time last night (in Chrome),
>>>>>> and I was surprised by this behavior. I was working on something that
>>>>>> transmogrified one node into another, so having observers fire during
>>>>>> parsing would have been helpful. Otherwise something like:
>>>>>>
>>>>>> <script>var observer = new MutationObserver(/* observer that
>>>>>> manipulates <foo> tags */);</script>
>>>>>> <body>
>>>>>> ....
>>>>>> <foo></foo>
>>>>>> <script>
>>>>>>   /* code that acts on foo tags */
>>>>>> </script>
>>>>>>
>>>>>> If I have to use DOMContentLoaded, then the code in the second
>>>>>> <script> block would get a chance to operate on <foo> before my
>>>>>> observer had had a chance to transmogrify it.
>>>>>
>>>>>
>>>>> That is a slightly different issue.
>>>>>
>>>>> There is no guarantee that your observer is called before the second
>>>>> script element's script is executed. The only way for that to happen is
>>>>> if the parser yielded to the event loop immediately after parsing the
>>>>> foo element but before executing the script element.
>>>>
>>>>
>>>> The spec actually does require that the UA "provide a stable state" before
>>>> processing <script>s, which invokes the relevant part of the event loop.
>>>> If mutation observers were to fire during parse, it would require those to
>>>> fire too (it currently does not).
>>>>
>>>>
>>>> On Tue, 26 Jun 2012, Adam Klein wrote:
>>>>>
>>>>>
>>>>> I take it from your reply that you and I had the same view of what's
>>>>> specced in DOM4. That is, that MutationObservers are not specified to be
>>>>> notified of actions taken by the parser. Given that fact, it seems that
>>>>> either the spec should be changed (and by "spec" here I think the
>>>>> required changes are in HTML, not DOM), or Firefox's implementation
>>>>> ought to be changed.
>>>>>
>>>>> Anne, Ian, Olli, Jonas, your thoughts?
>>>>
>>>>
>>>> I've updated HTML to explicitly say mutation observers don't fire when
>>>> parsing, in the same way that it says not to fire mutation events.
>>>>
>>>
>>> The point is that MutationObserver can handle the parsing case just fine.
>>> It doesn't have the problems that Mutation Events have.
>>
>> Indeed. I have the same questions. The API doesn't have the same
>> performance and re-entrancy problems that caused us to disable
>> mutation events.
>>
>> If we don't fire mutation observers during parsing it means that you
>> basically can't rely on mutation observers at all until after the
>> DOMContentLoaded event has fired. In fact, you can't rely on them even
>> past then since if another page is loaded in an <iframe> and nodes are
>> moved from that iframe to your document, you won't be notified about
>> changes done by the parser to those nodes.
>>
>> So this change removes the ability to guarantee that you can track
>> mutations done to a document, which was one of the big advantages that
>> MutationObservers had over mutation events. I.e. mutation events only
>> let you track mutations to your document as long as those mutations
>> "behaved well", i.e. weren't done from inside mutation observers. With
>> this we add a similar requirement to mutation observers that they are
>> only reliable as long as pages "behave well" and don't move nodes from
>> a document which is being parsed.
>>
>> Just like mutation events solved the use cases of many authors,
>> mutation observers will solve the use cases of many authors even with
>> this change. However it will miss edge cases which I think is really
>> unfortunate.
>>
>> / Jonas
>
> When does Gecko invoke observers during page construction? Are any of
> the goals we set for the processing model still retained during load,
> i.e.
>
> -"I get to complete my work before paint"
> -"Future script invocations don't see the world in an inconsistent
> state because observations have yet to be delivered"
>
> For example, if I wanted to implement an observer which polyfilled web
> components and did fix-up on elements with known attribute values as
> they are loaded and rendered, am I guaranteed that the user will never
> see undecorated elements during load? ... that any <script> that runs
> can assume "components" they retrieve via querySelector have been
> "blessed"?

Yes, all of the goals and invariants are retained during parsing. The
way it works is that the parser occasionally receives data from the
network (actually, in our case from the parser thread). When that
happens it creates and inserts into the DOM a bunch of elements. In
some cases elements are also moved around. Once we have consumed all
available data, or we feel that we have been off the event loop for
too long, we return to the event loop. Once that happens we fire
MutationObservers as normal.

So as far as timing for notifications goes, we really don't do anything
special during parsing; everything acts just as if a snippet of JS had
done a handful of mutations and then returned to the event loop.
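
To make that timing concrete, here is a toy model of it. It is only a sketch: ToyObserver, mutate and returnToEventLoop are hypothetical names invented for illustration, not the real DOM API and not Gecko's code. The one thing it shows is that mutations only queue records, and the callback runs once when the batch yields.

```javascript
// Toy model of the delivery timing; every name here is hypothetical.
class ToyObserver {
  constructor(callback) {
    this.callback = callback;
    this.pending = []; // records queued since the last delivery
  }

  // Called for every mutation. It only queues a record and never
  // invokes the callback synchronously.
  mutate(description) {
    this.pending.push(description);
  }

  // Called when the current batch of work (a script or a parser chunk)
  // yields. All queued records are delivered in a single callback.
  returnToEventLoop() {
    if (this.pending.length === 0) return;
    const records = this.pending;
    this.pending = [];
    this.callback(records);
  }
}

const deliveries = [];
const observer = new ToyObserver((records) => deliveries.push(records));

// One parser batch: two insertions, no callback yet.
observer.mutate("insert <p>");
observer.mutate("insert text 'Hello, World'");
const deliveredDuringBatch = deliveries.length;

// The batch ends, so both records arrive together.
observer.returnToEventLoop();
```

In this model it makes no difference whether the parser or a script calls mutate; the observer sees the same thing either way.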

The only special handling is that when we insert a DOM node, we do two
things differently:
1. We make sure to set all element attributes before inserting the
element into the DOM. This is to avoid sending notifications for
attribute modifications. The element is inserted with the right set of
attributes from the get-go.
2. We coalesce notifications about node insertions. So if we first
insert an element and then insert 5 nodes inside that element, all
without returning to the event loop, we pretend that the 5 child-nodes
had been inserted into the element before the element was inserted
into the document. So to the page it looks like the parser inserts
fragments of nodes rather than individual nodes. Likewise, if we
append a set of sibling nodes into an already-inserted element, we
coalesce this so that it looks like a DocumentFragment containing these
siblings was inserted in a single operation.
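
What the page would observe under this coalescing can be sketched with a small model. The assumptions are mine: nodes are plain strings, records are plain objects, and coalesce() and its input shape are hypothetical, not a browser API. It takes the insertions the parser performed in one batch and returns the records a page would see.

```javascript
// Toy model only; coalesce() is a hypothetical helper, not a browser API.
// ops lists the parser's insertions in one batch, in order.
function coalesce(ops) {
  const insertedThisBatch = new Set();
  const recordsByTarget = new Map(); // Map keeps first-insertion order

  for (const { node, parent } of ops) {
    insertedThisBatch.add(node);

    // A node whose parent arrived in this same batch is reported as part
    // of that parent's subtree, so it gets no record of its own.
    if (insertedThisBatch.has(parent)) continue;

    // Siblings appended to an already-inserted parent share one record.
    if (!recordsByTarget.has(parent)) {
      recordsByTarget.set(parent, {
        type: "childList",
        target: parent,
        addedNodes: [],
      });
    }
    recordsByTarget.get(parent).addedNodes.push(node);
  }
  return [...recordsByTarget.values()];
}

// Case 1: an element, then 5 child nodes inside it, all in one batch.
const nested = coalesce([
  { node: "div", parent: "body" },
  { node: "c1", parent: "div" },
  { node: "c2", parent: "div" },
  { node: "c3", parent: "div" },
  { node: "c4", parent: "div" },
  { node: "c5", parent: "div" },
]);

// Case 2: three siblings appended to a <ul> inserted in an earlier batch.
const siblings = coalesce([
  { node: "li1", parent: "ul" },
  { node: "li2", parent: "ul" },
  { node: "li3", parent: "ul" },
]);
```

In case 1 the only record is for the div being added to body; the five children are already inside it when the page first sees it. In case 2 the three list items arrive as one record on the ul.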

The way we implement this coalescing is that when we do the actual
insertion we have a flag which indicates that normal MutationObserver
notifications should not be sent (I assume that you have something
similar for what you are currently doing to suppress parser
notifications). Once we reach an end-tag for an element which we
haven't notified for, or return to the event loop, we add the
appropriate mutation records needed for the outer-most elements that
we have just inserted. It only really requires tracking a couple of
pointers and a depth integer until we return to the event loop.
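
One possible reading of that bookkeeping, as a sketch: ToyTreeBuilder and all of its fields are hypothetical, and this is not Gecko's actual code. It keeps a pointer to the outermost un-notified element, a pointer to that element's parent, and a depth counter. A record is emitted when that element's end-tag is reached or when the builder returns to the event loop. For brevity it handles only nested insertions, not the grouping of siblings into one record.

```javascript
// Hypothetical sketch of the pointer-and-depth bookkeeping.
class ToyTreeBuilder {
  constructor(root) {
    this.stack = [root];       // open elements; root is already in the document
    this.records = [];         // mutation records produced so far
    this.pendingRoot = null;   // outermost element inserted but not yet notified
    this.pendingParent = null; // the node pendingRoot was inserted into
    this.pendingDepth = 0;     // open un-notified elements, counting pendingRoot
  }

  startTag(name) {
    const parent = this.stack[this.stack.length - 1];
    const el = { name, children: [] };
    parent.children.push(el); // the real insertion, with notification suppressed
    if (this.pendingRoot === null) {
      this.pendingRoot = el;
      this.pendingParent = parent;
    }
    this.pendingDepth++;
    this.stack.push(el);
  }

  endTag() {
    this.stack.pop();
    // Closing the outermost un-notified element triggers the record.
    if (this.pendingRoot !== null && --this.pendingDepth === 0) this.flush();
  }

  returnToEventLoop() {
    this.flush();
  }

  flush() {
    if (this.pendingRoot === null) return;
    this.records.push({
      type: "childList",
      target: this.pendingParent.name,
      addedNodes: [this.pendingRoot.name],
    });
    this.pendingRoot = null;
    this.pendingParent = null;
    this.pendingDepth = 0;
  }
}

const builder = new ToyTreeBuilder({ name: "body", children: [] });
builder.startTag("div");
builder.startTag("span");
builder.endTag();             // </span>: still inside the un-notified <div>
const recordsBeforeClose = builder.records.length;
builder.endTag();             // </div>: one record for the whole subtree

builder.startTag("ul");
builder.startTag("li");
builder.returnToEventLoop();  // yields mid-element: record for <ul> is added now
builder.endTag();             // </li>: already notified, no new record
builder.endTag();             // </ul>: already notified, no new record
```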

/ Jonas

Received on Thursday, 30 August 2012 12:19:49 UTC