DOM Mutation Events Replacement: Findings from implementing "net-effect" projections from Rafael Weinstein on 2011-08-17 (public-webapps@w3.org from July to September 2011)

From: Rafael Weinstein <rafaelw@google.com>
Date: Tue, 16 Aug 2011 18:54:23 -0700
To: Webapps WG <public-webapps@w3.org>
Message-ID: <CABMdHiRnohhjw-MK67t49ppgtWKQyNKV7OUnX4bu62MkDmhCzg@mail.gmail.com>
TL;DR;

1) ObserveSubtree semantics doesn't provide a robust mechanism for
observing a tree/fragment, and if we don't provide something more
complete, libraries will likely register observers at every node in
the document.

2) Not providing position information (e.g childlistIndex) in
"ChildlistChanged" mutations means that the algorithmic complexity of
computing whether/where nodes have moved to doesn't improve vs. having
to bruce-force compute.

3) Not providing "lastValue" for text changes (attribute, textNode,
etc...) may adversely impact some use cases.

-----------
Following up on points 6 & 7 from
http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/0779.html
...

I've prototyped the main "net-effect" projections that I believe
various use cases will need. I've published to code to this public
github repository:

https://github.com/rafaelw/MutationProjections

I encourage anyone interested to check my work and make corrections
and/or suggestions, or create a branch that experiments in any
direction they see fit.

-----------
Per point (3) in the email above, we've concluded that mutations
should be delivered in batches, since any number of mutations can have
taken place since an observer was notified.

Also, though it varies, there's generally a hierarchy of expense for
types of operations done from JS:

  reads/writes on "pure" JS data < DOM Reads < DOM Writes < External I/O

Often, an application is willing to do a some amount of work of a less
expensive operation to avoid doing any work of a more expensive
operation.

Thus, a mutation observer is highly motivated, for both correctness
and performance, to answer the question:

  What is the net-effect of some set of DOM changes that have taken
place since I last got to run?

-----------
Depending on the use case, "what happened" likely means some
combination of the following:

1) Reachability: What nodes were added to or removed from the document?

2) Matching: What nodes stopped or started "matching" a given pattern?
This could be arbitrarily complex, but in the use cases discussed,
this usually means a simple selector.

3) Parent changes: What nodes are now children of a new parent node?

4) Movement: What nodes moved to a new position in the document?

5) Value change: What character/text or attribute nodes changed value
(for attributes, also, whether the attribute was added or removed)?

Note that none of the above requires any mechanism of mutation
observation. All can be computed given the opportunity to inspect the
DOM at two points in time.

However, without any information about what happened, the expense of
answering each of the above questions is roughly proportional to the
number of nodes in the document. All are roughly linear, except (4)
Movement, which is roughly quadratic.

It seems to me that a good goal for any mutation event replacement
should be that the expense of answering the above projections be
proportional the number of mutations which actually took place.

I've implemented the "null case" (lacking any mutation information)
for 1, 3 & 4 here:

https://github.com/rafaelw/MutationProjections/blob/master/projections.js#L39

And the same projections, but taking mutation records to improve
performance here:

https://github.com/rafaelw/MutationProjections/blob/master/projections.js#L240

-----------
Observing sets of nodes

In order to implement the above projections, I prototyped:

  Document.observeOwnedNodes

I'm not proposing this (yet), but I think it is conceptually the most
general case (btw, to give credit to Jonas, he pointed out that this
might be solution to the problem I'm about to describe, though I'm
pretty sure he meant "sometime down the road").

The problem is that observeSubtree doesn't report anything about what
happened to a node that was removed from the document, modified, then
returned to the document. Consider the following case:

-Register a subtreeObserver at the root of the document
-Some "other" code removes an element, appends a new element to it,
modifies an attribute, then returns the node to the document.

Now, in order to properly compute the above projections, we are nearly
back to the null case (matching becomes proportional the number of
descendants of added/removed nodes that we heard about, but everything
else returns to being proportional to the number of nodes in the
document).

You may consider this to be an esoteric case, but imagine the
situation that library authors will be in. They will have two basic
options:

a) Use observeSubtree and explain to their users that they have to be
careful never to modify something outside the tree.
b) Simply register an observer at all nodes in the document.

Picking the second option means that my library works properly and I
don't need my users to be careful. It burns plenty of memory, but
library authors aren't well known for being conservative in that
regard -- especially since its pretty hard for them to quantify the
memory impact of anything.

The danger is that if we don't offer a better mechanism, we'll create
a common situation that every observer in use has been registered at
every node in the document.

-----------
Complexity impact of providing/witholding various data:

Assuming that projections can observe *at least* the set of nodes in
the document, even if they are temporarily removed, we can focus on
the complexity impact of including/excluding various bits of data in
mutation notifications.

The good news is that all recent proposals *mostly* meet the goal for
the above projections of being roughly proportional to the number of
mutations. There's one big exception and then a few things to note:

-----------
Movement:

Jonas' most recent proposal pre-summarizes what has happened to a
given element (keeping a running set of added & removed nodes). Doing
so discards position information about where nodes were added or
removed. Having position information makes the complexity of computing
movement roughly quadratic WRT the number of mutations that occurred:

https://github.com/rafaelw/MutationProjections/blob/master/projections.js#L487

Lacking it means that it continues to be roughly quadratic WRT the
number of nodes in the document:

https://github.com/rafaelw/MutationProjections/blob/master/projections.js#L163

-----------
Value Change & lastValue

Excluding lastValue impacts (2) Matching in that wasMatching() cannot
be directly computed. The projection must store the set of nodes which
have already been matched. The danger here probably isn't the amount
of memory this burns as it is that it creates another mechanism by
which pages can easily prevent garbage collection of (otherwise)
unreferenced nodes.

Excluding lastValue impacts (5) Value change in that the old value
must have been saved in order to prevent reporting that the value
changed when it did not. This is probably only bad for use cases that
care about all or most text nodes in the document (e.g. tree
mirroring, possibly editing).

-----------
Reachability & wasInDocument.

In all current proposals, computing wasReachable and isReachable for a
given node is O(log n) in the number of nodes in the tree. We could
make this constant-time by simply including a flag with
ChildistChanged that indicates whether the target node wasInDocument
at the time of the mutation.

Doing so, however, probably only makes sense if we also expose a
constant-time method of asking any node whether it isInDocument(), as
both isReachable and wasReachable() are needed to compute which nodes
were added/removed from the document:

https://github.com/rafaelw/MutationProjections/blob/master/projections.js#L315
Received on Wednesday, 17 August 2011 01:54:58 UTC