- From: Miles Sabin <msabin@cromwellmedia.co.uk>
- Date: Fri, 5 Mar 1999 15:26:39 -0000
- To: "'DOM list'" <www-dom@w3.org>
- Cc: "'Ray Whitmer'" <ray@imall.com>
Ray Whitmer wrote,

> Miles Sabin wrote:
> > The increase is in complexity in the big-oh sense
> > ... that is, fixups (and even notifications of
> > invalidation) have serious performance consequences,
> > either for next/prevNode(), for tree mutation
> > operations, or both. For those of us wanting to use
> > the DOM on the server-side this is a serious issue,
> > and would probably rule out the use of iterators in
> > the same way that it now rules out the use of
> > NodeLists. Worse still, it could also rule out the
> > use of current or future APIs that are specified in
> > terms of NodeIterators.
>
> I currently use NodeLists in very high demand server
> code. It all depends upon the efficiency of the DOM
> implementation and the specific use cases.

I'm afraid I don't really understand this ... Maybe we don't understand the word 'efficiency' in quite the same way: there's an ambiguity here between the efficiency of an _algorithm_ (or of a data structure which supports an algorithm), and the efficiency of an _implementation_ of that algorithm (or data structure). The former is usually characterized in big-oh terms; the latter in terms of the constant factors between run times (or memory usage, or whatever) and some normalized, ideal run time. Anyhow, the upshot is that an implementation of an O(log n) algorithm (no matter how efficient) will always be slower than an implementation of an O(1) algorithm (no matter how _inefficiently_ implemented) for all suitably large values of n.

I suspect that the drafters have underestimated the extent to which the iterator requirements would have an impact on performance, partly because I think it's possible that they believe that iterators could be adequately defined in terms of the NodeLists we already have. Those have been deemed (however grudgingly) acceptable, hence NodeIterators will be too, eventually. Unfortunately this isn't the case: the differences between the semantics of NodeLists and NodeIterators are such that the timestamp-based mechanism for NodeList maintenance (which gets you O(log depth-of-node) for node insertions and deletions) isn't sufficient. An implementation of iterators in terms of NodeLists using timestamps would result in O(number-of-iterators-on-subtree * depth-of-subtree-from-root) for insert and delete. Given that, in Java at least, both NodeLists and NodeIterators can hang around long after they're finished with from the programmer's point of view (due to the vagaries of GC, Java 2 weak refs notwithstanding), this additional multiplier would be significant if iterators were heavily used on large trees.

Here's an illustration of fixup which is a bit more realistic than the one given in the draft (which only discusses sequences of siblings!!!),

          A
        /   \
       B     G
      / \   / \
     C   D H   I
        / \
       E   F

Let's create a tree iterator rooted on A. The sequence this iterator generates, in document order, is,

  ABCDEFGHI

We advance the iterator to node E,

  ABCDEFGHI
      ^

Now we delete node B from the tree. I would assume that the intended behaviour would be that the sequence and iterator position become,

  AGHI
  ^

So what has to go on behind the scenes for this to happen? Well, when B is deleted, all iterators positioned on B or on nodes in the subtree under B have to be relocated to the document-order predecessor of B. To do this we must be able to identify all the iterators active on the subtree under B. There are a number of ways of doing that,

1. We maintain a list of all iterators active on the tree. When a node is inserted or deleted we scan that list, searching for and fixing up any iterators active on the subtree (there's a sketch of this after the list).

2. For each node we maintain a list of all iterators that are currently active in its subtree. When a node is inserted or deleted we run through all the iterators on that list and fix them up. On this model iterator next/prevNode() are very expensive.

3. For each node we maintain a list of all iterators that are pointing to it. When a node is inserted or deleted we visit all the nodes in the subtree and then run through all the iterators on the lists maintained at each of those nodes.
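To make option 1 concrete, here's the sort of bookkeeping I have in mind, in Java. All the names here (IteratorRegistry, FixupIterator, relocate() and so on) are invented for illustration -- nothing of this shape is mandated by the draft -- but the cost structure is the point: every removal pays for a scan of every live iterator on the document.

    import java.util.Enumeration;
    import java.util.Vector;

    import org.w3c.dom.Node;

    // A hypothetical iterator that supports fixup: it can report the node
    // it is positioned on, and can be moved elsewhere by the implementation.
    interface FixupIterator {
        Node position();
        void relocate(Node to);
    }

    // Option 1: a single registry of live iterators per document.
    class IteratorRegistry {
        private Vector iterators = new Vector();

        void register(FixupIterator it)   { iterators.addElement(it); }
        void unregister(FixupIterator it) { iterators.removeElement(it); }

        // Called by the implementation just before 'doomed' is removed;
        // 'predecessor' is the document-order predecessor of 'doomed'.
        // Cost: O(live-iterators * depth), since every iterator is visited
        // and isInSubtree() walks from its position up towards the root.
        void nodeRemoved(Node doomed, Node predecessor) {
            for (Enumeration e = iterators.elements(); e.hasMoreElements(); ) {
                FixupIterator it = (FixupIterator) e.nextElement();
                if (isInSubtree(it.position(), doomed))
                    it.relocate(predecessor);
            }
        }

        private boolean isInSubtree(Node n, Node subtreeRoot) {
            for (Node p = n; p != null; p = p.getParentNode())
                if (p == subtreeRoot) return true;
            return false;
        }
    }

In the example above, deleting B means scanning every live iterator on the document and, for the iterator parked on E, walking E -> D -> B before relocating it to A.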
I sat down to do an analysis of these options but (ahem) it's a bit hard ;-) My guesstimate is that none of these come out any better than iterators implemented in terms of NodeLists, and at least one is much worse.

> I tend to use NodeLists mostly in documents which
> don't mutate until after I stop using the NodeList.

But what are you saying here? You're saying that you don't mix NodeLists with tree mutation! And quite right too! Do you really want NodeIterators to turn out the same way? (There's a sketch of the workaround I mean at the end of this message.)

Ray Whitmer wrote,

> Miles Sabin wrote:
> > The spec should be amended to allow for more
> > lightweight implementations (currently it can only
> > be read as requiring that any and every
> > NodeIterator implementation support fixup), even if
> > the default implementation behaves as currently
> > described for hand-holding purposes (however
> > spurious).
>
> Although I once advocated this, I now believe that
> implementations which behave differently from the
> stated spec should implement a proprietary interface,
> not a w3c dom interface.
>
> It doesn't achieve much if you say at the end of the
> day, "well, isn't it great that I used the DOM APIs
> everywhere". But then you try plugging in another DOM
> and the iterator behaviors are all different so that
> all the users of iterators get unexpected behaviors.

But this is *not* the point at issue. The point is whether the _draft_ should be _revised_ to allow more flexibility, so that the DOM can be applicable across a broader range of applications than currently appears to be envisaged. I accept that there is a cost: the draft would have to make slightly weaker guarantees on iterator behaviour. The question is, is that cost worth the gain? I maintain that it is.

Where are new DOM implementations coming on stream at the moment? In interactive browsers and document processors, certainly, but also in XML parsers, XSL processors, and database interfaces used by heavily stressed server-side applications. These implementations have different requirements, both in terms of the sort of operations on a DOM tree that they are likely to perform (i.e. lots of tree construction and modification) and in terms of the size of documents that they are likely to be dealing with. Performance issues which might have no noticeable impact in an interactive application will be greatly magnified in these environments.

> It should be a proprietary interface to start with so
> that at least it is clear up front that the two
> iterators are not interchangable even if the method
> signatures happen to be the same.

At one time or another a number of contributors to this list have proposed forking a server-side DOM-style API. I, like many others, took the view that this was ill-advised, and it saddens me to hear a contributor to the current draft expressing an opinion which can only encourage that.
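And the workaround I mentioned above, for what it's worth: the only way I know to use a live NodeList safely around mutation is to snapshot it first. A minimal sketch against the DOM Level 1 interfaces (the Snapshot class and removeAll() helper are mine, purely for illustration):

    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    class Snapshot {
        // Remove every element with the given tag name. The live NodeList
        // returned by getElementsByTagName() would shrink underneath us as
        // nodes were removed, so copy it into an array first and only then
        // mutate the tree.
        static void removeAll(Document doc, String tagName) {
            NodeList live = doc.getElementsByTagName(tagName);

            Node[] copy = new Node[live.getLength()];
            for (int i = 0; i < copy.length; i++)
                copy[i] = live.item(i);

            for (int i = 0; i < copy.length; i++)
                copy[i].getParentNode().removeChild(copy[i]);
        }
    }

My worry is that, with fixup priced the way the current draft prices it, heavily loaded server-side code will end up treating NodeIterators in exactly the same way: snapshot the nodes, throw the iterator away, then mutate.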
Cheers,


Miles

--
Miles Sabin                          Cromwell Media
Internet Systems Architect           5/6 Glenthorne Mews
+44 (0)181 410 2230                  London, W6 0LJ
msabin@cromwellmedia.co.uk           England