RE: Older Sibling?

At 11:13 AM -0000 3/1/00, Miles Sabin wrote:
>David Brownell wrote,
>> would you seriously expect DOM to save you the work of
>> writing such a simple subroutine?  If so, why?
>How about: because some implementations might be able to
>provide an O(1) implementation, whereas an external routine
>would be likely to be O(number of siblings) or thereabouts.

Good point. A patent search will show you a few O(1) methods for SGML, and
there are other ways known since, some published in the last few years of
the ACM Hypertext and Digital Libraries conferences.

The performance difference could easily be even worse. Some implementations
such as many based on B-trees, are worse than O(number of siblings) because
the time to examine each sibling is worse than O(k) and the costs multiply.
Implementations *ought* to do no worse than you describe, but even
O(|siblings|) vs. O(1) is a *big* deal sometimes; I've seen lots of real
user documents with sibling-lists in the 10,000 range. XML data that's
exported from RDBs (like a lot of e-commerce interchange, for example) will
often have huge fanout. a 10,000 to 1 hit on performance seems like a
pretty good reason to consider adding an interface for something that so
clearly fits the data model anyway....

>There're several other queries like this which could usefully
>be added and which might allow for similar optimizations:
>precedes in document order; is an ancestor of; least common
>ancestor; depth from root etc.

Yup; there's a must-read paper in Digital Libraries 98 (+/- 1? sorry, don't
have the reference handy, authors were from Korea, though) on ways to
optimize the heck out of all these commonplace operations. Also some good
work in VLDB '99 (though the most relevant ones from there don't actually
mention XML in the titles).

Chief Scientist, Scholarly Technology Group, and
   Adjunct Associate Professor, Brown University
North American Editor, the Text Encoding Initiative

Received on Wednesday, 1 March 2000 12:33:20 UTC