- From: Michael Brundage <xquery@comcast.net>
- Date: Sun, 06 Jun 2004 16:08:18 -0700
- To: Michael Kay <mhk@mhk.me.uk>, 'Fang Yidong' <fangyidong@yahoo.com.cn>, XQuery Public Comments <public-qt-comments@w3.org>
- Cc: <massimo@w3.org>
Although it's true that last() has significant performance implications, consider even simple XPaths like /a/b[/a/c] or /a[b = /a/c]. Because you can express cross-products, XPath queries can easily be O(N^2) or worse unless you use indexing techniques (as Michael Kay replied) >> '/e1[predicate1]/e2[predicate2]/e3[predicate3]/e4[fn:last()]',if >> the resulting sequence is very long(e.g. 1,000,000 >> nodes) and the data is stored in a file,in the worst >> case,we'll have to use 1,000,000 IO operations to get >> the value of fn:last() and another 1,000,000 IO >> operations to iterate to the last node. If the data is stored in a file, then you need to parse the entire file anyway to ensure that it's valid XML and to ensure that you've found all matching nodes. However, this particular query (if the elided predicates don't get in the way) shouldn't require processing the file twice. Just remember the last matching e4 element within each matching e3, and when the e3 close is reached, emit the last e4 you remembered. In general, predicates of the form [last()] (and possibly [last() - k] for constant k) should be optimized. Obviously there are plenty of other queries, like //a[last() > position() * b] that will perform badly in most implementations no matter what kinds of performance optimizations you make. That's just the nature of the game. Cheers, Michael Brundage xquery@comcast.net Author, XQuery: The XML Query Language (Addison-Wesley, 2004) Co-author, Professional XML Databases (Wrox Press, 2000)
Received on Sunday, 6 June 2004 19:08:49 UTC