- From: Fang Yidong <fangyidong@yahoo.com.cn>
- Date: Mon, 7 Jun 2004 10:48:24 +0800 (CST)
- To: Michael Brundage <xquery@comcast.net>, Michael Kay <mhk@mhk.me.uk>, XQuery Public Comments <public-qt-comments@w3.org>
Note:I've send the original message to massino wrongly.Now I learn that massimo has a very high work load,so please don't reply my message to 'massimo@w3.org' again. --- Michael Brundage <xquery@comcast.net> wrote:> Although it's true that last() has significant > performance implications, > consider even simple XPaths like /a/b[/a/c] or /a[b > = /a/c]. Because you > can express cross-products, XPath queries can easily > be O(N^2) or worse > unless you use indexing techniques (as Michael Kay > replied) > > >> > '/e1[predicate1]/e2[predicate2]/e3[predicate3]/e4[fn:last()]',if > >> the resulting sequence is very long(e.g. > 1,000,000 > >> nodes) and the data is stored in a file,in the > worst > >> case,we'll have to use 1,000,000 IO operations to > get > >> the value of fn:last() and another 1,000,000 IO > >> operations to iterate to the last node. > > If the data is stored in a file, then you need to > parse the entire file > anyway to ensure that it's valid XML and to ensure > that you've found all > matching nodes. My implementation is for xdb purpose,and in many cases,the resulting sequence will be very large.An xml file is stored in a native format and is pre-validated before it is stored. > > However, this particular query (if the elided > predicates don't get in the > way) shouldn't require processing the file twice. > Just remember the last > matching e4 element within each matching e3, and > when the e3 close is > reached, emit the last e4 you remembered. In > general, predicates of the > form [last()] (and possibly [last() - k] for > constant k) should be > optimized. > > Obviously there are plenty of other queries, like > //a[last() > position() * b] > that will perform badly in most implementations no > matter what kinds of > performance optimizations you make. That's just the > nature of the game. > I have misunderstood the scope of the axis.Now I've learned that it is just relative to the context item: '/e1/e2[last()]' is to get the last 'e2' item for *each* 'e1',not the last 'e2' item for *all* 'e1'.(May be the WG should add an explicit statement about this?)It seems it's easier to implement than I have thought. As Michael Kay said,the XQuery is designed for users,not for implementators,so it is the implementor's challenge to find an effective way to process the query.Maybe this is a good news to the users,and maybe not: Now a user may need to know more details about an implementation to make clear what predicate can be optimized (like '[last()-1024]') and what predicate can not (like '[last()-1025]' and '[last() > position() * b]'). At last,I suggest to add a function 'fn:islast() boolean' which returns true if the current item is the last item,or false if not. _________________________________________________________ Do You Yahoo!? ÏÓÓÊÏä̫С£¿ÑÅ»¢µçÓÊ×ÔÖúÀ©ÈÝ£¡ http://cn.rd.yahoo.com/mail_cn/tag/10m/*http://cn.mail.yahoo.com/event/10m.html
Received on Sunday, 6 June 2004 22:48:58 UTC