Re: [XPath] About fn:last() from Fang Yidong on 2004-06-07 (public-qt-comments@w3.org from June 2004)

From: Fang Yidong <fangyidong@yahoo.com.cn>
Date: Mon, 7 Jun 2004 10:48:24 +0800 (CST)
To: Michael Brundage <xquery@comcast.net>, Michael Kay <mhk@mhk.me.uk>, XQuery Public Comments <public-qt-comments@w3.org>
Message-ID: <20040607024824.91844.qmail@web15409.mail.cnb.yahoo.com>

Note:I've send the original message to massino
wrongly.Now I learn that massimo has a very high work
load,so please don't reply my message to
'massimo@w3.org' again.

 --- Michael Brundage <xquery@comcast.net> wrote:>
Although it's true that last() has significant
> performance implications,
> consider even simple XPaths like /a/b[/a/c] or /a[b
> = /a/c].  Because you
> can express cross-products, XPath queries can easily
> be O(N^2) or worse
> unless you use indexing techniques (as Michael Kay
> replied)
> 
> >>
>
'/e1[predicate1]/e2[predicate2]/e3[predicate3]/e4[fn:last()]',if
> >> the resulting sequence is very long(e.g.
> 1,000,000
> >> nodes) and the data is stored in a file,in the
> worst
> >> case,we'll have to use 1,000,000 IO operations to
> get
> >> the value of fn:last() and another 1,000,000 IO
> >> operations to iterate to the last node.
> 
> If the data is stored in a file, then you need to
> parse the entire file
> anyway to ensure that it's valid XML and to ensure
> that you've found all
> matching nodes.

  My implementation is for xdb purpose,and in many
cases,the resulting sequence will be very large.An xml
file is stored in a native format and is pre-validated
before it is stored.

> 
> However, this particular query (if the elided
> predicates don't get in the
> way) shouldn't require processing the file twice. 
> Just remember the last
> matching e4 element within each matching e3, and
> when the e3 close is
> reached, emit the last e4 you remembered.  In
> general, predicates of the
> form [last()] (and possibly [last() - k] for
> constant k) should be
> optimized.
> 
> Obviously there are plenty of other queries, like
>    //a[last() > position() * b]
> that will perform badly in most implementations no
> matter what kinds of
> performance optimizations you make.  That's just the
> nature of the game.
> 

  I have misunderstood the scope of the axis.Now I've
learned that it is just relative to the context item:
'/e1/e2[last()]' is to get the last 'e2' item for
*each* 'e1',not the last 'e2' item for *all* 'e1'.(May
be the WG should add an explicit statement about
this?)It seems it's easier to implement than I have
thought.

  As Michael Kay said,the XQuery is designed for
users,not for implementators,so it is the
implementor's challenge to find an effective way to
process the query.Maybe this is a good news to the
users,and maybe not: Now a user may need to know more 
details about an implementation to make clear what
predicate can be optimized (like '[last()-1024]') and
what predicate can not (like '[last()-1025]' and
'[last() > position() * b]').

  At last,I suggest to add a function 'fn:islast()
boolean' which returns true if the current item is the
last item,or false if not.

_________________________________________________________
Do You Yahoo!? 
嫌邮箱太小？雅虎电邮自助扩容！
http://cn.rd.yahoo.com/mail_cn/tag/10m/*http://cn.mail.yahoo.com/event/10m.html

Received on Sunday, 6 June 2004 22:48:58 UTC