[Bug 2553] [F+O] Stability of collection()

http://www.w3.org/Bugs/Public/show_bug.cgi?id=2553

           Summary: [F+O] Stability of collection()
           Product: XPath / XQuery / XSLT
           Version: Candidate Recommendation
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Functions and Operators
        AssignedTo: ashok.malhotra@oracle.com
        ReportedBy: mike@saxonica.com
         QAContact: public-qt-comments@w3.org


A while back I introduced an implementation of collection() in Saxon that uses a
URI to identify a set of XML documents in filestore, with the ability to do
pattern matching on the file names, recursively traverse the directory
structure, and so on. This has proved very popular with XSLT users in
particular: for example it allows you to build an index over a large set of
source documents. The problem is that it isn't stable: if you call collection()
again with the same URI, and files have been created or deleted, you will get a
different result the next time. I've tried various devices to get around this
problem, but the only conformant solution I can think of is to abandon using the
collection() function for this purpose and introduce a proprietary extension
function instead, which doesn't seem to be in anyone's interests.
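
To make the instability concrete, here is a minimal Python sketch, not Saxon's
actual implementation. The function name collection(), its parameters, and the
file names are hypothetical stand-ins for a directory-backed collection with
file-name pattern matching and optional recursion.

```python
import tempfile
from pathlib import Path

def collection(directory, pattern="*.xml", recurse=False):
    """Hypothetical stand-in: list the files under `directory` matching `pattern`."""
    root = Path(directory)
    matches = root.rglob(pattern) if recurse else root.glob(pattern)
    return sorted(p.name for p in matches if p.is_file())

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.xml").write_text("<doc/>")
    first = collection(tmp)            # only a.xml exists at this point
    (Path(tmp) / "b.xml").write_text("<doc/>")
    second = collection(tmp)           # same argument, but b.xml now exists

print(first)   # ['a.xml']
print(second)  # ['a.xml', 'b.xml']
```

The two calls take the same argument yet return different results, which is
exactly what a stable function is not allowed to do.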

As far as I can tell there are only two ways of making the collection() function
stable. One is to lock the stored collection against updates for the duration of
the query or transformation. This is only possible where you have exclusive
access to the data; it is not a practical solution for files in filestore. The
other approach is to take a snapshot of the entire collection. But that's
hideously expensive, given that the collection will usually be too big to fit in
memory, and that the chances are that 99% of the time it will only be read once,
often with each document being processed to completion before the next one is
examined.
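
The snapshot approach can be sketched as follows, again in Python and under
stated assumptions: QueryContext and its collection() method are hypothetical
names, and the "snapshot" is simply every matching file read into memory the
first time a given directory and pattern are requested.

```python
import tempfile
from pathlib import Path

class QueryContext:
    """Hypothetical per-query context that memoises collection() results."""

    def __init__(self):
        self._snapshots = {}

    def collection(self, directory, pattern="*.xml"):
        key = (str(directory), pattern)
        if key not in self._snapshots:
            # Every document is read eagerly here; this is the expensive part.
            self._snapshots[key] = {
                p.name: p.read_text()
                for p in sorted(Path(directory).glob(pattern))
            }
        return self._snapshots[key]

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.xml").write_text("<doc/>")
    ctx = QueryContext()
    first = ctx.collection(tmp)
    (Path(tmp) / "b.xml").write_text("<doc/>")   # created after the snapshot
    second = ctx.collection(tmp)
    stable = first is second
    names = sorted(second)

print(stable)  # True
print(names)   # ['a.xml']
```

Stability is bought by holding the whole collection for the life of the query,
even when each document is only ever read once.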

So I think there's a strong case for relaxing the requirement that collection()
should be stable. David Carlisle made an interesting suggestion: one could
define the semantics so that collection() is guaranteed to create new nodes
rather than return existing nodes. Since our processing model already allows a
function to create new nodes each time it is called, this shouldn't be problematic. 
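
A rough Python analogy for the "new nodes" semantics, assuming node identity
can be modelled by object identity: the hypothetical collection_new_nodes()
below parses its sources afresh on every call, so no two calls ever return the
same node, and there is nothing left for a stability rule to constrain.

```python
import xml.etree.ElementTree as ET

def collection_new_nodes(sources):
    """Hypothetical: build a fresh tree for every document on every call."""
    return [ET.fromstring(text) for text in sources]

sources = ["<doc id='1'/>", "<doc id='2'/>"]
first = collection_new_nodes(sources)
second = collection_new_nodes(sources)

# Distinct objects (analogous to distinct node identities) ...
same_identity = first[0] is second[0]
# ... that nevertheless serialise to the same content.
same_content = ET.tostring(first[0]) == ET.tostring(second[0])

print(same_identity)  # False
print(same_content)   # True
```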

Of course for XQuery scenarios involving a database that might be updated, one
does want a reference to the existing node, which suggests a need for two
options or modes.

Michael Kay

Received on Saturday, 26 November 2005 08:43:11 UTC