- From: <bugzilla@wiggum.w3.org>
- Date: Sat, 26 Nov 2005 08:43:03 +0000
- To: public-qt-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=2553

           Summary: [F+O] Stability of collection()
           Product: XPath / XQuery / XSLT
           Version: Candidate Recommendation
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Functions and Operators
        AssignedTo: ashok.malhotra@oracle.com
        ReportedBy: mike@saxonica.com
         QAContact: public-qt-comments@w3.org

A while back I introduced an implementation of collection() in Saxon that uses a URI to identify a set of XML documents in filestore, with the ability to do pattern matching on the file names, recursively traverse the directory structure, and so on. This has proved very popular with XSLT users in particular: for example it allows you to build an index over a large set of source documents.

The problem is that it isn't stable: if you call collection() again with the same URI, and files have been created or deleted, you will get a different result the next time.

I've tried various devices to get around this problem, but the only conformant solution I can think of is to abandon using the collection() function for this purpose and introduce a proprietary extension function instead, which doesn't seem to be in anyone's interests.

As far as I can tell there are only two ways of making the collection() function stable. One is to lock the stored collection against updates for the duration of the query or transformation. This is only possible where you have exclusive access to the data, it's not a practical solution for files in filestore. The other approach is to take a snapshot of the entire collection. But that's hideously expensive, given that the collection will usually be too big to fit in memory, and that the chances are that 99% of the time it will only be read once, often with each document being processed to completion before the next one is examined.

So I think there's a strong case for relaxing the requirement that collection() should be stable.
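
To make the stability problem concrete, here is a minimal Python sketch. It is an illustration only, not Saxon's implementation: the `collection` function, the `SnapshotCollection` class, and the `demo` helper are all hypothetical names, and the "collection" is reduced to a sorted list of file names matching a pattern in one directory.

```python
import fnmatch
import os
import tempfile


def collection(directory, pattern="*.xml"):
    """Hypothetical stand-in for a directory-backed collection():
    returns the sorted names of the files that match right now."""
    return sorted(f for f in os.listdir(directory) if fnmatch.fnmatch(f, pattern))


class SnapshotCollection:
    """Remembers the first result for each (directory, pattern) pair,
    so repeated calls with the same arguments return the same members."""

    def __init__(self):
        self._cache = {}

    def __call__(self, directory, pattern="*.xml"):
        key = (directory, pattern)
        if key not in self._cache:
            self._cache[key] = collection(directory, pattern)
        return self._cache[key]


def demo():
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "a.xml"), "w").close()
        stable = SnapshotCollection()
        first_unstable = collection(d)
        first_stable = stable(d)
        # Simulate another process creating a file between the two calls.
        open(os.path.join(d, "b.xml"), "w").close()
        second_unstable = collection(d)
        second_stable = stable(d)
        return first_unstable, second_unstable, first_stable, second_stable
```

Calling `demo()` shows the two plain calls disagreeing once a file has been added, while the snapshot wrapper returns the same membership both times. Note that this sketch only freezes the list of file names; the full snapshot discussed above would also have to hold the documents themselves, which is where the cost arises.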

David Carlisle made an interesting suggestion: one could define the semantics so that collection() is guaranteed to create new nodes rather than return existing nodes. Since our processing model already allows a function to create new nodes each time it is called, this shouldn't be problematic. Of course for XQuery scenarios involving a database that might be updated, one does want a reference to the existing node, which suggests a need for two options or modes.

Michael Kay

Received on Saturday, 26 November 2005 08:43:11 UTC