Re: Proposed EXPath module: resource collections

Thank you very much, Michael. 

One question: what is the chief value of rc:resource-collection (in its present or a future version), delivering resource descriptors or delivering the resources they reference? I vote for the second. I regard resource descriptors not so much as an augmentation of the resource but as a means of finding it (just as an index is a means, although occasionally you want to look at the index itself), and therefore I favor a variant that delivers the resources themselves rather than maps. Doesn't the following signature make things simpler for user and implementer alike?

    collection($uri, $filter) : resource-type*
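To make the two variants concrete, here is a minimal Python sketch, with Python used only as neutral pseudocode. The collection URI, the property names, the in-memory store and the use of a Python callable as the filter are all hypothetical. Variant 1 delivers descriptors; variant 2 uses the descriptors only as an index and delivers the resources.

```python
from typing import Callable, Iterator

# Hypothetical in-memory collection: each descriptor is a plain dict of
# properties; "content" stands in for the resource it describes.
COLLECTIONS: dict[str, list[dict]] = {
    "urn:example:books": [
        {"name": "a.xml", "media-type": "application/xml", "content": "<a/>"},
        {"name": "b.txt", "media-type": "text/plain", "content": "hello"},
        {"name": "c.xml", "media-type": "application/xml", "content": "<c/>"},
    ]
}

def collection_descriptors(uri: str, flt: Callable[[dict], bool]) -> Iterator[dict]:
    """Variant 1: deliver the matching resource descriptors (maps)."""
    return (d for d in COLLECTIONS[uri] if flt(d))

def collection(uri: str, flt: Callable[[dict], bool]) -> Iterator[str]:
    """Variant 2: filter on the descriptors, but deliver the resources themselves."""
    return (d["content"] for d in collection_descriptors(uri, flt))
```

For example, `list(collection("urn:example:books", lambda d: d["media-type"] == "application/xml"))` yields the two XML resources, without the caller ever handling a descriptor. Offering both variants costs little, since the second is a thin layer over the first.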

But of course one could, and perhaps should, have both variants.

I understand your concern about a query string (it is a new entity that would have to be agreed upon, etc.) and your inclination to reuse existing constructs such as expressions and functions. The concept of a query string certainly has merits and drawbacks. (I myself appreciate the simplicity and abstraction it provides: a string that can be received from user input and passed around until it arrives in the implementation of the filtering engine.)

Thank you for clarifying that you regard those generic properties as only one part of the resource descriptor, usually to be accompanied by more specific properties. But how should this specific part of the resource descriptor be defined and communicated? I believe it should be *collection-level* rather than *application-level* information. My idea is to have a *collection descriptor* that defines those specific properties in terms of name, type and XQuery expression. Can you imagine such an approach, or do you have something else in mind?

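A minimal Python sketch of such a collection descriptor, assuming a hypothetical collection of `<book>` documents. Each property is declared by name, type and an extraction function; the function is a Python lambda over an `xml.etree.ElementTree` element standing in for the XQuery expression, not real XQuery. `PropertyDef`, `describe` and the property names are illustrative only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class PropertyDef:
    name: str                          # property name exposed in the resource descriptor
    value_type: type                   # declared type of the property value
    expr: Callable[[ET.Element], Any]  # stands in for an XQuery expression over the resource

# Hypothetical collection descriptor for a collection of <book> documents.
BOOK_COLLECTION_DESCRIPTOR = [
    PropertyDef("title", str, lambda root: root.findtext("title")),
    PropertyDef("year", int, lambda root: int(root.findtext("year"))),
]

def describe(xml_text: str, props: list[PropertyDef]) -> dict[str, Any]:
    """Build the collection-specific part of a resource descriptor."""
    root = ET.fromstring(xml_text)
    descriptor = {}
    for p in props:
        value = p.expr(root)
        if not isinstance(value, p.value_type):
            raise TypeError(f"{p.name}: expected {p.value_type.__name__}")
        descriptor[p.name] = value
    return descriptor
```

Because the descriptor travels with the collection, a client can discover which properties it may filter on without any application-level agreement.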
You wrote: "If the content of the resource is just the value of one of the properties, I think some implementations may have difficulty fetching the data only when it is actually needed." Oh! I thought it a fairly general and safe assumption that any resource can be constructed from a single string: either a reference (a URI or a proprietary identifier) or the serialized resource. But you think this should not be assumed. Could you give me a counterexample?

You wrote: "I think we should steer completely clear of models that use nodes. Nodes in XDM carry far too much baggage - names, namespaces, 13 axes, identity semantics, base URIs etc." I am puzzled, as I believe that node trees are of overwhelming importance when working with XQuery. Or not? I want two separate functions for retrieving XML and non-XML resources (fn:doc vs. fn:unparsed-text), and likewise two separate functions for retrieving filtered node collections and filtered resources in general. Do you disagree, or did you perhaps mean something else by "steering clear of models that use nodes"?
Hans-Juergen




     Michael Kay <mike@saxonica.com> wrote on Wednesday, 18 February 2015, at 23:56:
   

 > 
> “do not equate resource descriptors with XDM map items”
> ============================================
> 
A map is a completely abstract entity; it can have any number of possible implementations. Saying it is an XDM map is purely a statement that it satisfies the map interface, for example the ability to enumerate the keys and to get the value associated with any key. This interface could easily be provided by, for example, mapping these functions to SQL queries. So I don't think you need an abstraction separate from map items; the map interface is already abstract enough.
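To illustrate that the map interface can be backed by SQL queries, here is a hedged Python sketch: a read-only `collections.abc.Mapping` whose keys and values are fetched from an in-memory SQLite table on demand. The `props` table layout and the class name are assumptions made for the example.

```python
import sqlite3
from collections.abc import Mapping
from typing import Iterator

class SqlBackedDescriptor(Mapping):
    """Read-only map whose keys and values are fetched by SQL queries on demand.

    Hypothetical schema: one row per (resource_id, property name, value).
    """

    def __init__(self, conn: sqlite3.Connection, resource_id: str):
        self._conn = conn
        self._id = resource_id

    def __getitem__(self, key: str):
        # "get the value associated with a key" becomes a SELECT
        row = self._conn.execute(
            "SELECT value FROM props WHERE resource_id = ? AND name = ?",
            (self._id, key),
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __iter__(self) -> Iterator[str]:
        # "enumerate the keys" becomes another SELECT
        names = [r[0] for r in self._conn.execute(
            "SELECT name FROM props WHERE resource_id = ? ORDER BY name", (self._id,))]
        return iter(names)

    def __len__(self) -> int:
        (n,) = self._conn.execute(
            "SELECT COUNT(*) FROM props WHERE resource_id = ?", (self._id,)
        ).fetchone()
        return n

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE props (resource_id TEXT, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO props VALUES (?, ?, ?)",
    [("r1", "media-type", "application/xml"), ("r1", "name", "a.xml")],
)
d = SqlBackedDescriptor(conn, "r1")
```

Client code sees an ordinary map (`dict(d)`, `"name" in d`, `len(d)`); nothing about the SQL back end leaks through the interface.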
>  
> 
> The filter should be expressed by a generic query syntax (like “a=1 and b=x”) which is behind the scenes translated into a construct appropriate for the technology actually used – e.g. a predicate applied to a sequence of maps, or a predicate applied to an XML document, or a SQL SELECT, etc. etc.

I don't think there is any need for this generic query syntax to be different from XPath syntax, but I do recognize the concern: if filtering is done by applying a predicate to the result of a collection() call, the only way to ship the query closer to the data is by digging into the XPath implementation/optimizer, which makes it less likely to happen.
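As a sketch of how one generic filter string might be translated behind the scenes for different back ends, the following Python handles only conjunctions of equality terms (`k=v and k=v`). The grammar, the function names and the table name are hypothetical; an XPath subset, as suggested above, would need a real parser.

```python
from typing import Callable

def parse(query: str) -> list[tuple[str, str]]:
    """Parse a tiny generic filter language: 'k1=v1 and k2=v2 ...' (equality only)."""
    terms = []
    for term in query.split(" and "):
        key, sep, value = term.partition("=")
        if not sep:
            raise ValueError(f"bad term: {term!r}")
        terms.append((key.strip(), value.strip()))
    return terms

def to_predicate(query: str) -> Callable[[dict], bool]:
    """Back end 1: a predicate applied to a sequence of maps (values compared as strings)."""
    terms = parse(query)
    return lambda m: all(str(m.get(k)) == v for k, v in terms)

def to_sql(query: str, table: str) -> tuple[str, list[str]]:
    """Back end 2: a parameterised SQL SELECT (column names assumed pre-validated)."""
    terms = parse(query)
    where = " AND ".join(f'"{k}" = ?' for k, _ in terms)
    return f'SELECT * FROM "{table}" WHERE {where}', [v for _, v in terms]
```

The same string `"a=1 and b=x"` becomes an in-memory predicate in one case and a SELECT that runs next to the data in the other.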

Saxon of course uses query parameters as part of the URI, e.g. http://my/collection?select=*.xml. That meets the need to ship the query to the data, but it tends to restrict the query to something very simple. We could of course provide a function
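A rough Python imitation of the URI-parameter approach, using a glob in a `select` parameter. It is loosely modelled on the `?select=*.xml` example and is not Saxon's implementation; the function name is hypothetical. It also shows the limitation: a single glob over names is about as much as such a parameter naturally expresses.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qs, urlsplit

def select_from_uri(collection_uri: str, names: list[str]) -> list[str]:
    """Apply a 'select' glob carried in the collection URI's query string."""
    params = parse_qs(urlsplit(collection_uri).query)
    pattern = params.get("select", ["*"])[0]  # no select parameter: take everything
    return [n for n in names if fnmatchcase(n, pattern)]
```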

collection($uri, $filter)

where $filter is a predicate consisting of an XPath expression (perhaps some XPath subset) in the form of a string. But then supplying parameters to the string gets messy.

I think it has also been suggested to use such an interface with $filter being a function that does the selection, but that's also difficult to ship to a server in the general case.

>  
> “collection-specific models of resource properties”
> ======================================
> It is difficult for me to imagine important scenarios in which a collection with resource descriptors restricted to generic properties is very interesting.

I certainly wasn't suggesting that; sorry if I gave that impression. The idea was to have a completely open set of properties, standardising only the names of properties that many collection implementations are likely to provide.

> 
>  “make resource reference purely data based”
> ===================================
> I believe the relationship of a resource descriptor to a resource should be captured by data, rather than a function, ensuring simplicity and portability.

If the content of the resource is just the value of one of the properties, I think some implementations may have difficulty fetching the data only when it is actually needed. I think a function is a lot more flexible, but of course input on that from other implementors is welcome.
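To illustrate the function-based alternative, a minimal Python sketch (descriptor layout and names hypothetical): the descriptor carries a `fetch` entry that is a function, so filtering on properties never touches the content, and the content is loaded only when `fetch` is called.

```python
from typing import Callable

fetch_count = 0  # counts how often the (potentially expensive) load really runs

def make_descriptor(name: str, load: Callable[[], str]) -> dict:
    """A descriptor whose 'fetch' entry is a function rather than the content."""
    return {"name": name, "fetch": load}

def load_a() -> str:
    global fetch_count
    fetch_count += 1
    return "<a/>"

descriptors = [
    make_descriptor("a.xml", load_a),
    make_descriptor("b.xml", lambda: "<b/>"),
]

# Filtering uses only the properties; no content is loaded here.
selected = [d for d in descriptors if d["name"] == "a.xml"]

# The content is fetched only when the caller actually asks for it.
content = selected[0]["fetch"]()
```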

> Proposed concept: a node constructor, consisting of two strings, one identifying the type of constructor (typically “URI” or “serialized node”), the other providing the value from which the node can be constructed. So the node constructor is just another name/value pair (name=type, value=value), accompanied by the name/value pairs representing resource properties. If a convention for that name is followed, the function (“fetch”) becomes superfluous. Data is sufficient and simple.

I think we should steer completely clear of models that use nodes. Nodes in XDM carry far too much baggage - names, namespaces, 13 axes, identity semantics, base URIs etc. At the same time they are very limited - to create a node whose value is a sequence of integers, you need a schema. Maps free us from all that.
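For comparison, the data-only variant from the quoted proposal does not itself need nodes; it can be sketched with a plain map (a Python dict here). The keys `constructor-type` and `constructor-value` and the two constructor kinds are a hypothetical convention, not an agreed vocabulary.

```python
from urllib.request import urlopen

def construct(descriptor: dict) -> str:
    """Generic 'fetch' derived purely from data in the descriptor.

    Assumed (hypothetical) convention: 'constructor-type' is either
    'serialized' (the value is the resource itself) or 'uri' (the value
    locates the resource).
    """
    kind = descriptor["constructor-type"]
    value = descriptor["constructor-value"]
    if kind == "serialized":
        return value
    if kind == "uri":
        # Any scheme urllib can open (file:, http:, ...); read as UTF-8 text.
        with urlopen(value) as resp:
            return resp.read().decode("utf-8")
    raise ValueError(f"unknown constructor type: {kind!r}")
```

With kind `uri` the fetch is still deferred until `construct` is called; with kind `serialized` the content already sits in the descriptor, which is the case the lazy-fetching concern above is about.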
>  
> “collection filtering as atomic operation”
> ==============================
> Remembering the goal to deal with huge collections, the filtering should be pulled into the collection access function, rather than being kept separate and applied afterwards. At any rate, it seems to me that the implementation of a two-parameter function
> 
>    filteredCollection(‘coll-uri’, ‘filter-descriptor’)
>  
> is so much easier, given the task to extract from 20 GB a result of 10 MB, compared to a rc:resource-collection(…) followed by a predicate.
> 
Agreed. As I said, we do need to consider how to ensure that filter conditions can be shipped to the server holding the data.
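A hedged Python sketch of the difference, with an in-memory SQLite database standing in for "the server holding the data" and hypothetical function and column names. Both functions return the same names, but the first pulls every row to the client before filtering, while the second ships the condition to the store so that only matching rows come back.

```python
import sqlite3

# An in-memory SQLite database stands in for the server holding the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (name TEXT, media_type TEXT)")
conn.executemany(
    "INSERT INTO resources VALUES (?, ?)",
    [(f"doc{i}.xml", "application/xml") if i % 2 else (f"doc{i}.txt", "text/plain")
     for i in range(1000)],
)

def collection_then_filter(media_type: str) -> list[str]:
    """Deliver the whole collection, then apply the predicate on the client side."""
    rows = conn.execute("SELECT name, media_type FROM resources ORDER BY rowid").fetchall()
    return [name for name, mt in rows if mt == media_type]

def filtered_collection(media_type: str) -> list[str]:
    """Ship the condition to the store, so only matching rows are returned."""
    rows = conn.execute(
        "SELECT name FROM resources WHERE media_type = ? ORDER BY rowid",
        (media_type,),
    ).fetchall()
    return [name for (name,) in rows]
```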

Mike


   

Received on Thursday, 19 February 2015 22:25:36 UTC