Re: Proposed EXPath module: resource collections

> 
> “do not equate resource descriptors with XDM map items”
> ============================================
> 
A map is a completely abstract entity; it can have any number of possible implementations. Saying it is an XDM map is purely a statement that it satisfies the map interface, for example the ability to enumerate the keys and to get the value associated with any key. This interface could easily be provided by, for example, mapping these functions to SQL queries. So I don't think you need an abstraction separate from map items: the map interface is already abstract enough.
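
To make that concrete, here is a rough sketch (in Python rather than XPath, purely for illustration) of a read-only map whose key enumeration and lookup are each answered by a SQL query. The table name "props", its columns, and the class name are all invented for the example; no real collection implementation is implied.

```python
import sqlite3
from collections.abc import Mapping


class SqlBackedMap(Mapping):
    """Read-only map whose entries live in a SQL table (hypothetical schema)."""

    def __init__(self, conn, table="props"):
        self._conn = conn
        self._table = table  # assumed to be a trusted identifier, not user input

    def __getitem__(self, key):
        # "Get the value associated with a key" becomes a single-row SELECT.
        row = self._conn.execute(
            f"SELECT value FROM {self._table} WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __iter__(self):
        # "Enumerate the keys" becomes a SELECT over the key column.
        for (key,) in self._conn.execute(
            f"SELECT key FROM {self._table} ORDER BY key"
        ):
            yield key

    def __len__(self):
        return self._conn.execute(
            f"SELECT COUNT(*) FROM {self._table}"
        ).fetchone()[0]


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE props (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO props VALUES (?, ?)",
    [("name", "report.xml"), ("content-type", "application/xml")],
)
descriptor = SqlBackedMap(conn)
```

A caller sees only the map interface (lookup, key enumeration, size); nothing in it reveals that the entries are held in a database.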
>  
> 
> The filter should be expressed by a generic query syntax (like “a=1 and b=x”) which is behind the scenes translated into a construct appropriate for the technology actually used – e.g. a predicate applied to a sequence of maps, or a predicate applied to an XML document, or a SQL SELECT, etc. etc.

I don't think there is any need for this generic query syntax to be something different from XPath syntax, but I do recognize the concern that if it's done by filtering the result of a collection() call by a predicate, the only way you can ship the query to be closer to the data is by digging into the XPath implementation/optimizer, which makes it less likely to happen.

Saxon of course uses query parameters as part of the URI, e.g. http://my/collection?select=*.xml. That meets the need to ship the query to the data, but it tends to restrict the query to something very simple. We could of course provide a function

collection($uri, $filter)

where $filter is a predicate consisting of an XPath expression (perhaps some XPath subset) in the form of a string. But then supplying parameters to the string gets messy.
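
A small sketch of why it gets messy, written in Python only for illustration: as soon as a caller-supplied value is spliced into the filter string, it has to be quoted as an XPath string literal (in XPath 2.0 and later, by doubling the delimiter). The property name "author" and the helper names are hypothetical.

```python
def xpath_string_literal(value):
    """Quote a string as an XPath 2.0+ literal by doubling embedded quotes."""
    return '"' + value.replace('"', '""') + '"'


def naive_filter(author):
    # Splices the value in directly; breaks if it contains a double quote.
    return f'author = "{author}"'


def escaped_filter(author):
    # Same filter, with the value quoted as a proper string literal.
    return "author = " + xpath_string_literal(author)


awkward = 'O"Neil'
broken = naive_filter(awkward)
safe = escaped_filter(awkward)
```

Every caller has to remember to do this quoting, and non-string values (dates, numbers, sequences) each need their own treatment, which is the messiness I mean.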

I think it has also been suggested to use such an interface with $filter being a function that does the selection, but that's also difficult to ship to a server in the general case.

>  
> “collection-specific models of resource properties”
> ======================================
> It is difficult for me to imagine important scenarios in which a collection with resource descriptors restricted to generic properties is very interesting.

I certainly wasn't suggesting that, sorry if I gave that impression. The idea was to have a completely open set of properties, just standardising the names we use for properties that many collection implementations are likely to provide.

> 
>  “make resource reference purely data based”
> ===================================
> I believe the relationship of a resource descriptor to a resource should be captured by data, rather than a function, ensuring simplicity and portability.

If the content of the resource is just the value of one of the properties, I think some implementations may have difficulty fetching the data only when it is actually needed. I think a function is a lot more flexible, but of course input on that from other implementors is welcome.
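
As a sketch of what I have in mind (Python for illustration only; the property names "name" and "fetch" and the loader are assumptions, not a proposed API), the descriptor carries cheap properties as data plus a function that retrieves the content on demand:

```python
def make_descriptor(name, loader):
    """Build a descriptor whose content is loaded only when fetch() is called."""
    cache = {}

    def fetch():
        # Load on first use, then reuse the cached result.
        if "content" not in cache:
            cache["content"] = loader(name)
        return cache["content"]

    return {"name": name, "fetch": fetch}


loads = []  # records which resources were actually retrieved


def fake_loader(name):
    # Stand-in for an expensive retrieval (file read, HTTP GET, database query).
    loads.append(name)
    return f"<doc name='{name}'/>"


descriptors = [make_descriptor(n, fake_loader) for n in ["a.xml", "b.xml", "c.xml"]]

# Select on the cheap properties first, then fetch only what is needed.
wanted = [d for d in descriptors if d["name"] == "b.xml"]
content = wanted[0]["fetch"]()
```

Only the one selected resource is ever retrieved; the other two descriptors cost nothing beyond their properties.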

> Proposed concept: a node constructor, consisting of two strings, one identifying the type of constructor (typically “URI” or “serialized node”), the other providing the value from which the node can be constructed. So the node constructor is just another name/value pair (name=type, value=value), accompanied by the name/value pairs representing resource properties. Adhering to conventions concerning the used name, the function (“fetch”) becomes superfluous. Data is sufficient and simple.

I think we should steer completely clear of models that use nodes. Nodes in XDM carry far too much baggage - names, namespaces, 13 axes, identity semantics, base URIs etc. At the same time they are very limited - to create a node whose value is a sequence of integers, you need a schema. Maps free us from all that.
>  
> “collection filtering as atomic operation”
> ==============================
> Remembering the goal to deal with huge collections, the filtering should be pulled into the collection access function, rather than being kept separate and applied afterwards. At any rate, it seems to me that the implementation of a two-parameter function
> 
>     filteredCollection(‘coll-uri’, ‘filter-descriptor’)
>  
> is so much easier, given the task to extract from 20 GB a result of 10 MB, compared to a rc:resource-collection(…) followed by a predicate.
> 
Agreed. As already said, we do need to consider how to ensure that filter conditions can be shipped to the server holding the data.
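
To illustrate the difference in cost, here is a toy sketch (Python, for illustration only) in which a stand-in "server" counts how many resource descriptors it has to send. The filter is expressed as plain data, a set of property/value equalities, so it can be shipped; the class name, the "year" property, and the sample data are all made up.

```python
class CountingSource:
    """Stands in for a remote collection; counts descriptors sent to the client."""

    def __init__(self, resources):
        self._resources = list(resources)
        self.shipped = 0

    def collection(self, conditions=None):
        # conditions: optional dict of property -> required value,
        # evaluated at the source before anything is sent.
        conditions = conditions or {}
        result = []
        for r in self._resources:
            if all(r.get(k) == v for k, v in conditions.items()):
                self.shipped += 1
                result.append(r)
        return result


resources = [{"name": f"doc{i}.xml", "year": 2000 + i % 20} for i in range(1000)]

# Filter applied by the client after retrieving the whole collection.
late = CountingSource(resources)
late_hits = [r for r in late.collection() if r["year"] == 2015]

# Filter shipped to the source along with the collection request.
early = CountingSource(resources)
early_hits = early.collection({"year": 2015})
```

Both routes give the same 50 results, but the first ships all 1000 descriptors and the second ships only the 50 that match.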

Mike

Received on Wednesday, 18 February 2015 22:56:56 UTC