Re: Proposed EXPath module: resource collections from Hans-Juergen Rennau on 2015-02-20 (public-expath@w3.org from February 2015)

From: Hans-Juergen Rennau <hrennau@yahoo.de>
Date: Fri, 20 Feb 2015 06:34:43 +0000 (UTC)
To: Michael Kay <mike@saxonica.com>
Cc: "jonathan.robie@emc.com" <jonathan.robie@emc.com>, "ndw@nwalsh.com" <ndw@nwalsh.com>, "christian.gruen@gmail.com" <christian.gruen@gmail.com>, "public-expath@w3.org" <public-expath@w3.org>, "msokolov@gmail.com" <msokolov@gmail.com>
Message-ID: <1409665612.3919268.1424414083972.JavaMail.yahoo@mail.yahoo.com>

Thank you, these were very helpful explanations. Now I understand that the fetch() function is more an aspect of the API - how to represent the resource descriptor and how to enable subsequent resource retrieval - and less a question of the data model of resource descriptors. The data model (defining the information content of resource descriptors in general terms) has not yet been touched, and that would be the place to discuss the data representation of the association between descriptor and node.
For me, there remains one major open question - how to express collection-specific models of resource descriptors, and I suggest the following requirements:
* an artifact which is accessible to the XQuery user
* a format which is platform and XQuery processor independent
Advantages: the user can himself design and manage the collections (if appropriate API functions are provided); portability across XQuery processors.
How about a little XML vocabulary? Alternatives? Would you expect it to be hard to agree upon the details?
Hans-Juergen

     Michael Kay <mike@saxonica.com> wrote at 23:58 on Thursday, 19 February 2015:

 > 
> One question is - what is the chief value of rc:resource-collection (present or future version) - delivery of resource descriptors, or delivery of resources referenced by the resource descriptor?

The history here is that we had collection() which had two limitations: (a) it can only return nodes (in practice that usually means XML documents) as distinct from other kinds of resource such as text and binary files and JSON files and (b) it cannot return metadata about the resources which can be used to achieve selective retrieval of the resources based on their metadata.

We introduced uri-collection() first in XSLT 3.0 and then in XPath 3.1 to try and solve these problems; for example, you can get a set of URIs, filter it to select those ending in ".txt", and then use unparsed-text() to retrieve those resources. But this isn't general enough; there's still no metadata apart from the URI itself, and no way of discovering what kind of resource it is and how it should be parsed.
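
The uri-collection() pattern just described might look roughly like the following. This is only a sketch: the collection URI "file:///data/reports" is a placeholder, and how a collection URI maps to a set of resources is implementation-defined.

```xquery
xquery version "3.1";

(: Placeholder collection URI; substitute one your processor can resolve. :)
for $uri in uri-collection("file:///data/reports")
(: The URI is the only metadata available, so filter on its suffix. :)
where ends-with($uri, ".txt")
(: Retrieve each selected resource as a string. :)
return unparsed-text($uri)
```

Note that the filter can only look at the URI string, which is exactly the limitation: nothing here says what kind of resource each URI identifies.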

So the value, compared to what's in the standard, is to provide more metadata (resource descriptors if you like), but in most cases this is only useful as a means to an end, where the end is successful retrieval of the resource itself.

> I vote for the second, as I regard the resource descriptors not so much as an augmentation of the resource, but as a means to find the resource (like an index is a means, although occasionally you want to see it itself), and therefore I favor a variant delivering the resources themselves, rather than maps. Doesn't the following signature make things simpler for both the user and the implementer:
>    collection($uri, $filter) : resource-type*

Yes, the challenge is how to provide the filter.
> 
> You wrote: "If the content of the resource is just the value of one of the properties, I think some implementations may have difficulty fetching the data only when it is actually needed."
> Oh! I thought it a fairly general and safe assumption that any resource should be constructable from one string - either a reference (URI or proprietary) or the serialized resource. But you think this should not be assumed. Could you give me a counter example?

I think you may have misunderstood me. If the resource descriptor is a map $R, then the question is whether to make the resource content available as a map property $R?content, or via a function $R?fetch(). In some sense these are equivalent. However, I've been assuming that we would usually want to get the content only after looking at the metadata, and if both are properties of the same map, then presenting both as "data properties" would require a specialist map implementation, whereas if the content is made available by a function, a conventional implementation with no special tricks would achieve the desired effect that the content is retrieved only on demand.
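
To make the $R?fetch() variant concrete, here is a minimal sketch using ordinary XQuery 3.1 maps and inline functions. The helper local:descriptor, the keys "uri" and "media-type", and the file URIs are all hypothetical illustrations, not part of any agreed module; only the key name "fetch" comes from the discussion above.

```xquery
xquery version "3.1";

(: Hypothetical helper that builds a resource descriptor as a plain map. :)
declare function local:descriptor(
  $uri as xs:string,
  $media-type as xs:string
) as map(*) {
  map {
    "uri": $uri,
    "media-type": $media-type,
    (: The content is read only when this function item is called. :)
    "fetch": function() as xs:string { unparsed-text($uri) }
  }
};

(: Placeholder URIs; building the descriptors reads nothing. :)
let $descriptors := (
  local:descriptor("file:///data/a.txt", "text/plain"),
  local:descriptor("file:///data/b.xml", "application/xml")
)
for $R in $descriptors
(: Look at the metadata first ... :)
where $R?media-type = "text/plain"
(: ... and fetch the content only for the selected descriptors. :)
return $R?fetch()
```

With $R?content as a data property instead, the map would have to hold the content already, or defer loading it by some implementation-specific trick.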
> 
> You wrote: "I think we should steer completely clear of models that use nodes. Nodes in XDM carry far too much baggage - names, namespaces, 13 axes, identity semantics, base URIs etc."
> I am at a loss, as I believe that node trees are of overwhelming importance when working with XQuery, no? I want two separate functions for retrieving XML and non-XML resources (fn:doc vs. fn:unparsed-text), and I want two separate functions for retrieving filtered node collections and filtered resources in general. I am not sure if you disagree, or perhaps meant something else by saying "steering clear of models that use nodes"?
> 

Nodes (XML documents) are important as resources that we want to retrieve, but I don't think they are a good way to model the metadata of collections. And I don't think we should have parallel sets of methods for handling XML versus other resources (except to the extent that we have to live with what we've got); we're trying here to generalize the model so it's not limited to handling a finite set of media types built in to the language.

Michael Kay
Saxonica

Received on Friday, 20 February 2015 06:35:16 UTC