Proposed EXPath module: resource collections

Three inputs in the last week all point in the same direction:

Hans-Juergen Rennau gave a paper at XML Prague ("Node search preceding node construction") about how to define collections of resources with properties that allow them to be filtered and selected before they are actually parsed for querying.

The XQuery WG discussed how to model heterogeneous collections of resources (including, for example, XML documents, JSON documents, and binary documents), and how to extend or supplement the collection() function to process such sets of resources.

XProc 2.0 has a (currently very basic) model in which document nodes have properties that are external to the document content (document URI, last-modified date, etc.), and these should be made available to XProc applications.

This set me thinking that it would not be very difficult to do something very useful in EXPath in this area. I'm thinking of something less elaborate than Hans-Juergen's model, but general enough to achieve similar levels of capability by layering things on top.

As a basic model, the idea is that we have an object called a "resource collection" identified by a URI. A resource collection is a set of resources, each modelled as a map containing key-value pairs representing properties of the resources in the collection. The keys that are present in the map may vary from one kind of resource to another, but we will define some commonly used property names for use when information is available. For example:

resource-uri - a context-free URI identifying the resource
name - local name of the resource within the collection
media-type - the MIME type of the resource
extension - a part of the name of the resource conventionally used to identify its type
created - dateTime of the original resource creation
last-modified - dateTime of last modification of the resource
fetch - a zero-arity function that can be called to deliver an XDM item representing the content of the resource in a way appropriate to its media type
is-collection - a boolean indicating whether this resource is itself a collection, in which case the fetch() function returns the sequence of maps representing that collection
schema - the URI of a schema against which the resource is intended to be valid
owner - identifier of a person or other entity owning the resource
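
As a rough illustration of the property list above, a single resource might be represented by a map like the following. This is only a sketch: the URI, dates and owner are invented, the schema property is left out (keys may vary between kinds of resource), and implementing fetch with fn:json-doc is just one assumption about how a JSON resource could be delivered.

```xquery
xquery version "3.1";

(: Hypothetical map for one JSON resource; all values are made up. :)
map {
  'resource-uri'  : 'http://example.com/coll/orders/o1.json',
  'name'          : 'o1.json',
  'media-type'    : 'application/json',
  'extension'     : 'json',
  'created'       : xs:dateTime('2014-11-03T09:15:00Z'),
  'last-modified' : xs:dateTime('2015-02-10T17:42:00Z'),
  'is-collection' : false(),
  'owner'         : 'mike',
  (: Zero-arity function: nothing is read or parsed until it is called. :)
  'fetch'         : function() { json-doc('http://example.com/coll/orders/o1.json') }
}
```

Because fetch is a function item rather than the content itself, building this map costs nothing beyond looking up the metadata.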

A function resource-collection($uri) returns the sequence of maps representing a collection.

We can then use XPath 3.1 facilities to filter this sequence of maps. For example:

rc:resource-collection('coll-uri')[?media-type = 'application/json' and ?last-modified gt xs:dateTime('2012-01-01T01:01:01')] ! ?fetch()

selects the JSON resources modified since a certain date, and parses them using a JSON parser to deliver a sequence, typically of maps or arrays (depending on the JSON content). (The parsing step is of course unnecessary if the collection holds the JSON resources in pre-parsed form.)
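
To try this kind of filter without an implementation of the proposed module, the collection can be faked in memory. In the sketch below, local:resource-collection is a hypothetical stand-in for rc:resource-collection, and the two resources and their contents are invented. The fetch functions are applied with the simple map operator, so that each selected map's function is called individually.

```xquery
xquery version "3.1";

(: Hypothetical stand-in for rc:resource-collection(): an in-memory
   collection of two resources. The $uri argument is ignored here. :)
declare function local:resource-collection($uri as xs:string) as map(*)* {
  map {
    'name'          : 'a.json',
    'media-type'    : 'application/json',
    'last-modified' : xs:dateTime('2014-06-01T12:00:00'),
    'fetch'         : function() { parse-json('{"id": 1}') }
  },
  map {
    'name'          : 'b.xml',
    'media-type'    : 'application/xml',
    'last-modified' : xs:dateTime('2013-03-01T12:00:00'),
    'fetch'         : function() { parse-xml('<doc id="2"/>') }
  }
};

(: Select on the properties first; fetch is called only for the survivors. :)
local:resource-collection('coll-uri')
  [?media-type = 'application/json'
   and ?last-modified gt xs:dateTime('2012-01-01T01:01:01')]
  ! ?fetch()
```

Only a.json passes the predicate, so the XML resource is never parsed.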

We could consider defining a mapping from this abstract concept of a resource collection to certain concrete kinds of collection, e.g. a directory of unparsed files in filestore, or a WebDAV collection.
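
One possible shape for the directory mapping, sketched with the EXPath File module. The function name local:directory-collection and the path '/data/incoming' are hypothetical. The sketch populates only the properties the File module can supply directly, so media-type is omitted, and fetch simply returns the raw bytes of a file; a real mapping would presumably choose a parser according to the media type.

```xquery
xquery version "3.1";

declare namespace file = "http://expath.org/ns/file";

(: Hypothetical mapping from a filestore directory to a resource collection. :)
declare function local:directory-collection($dir as xs:string) as map(*)* {
  for $path in file:children($dir)
  let $name := file:name($path)
  let $is-dir := file:is-dir($path)
  return map {
    'resource-uri'  : file:path-to-uri($path),
    'name'          : $name,
    (: Text after the last dot, or the empty sequence if there is no dot. :)
    'extension'     : if (contains($name, '.'))
                      then tokenize($name, '\.')[last()]
                      else (),
    'last-modified' : file:last-modified($path),
    'is-collection' : $is-dir,
    (: A subdirectory is itself a collection; a file is returned as binary. :)
    'fetch'         : if ($is-dir)
                      then function() { local:directory-collection($path) }
                      else function() { file:read-binary($path) }
  }
};

(: Names of the XML files in the directory; no file content is read. :)
local:directory-collection('/data/incoming')[?extension = 'xml']?name
```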

We could consider impure functions to add, remove, or replace resources within a collection.

We could consider a variant of the fetch() function that takes a map giving parsing options, e.g. whether to validate, what to do on error, etc.
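
A sketch of what such a variant might look like. Since a map entry holds a single value, the variant is stored here under a separate key, 'fetch-with-options'; that key and the 'on-error' option with its value 'ignore' are invented for illustration and are not part of the proposal. The resource content is deliberately malformed JSON so that the error path is exercised.

```xquery
xquery version "3.1";

(: Hypothetical resource whose fetch variant takes a map of options. :)
declare variable $resource := map {
  'name'               : 'bad.json',
  'media-type'         : 'application/json',
  'fetch-with-options' : function($options as map(*)) {
    try {
      parse-json('{"unterminated": ')
    } catch * {
      (: 'ignore' yields the empty sequence; anything else re-raises. :)
      if ($options?on-error = 'ignore')
      then ()
      else error($err:code, string($err:description))
    }
  }
};

$resource?fetch-with-options(map { 'on-error' : 'ignore' })
```

Called with an empty options map instead, the same function would propagate the parsing error to the caller.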

Any interest?


Michael Kay
Saxonica
mike@saxonica.com
+44 (0) 118 946 5893

Received on Monday, 16 February 2015 23:01:51 UTC