Re: Proposed EXPath module: resource collections from Hans-Juergen Rennau on 2015-02-20 (public-expath@w3.org from February 2015)

From: Hans-Juergen Rennau <hrennau@yahoo.de>
Date: Fri, 20 Feb 2015 21:57:57 +0000 (UTC)
To: Michael Kay <mike@saxonica.com>
Cc: "jonathan.robie@emc.com" <jonathan.robie@emc.com>, "ndw@nwalsh.com" <ndw@nwalsh.com>, "christian.gruen@gmail.com" <christian.gruen@gmail.com>, "public-expath@w3.org" <public-expath@w3.org>, "msokolov@gmail.com" <msokolov@gmail.com>
Message-ID: <323489538.5419328.1424469477075.JavaMail.yahoo@mail.yahoo.com>

Thank you for this sketch. I would like to make a suggestion, based on first implementation experience.
If the collection URI is in fact the URI of the collection descriptor document, everything is simple. URI resolution is standard (it is just the resolution of a document URI). The API implementation reads the descriptor and thus learns everything required to deal with the collection. In particular, it learns the technology used (XML, SQL, NoSQL, ...), the property names, types and expressions, and any additional configuration parameters (e.g. RDBMS and connection data) needed to create, update and query the collection artifacts.

From the user's perspective this is simple and convenient, too: design the collection by writing a descriptor document (a matter of minutes), then use API functions to create the collection artifacts, to insert and update content, and to filter it.

Besides, portability is implied by the fact that the creation and updating of artifacts is guided by the descriptor: processor A can use the collections produced by processor B (provided, of course, that both implementers can agree on a descriptor vocabulary and the rules of its interpretation). A simple descriptor example is shown below.

Hans-Juergen
<nodl xmlns="http://www.infospace.org/pcollection">
  <collection name="xmls" uri="" formats="xml" doc="A collection of arbitrary XML documents."/>
  <pmodel>
    <property name="name" type="xs:string" maxLength="100" expr="local-name(/*)"/>
    <property name="namespace" type="xs:string" maxLength="100" expr="namespace-uri(/*)"/>
    <property name="namespaces" type="xs:string*" maxLength="100" expr="distinct-values(//*/namespace-uri(.))"/>    
    <property name="enames" type="xs:string+" maxLength="100" expr="distinct-values(//*/local-name(.))"/>
    <property name="anames" type="xs:string*" maxLength="100" expr="distinct-values(//@*/local-name(.))"/>
    <property name="ecount" type="xs:integer" maxLength="100" expr="count(//*)"/>
    <property name="acount" type="xs:integer" maxLength="100" expr="count(//@*)"/>
  </pmodel>
  <nodeDescriptor kind="uri"/>
  <ncatModel>
    <xmlNcat documentURI="/ncats/ncat-xmls.xml" asElement="*"/>
  </ncatModel>
</nodl>
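
To illustrate the first step described above (the implementation reads the descriptor and learns the property model and the storage technology), here is a minimal Python sketch using only the standard library. It is not part of any existing implementation. The function name read_descriptor and the shape of the returned dictionary are hypothetical, and treating the first child of ncatModel as the indicator of the storage technology is an assumption about the vocabulary. The embedded descriptor is a shortened copy of the example above. The XPath expressions are only collected, not evaluated, since they need an XPath 2.0 processor.

```python
import xml.etree.ElementTree as ET

# Namespace used by the descriptor example.
NS = {"p": "http://www.infospace.org/pcollection"}

# Shortened copy of the descriptor shown above (two properties only).
DESCRIPTOR = """<nodl xmlns="http://www.infospace.org/pcollection">
  <collection name="xmls" uri="" formats="xml" doc="A collection of arbitrary XML documents."/>
  <pmodel>
    <property name="name" type="xs:string" maxLength="100" expr="local-name(/*)"/>
    <property name="ecount" type="xs:integer" maxLength="100" expr="count(//*)"/>
  </pmodel>
  <nodeDescriptor kind="uri"/>
  <ncatModel>
    <xmlNcat documentURI="/ncats/ncat-xmls.xml" asElement="*"/>
  </ncatModel>
</nodl>"""


def read_descriptor(text):
    """Parse a descriptor and return what an implementation would need to know.

    Hypothetical helper: the result layout is an illustration only.
    """
    root = ET.fromstring(text)
    collection = root.find("p:collection", NS)

    # Property model: name, declared type and the XPath expression
    # that would be evaluated against each document.
    properties = [
        {"name": p.get("name"), "type": p.get("type"), "expr": p.get("expr")}
        for p in root.findall("p:pmodel/p:property", NS)
    ]

    # Assumption: the first child of ncatModel names the storage technology,
    # and its attributes are the configuration parameters.
    backend = root.find("p:ncatModel", NS)[0]
    storage = {"kind": backend.tag.split("}", 1)[1], **backend.attrib}

    return {
        "name": collection.get("name"),
        "formats": collection.get("formats"),
        "properties": properties,
        "storage": storage,
    }
```

Calling read_descriptor(DESCRIPTOR) gives the implementation the collection name, the list of property definitions and the storage parameters in one pass, which is all that the create, update and filter functions would need as input.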

 

Michael Kay <mike@saxonica.com> wrote at 16:25 on Friday, 20 February 2015:

> 
> Very well, let us put portability out of scope. But still I have no clue how to map collections of resource descriptors to persistent artifacts without a scheme, if efficiency is important.
> 

In Saxon I would imagine it working much like the collection() function does. The user registers a CollectionURIResolver or uses the standard one by default. The CollectionURIResolver is supplied with a URI and returns a map containing a list of resources and their properties. If the system supports different kinds of collection (collection schemes?) then the CollectionURIResolver will have to do some magic to determine which scheme is in use (perhaps by pattern matching on the collection URI; or perhaps using the URI scheme directly, e.g. WebDAV). The map that is returned can be anything that implements the Map interface: it might build an actual in-memory data structure containing the keys and values representing the list of resources, or it might be a virtual map that gets its information on demand from some backing data, e.g. a SQL database or a WebDAV server or an XML catalog of some kind.
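
The two ideas in this paragraph (choosing a handler from the URI scheme, and returning a virtual map that fetches its data on demand) can be sketched roughly as follows. This is a language-neutral illustration written in Python, not Saxon's API: the names LazyResourceMap and resolve_collection are hypothetical, and keying the map by resource URI with a dictionary of properties as the value is an assumption about the map layout. The two callables stand in for whatever backing store is used (a SQL database, a WebDAV server, an XML catalog).

```python
from collections.abc import Mapping
from urllib.parse import urlsplit


class LazyResourceMap(Mapping):
    """Read-only map from resource URI to its properties, loaded on demand.

    Hypothetical class: list_uris and load_properties are callables that
    stand in for queries against the backing data.
    """

    def __init__(self, list_uris, load_properties):
        self._list_uris = list_uris
        self._load_properties = load_properties
        self._uris = None    # resource list, fetched on first use
        self._cache = {}     # properties already fetched

    def _resource_uris(self):
        if self._uris is None:
            self._uris = list(self._list_uris())
        return self._uris

    def __getitem__(self, uri):
        if uri not in self._cache:
            if uri not in self._resource_uris():
                raise KeyError(uri)
            # Properties are fetched only when a resource is first asked for.
            self._cache[uri] = self._load_properties(uri)
        return self._cache[uri]

    def __iter__(self):
        return iter(self._resource_uris())

    def __len__(self):
        return len(self._resource_uris())


def resolve_collection(collection_uri, handlers):
    """Pick a handler by URI scheme and let it build the resource map.

    handlers maps a scheme name to a callable taking the collection URI.
    """
    scheme = urlsplit(collection_uri).scheme
    if scheme not in handlers:
        raise ValueError("no handler registered for scheme %r" % scheme)
    return handlers[scheme](collection_uri)
```

Because LazyResourceMap implements the ordinary mapping interface, a caller cannot tell whether it is holding an in-memory structure or a view over a remote store, which is the point made above about "anything that implements the Map interface".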

Michael Kay
Saxonica

Received on Friday, 20 February 2015 21:58:29 UTC