RE: Proposal: Searching XML

At 13:53 21-04-2002 +0100, Matthew Dovey wrote:

>My initial reaction about this is that XPath and XML Query both assume
>that the space you are using for searching is also the space you are
>using for retrieving the records - i.e. is very much from the SQL camp.
>
>Z39.50 on the other hand abstracts the query from the retrieval.
>
>In order to model the Z39.50 abstraction in XML terms, you actually need
>two XML structures - one containing the XML nodes used during the
>search, the second containing the nodes used during the present, and some
>correspondence between the two trees. The first XML tree would conform
>to some public standard and as Rob Saunderson indicates may not be much
>more than a flat tree that lists the access points.

Agreed. In Z39.50, I would accept it as a given that the schema you base 
your search on is not necessarily the same as the one you base your 
retrieval on. For instance, there's nothing stopping you from executing a 
search using Bib-1 attributes and subsequently retrieving records in XML 
using Dublin Core elements (Bath does this). The reverse is equally 
possible and sensible: You can define and use an attribute set which 
defines a set of access points which are valid XPATH path expressions 
identifying elements in a Dublin Core-based schema, and subsequently 
retrieve those records using MARC21. That is a good and sound principle of 
Z39.50.
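
To make the split concrete, here is a minimal sketch in Python of a server that keeps the search space and the retrieval space apart. Everything in it is hypothetical: the records, the access point names, and the two record syntaxes are made up for illustration and are not taken from Bib-1, Bath, or any real server.

```python
# Hypothetical records; the internal field names are illustrative only.
RECORDS = {
    1: {"title": "Searching XML", "creator": "Hammer"},
    2: {"title": "Attribute sets in practice", "creator": "Dovey"},
}

# Search space: abstract access points, each mapped to whatever the
# server happens to index. Clients never see the internal field names.
ACCESS_POINTS = {
    "title": lambda rec: rec["title"],
    "author": lambda rec: rec["creator"],
}


# Retrieval space: record syntaxes, chosen independently of the query.
def as_dublin_core(rec):
    # No XML escaping here; the sample data does not need it.
    return "<dc:title>%s</dc:title><dc:creator>%s</dc:creator>" % (
        rec["title"], rec["creator"])


def as_labelled_text(rec):
    return "Title: %s\nAuthor: %s" % (rec["title"], rec["creator"])


RECORD_SYNTAXES = {"dc-xml": as_dublin_core, "text": as_labelled_text}


def search(access_point, term):
    """Return the ids of records whose access point contains the term."""
    extract = ACCESS_POINTS[access_point]
    return [rid for rid, rec in RECORDS.items()
            if term.lower() in extract(rec).lower()]


def present(result_set, syntax):
    """Render an existing result set in the requested record syntax."""
    render = RECORD_SYNTAXES[syntax]
    return [render(RECORDS[rid]) for rid in result_set]
```

The same result set from search("author", "hammer") can be handed to present() with either "dc-xml" or "text"; nothing in the query constrains how the records come back.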

Now, in *some* cases, and these are primarily the ones my proposal is aimed
at, the data models for search and retrieval are indeed the same (or mostly
the same). Z39.50 allows you to maintain different address spaces for
search & retrieval. It does not, in fact, force you to do so. I'm
contending that to some people, the split is unintuitive and may even 
inhibit them from adopting the standard. The proposed method is meant to 
allow them to use the same, familiar vocabulary for both... even library 
people have sometimes had this desire, hence the work a while back on the 
MARC attribute set.

>However, XPath also specifies not only the query but also the
>information returned, i.e. it also fills the role of the e-specs during a
>present. Again this doesn't fit well with the Z39.50 model of separating
>the two. In the above two-XML-tree view of Z39.50, the query components
>of the XPath map onto the first tree whilst the parts specifying the
>record to return map onto the second tree.

I certainly don't think the use of XPATH in the query has to have anything
to do with the subsequent retrieval of records. What the practice amounts to is
to say, "give me those records in which the element(s) retrieved by THIS 
XPATH match THAT TERM"... there is no assumption that the node sets 
actually returned by the XPATHs are retained, or even that a formal XPATH 
processor be in place on the server side. It is strictly a formal way of 
identifying elements.

The trick is to limit the set of XPATHs to something meaningful and 
implementable across the set of servers relevant in a given application... 
but that is the work of the attribute set authors and the profile 
authors... after all, even using Bib-1 it's easy enough to pose queries
that the average Z39.50 server cannot honor.

--Sebastian
--
Sebastian Hammer, Index Data <http://www.indexdata.dk/>
Ph: +45 3341 0100, Fax: +45 3341 0101

Received on Sunday, 21 April 2002 17:51:51 UTC