RE: Proposal: Searching XML from Matthew Dovey on 2002-04-21 (www-zig@w3.org from April 2002)

From: Matthew Dovey <matthew.dovey@las.ox.ac.uk>
Date: Sun, 21 Apr 2002 13:53:14 +0100
To: "Sebastian Hammer" <quinn@indexdata.dk>, <www-zig@w3.org>
Message-ID: <1BE993D0525EB945BF05127E705C97064A6C96@MERCURYXULIB.ulib.ox.ac.uk>

My initial reaction to this is that XPath and XML Query both assume
that the space you are using for searching is also the space you are
using for retrieving the records - i.e. they are very much from the SQL
camp.

Z39.50 on the other hand abstracts the query from the retrieval.

In order to model the Z39.50 abstraction in XML terms, you actually need
two XML structures - one containing the XML nodes used during the
search, the second containing the nodes used during the present - and
some correspondence between the two trees. The first XML tree would
conform to some public standard and, as Rob Saunderson indicates, may
not be much more than a flat tree that lists the access points.

However, XPath specifies not only the query but also the information
returned, i.e. it also plays the role of the e-specs during a present.
Again this doesn't fit well with the Z39.50 model of separating the
two. In the two-XML-tree view of Z39.50 above, the query components of
the XPath map onto the first tree whilst the parts specifying the
record to return map onto the second tree.
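
A minimal sketch of the two-tree view, using Python's standard
xml.etree.ElementTree and the example record from the quoted proposal.
The access point names, the CORRESPONDENCE table and the three helper
functions are assumptions made up for illustration; nothing here is
defined by Z39.50 or by any profile. The point is only that the search
side never touches the present tree, and the present side never touches
the search tree.

```python
import xml.etree.ElementTree as ET

# Present tree: the record as it would be returned to the client
# (the example record from the quoted proposal).
PRESENT_XML = """<book>
  <title>The catcher in the Rye</title>
  <author>J.D. Salinger</author>
  <subject vocabulary="LCSH">Fiction</subject>
</book>"""

# Hypothetical correspondence between the two trees:
# public access point name -> path into the present tree.
CORRESPONDENCE = {
    "title": "title",
    "author": "author",
    "subject_lcsh": "subject[@vocabulary='LCSH']",
}


def build_search_tree(record):
    """Derive a flat search tree with one child node per access point value."""
    root = ET.Element("accessPoints")
    for name, path in CORRESPONDENCE.items():
        for node in record.findall(path):
            ET.SubElement(root, name).text = node.text
    return root


def search(search_tree, access_point, word):
    """Query side: match a whole word, consulting only the search tree."""
    word = word.lower()
    return any(
        word in (node.text or "").lower().split()
        for node in search_tree.findall(access_point)
    )


def present(record, element_paths):
    """Retrieval side: select elements from the present tree (the e-spec role)."""
    return [node.text for path in element_paths for node in record.findall(path)]


record = ET.fromstring(PRESENT_XML)
search_tree = build_search_tree(record)

# The query names an access point; the present names elements to return.
if search(search_tree, "subject_lcsh", "fiction"):
    print(present(record, ["title", "author"]))
```

A single XPath expression would fold both steps into one; here the
query component is handled by search() against the first tree and the
"what to return" component by present() against the second.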

On the URI issue, however, we have an easy mechanical way of generating
those from the OIDs, namely a URI of the form urn:z3950-odi:OID (or
similar).
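
As a rough illustration of how mechanical that mapping is, here is a
short Python sketch. The prefix is copied from the form given above,
which is offered only as "or similar" and is not a registered URN
namespace; the function names and the dotted-decimal check are
assumptions for this example. The OID used is the Bib-1 attribute set.

```python
import re

# Prefix as written in this message; "(or similar)", not a registered scheme.
URN_PREFIX = "urn:z3950-odi:"

# Dotted-decimal OID: at least two arcs of digits.
_OID_RE = re.compile(r"\d+(\.\d+)+")


def oid_to_urn(oid, prefix=URN_PREFIX):
    """Build a URI from an OID by simple concatenation."""
    if not _OID_RE.fullmatch(oid):
        raise ValueError("not a dotted-decimal OID: %r" % oid)
    return prefix + oid


def urn_to_oid(urn, prefix=URN_PREFIX):
    """Recover the OID from a URI produced by oid_to_urn."""
    if not urn.startswith(prefix):
        raise ValueError("unexpected prefix in %r" % urn)
    oid = urn[len(prefix):]
    if not _OID_RE.fullmatch(oid):
        raise ValueError("not a dotted-decimal OID: %r" % oid)
    return oid


BIB1_OID = "1.2.840.10003.3.1"  # Bib-1 attribute set
print(oid_to_urn(BIB1_OID))
```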

Matthew

> -----Original Message-----
> From: Sebastian Hammer [mailto:quinn@indexdata.dk]
> Sent: 20 April 2002 22:46
> To: www-zig@w3.org
> Subject: Proposal: Searching XML
> 
> 
> Hi,
> 
> This may have been discussed both in the plenary and in various
> subgroups, but if it has been proposed formally, I have missed it (if
> so, apologies in advance). Anyway...
> 
> I would like to propose that the ZIG decides upon a convention for
> modelling the potential set of searchable access points (within an
> application domain or profile) using XPATH Path Expressions. An example
> could be a domain-specific attribute set which defines searchable
> access points as [a subset of] all possible Path expressions that
> identify data elements within a given schema.
> 
> EXAMPLE:
> 
> Given a database record like this:
> 
> <book>
> 	<title>The catcher in the Rye</title>
> 	<author>J.D. Salinger</author>
> 	<subject vocabulary="LCSH">Fiction</subject>
> </book>
> 
> One might like to pose a query like:
> 
> Find the word "catcher" in the field matching "title".
> 
> or:
> 
> Find the word "fiction" in fields matching
> "subject[@vocabulary='LCSH']"
> 
> Technically, the Path Expression would have to go into the "Complex"
> branch of the attributeValue, as a single string value, thus requiring
> support for version 3 of the protocol. I suggest that a similar
> mechanism be defined for SRW/U if it is not already in place. But do
> note that in the first round, there's no requirement to go beyond the
> current definition of the Type-1 query.
> 
> RATIONALE:
> 
> Outside of the library domain, Z39.50 is sometimes employed to support
> the networked IR requirements of different information domains. While
> in some cases, interoperability with libraries is a specific
> requirement, this is not always the case. Further, in more cases than
> not, the native, shared data models are already expressed in terms of
> XML. I suggest that in many cases where people consider Z39.50 for
> their application, it is an inhibiting factor that people feel forced
> to munge their existing data model into one that is suitable to
> Z39.50... either by mapping elements (more or less) to Bib-1
> attributes, or by declaring new, flat sets of numerical USE
> attributes... in either case, a process which introduces needless
> complexity in the documentation and maintenance of the domain-specific
> profile.
> 
> The attribute set architecture already provides a mechanism for
> expressing searches for hierarchically nested elements which can be
> seen as a subset of XPATH. However, its primary drawbacks are that the
> mechanism is comparatively unfamiliar to people versed in XML
> techniques, and it is less expressive than XPATH (for instance,
> constraints on attribute values, as above, are probably not uncommon,
> yet have no clear parallel in the AA or XD-1).
> 
> Of course there's nothing stopping a profile author from defining
> abstract access points which don't correspond directly to individual
> fields (such as an "ANY" attribute, or database record metadata).
> Similarly, there's nothing stopping a profile from defining a
> crosswalk to existing, conventional attribute sets and requiring
> support for these.
> 
> We have already embraced XML as a bona fide record syntax (eg. in the
> Bath profile). I think the next logical step is to also allow searches
> to be expressed in terms comfortable to the wider community -- without
> sacrificing the power of the Type-1 query.
> 
> WHY BOTHER?
> 
> 1) Because, by softening some of the library-traditional ways of using
> Z39.50 and showing how it can be used easily without bending your
> existing data model out of shape, we can help break down the
> misconception that it's not XML-friendly. Will it "sell" better? It
> still won't win the world, but you can't miss the fact that the W3C
> *still* doesn't have a suitable IR protocol. Maybe the slot is still
> open.
> 
> 2) Because it's a natural progression from the move to support XML as
> a retrieval syntax and even from the thinking behind SRW.
> 
> 3) Because it is happening anyway. I believe there's several ZIG
> members who make search engines which are natively oriented around
> XML-like data models, and who market their products to a broader range
> of products. Speaking for ourselves, we *need* to do this, or else
> abandon Z39.50 completely, and I suspect others may be in the same
> position. Why not do it in an interoperable way?
> 
> FUTURE STEPS?
> 
> It'd be useful to consider extending the Type-1 query to allow the use
> of a URI instead of an OID to denote the "attribute set". You will
> most likely want to mix and match the two types of identifiers --
> there's no clear reason to abandon the Utility attribute set for the
> non-USE attributes.
> 
> Cheers,
> 
> --Sebastian
> --
> Sebastian Hammer, Index Data <http://www.indexdata.dk/>
> Ph: +45 3341 0100, Fax: +45 3341 0101
> 
>
Received on Sunday, 21 April 2002 08:53:16 UTC