- From: Sebastian Hammer <quinn@indexdata.dk>
- Date: Sat, 20 Apr 2002 23:45:39 +0200
- To: www-zig@w3.org
Hi,

This may have been discussed both in the plenary and in various subgroups, but if it has been proposed formally, I have missed it (if so, apologies in advance). Anyway...

I would like to propose that the ZIG decide upon a convention for modelling the potential set of searchable access points (within an application domain or profile) using XPATH Path Expressions. An example could be a domain-specific attribute set which defines searchable access points as [a subset of] all possible Path Expressions that identify data elements within a given schema.

EXAMPLE:

Given a database record like this:

  <book>
    <title>The catcher in the Rye</title>
    <author>J.D. Salinger</author>
    <subject vocabulary="LCSH">Fiction</subject>
  </book>

One might like to pose a query like:

  Find the word "catcher" in the field matching "title".

or:

  Find the word "fiction" in fields matching "subject[@vocabulary='LCSH']"

Technically, the Path Expression would have to go into the "Complex" branch of the attributeValue, as a single string value, thus requiring support for version 3 of the protocol. I suggest that a similar mechanism be defined for SRW/U if it is not already in place. But do note that in the first round, there's no requirement to go beyond the current definition of the Type-1 query.

RATIONALE:

Outside of the library domain, Z39.50 is sometimes employed to support the networked IR requirements of different information domains. While in some cases interoperability with libraries is a specific requirement, this is not always the case. Further, in more cases than not, the native, shared data models are already expressed in terms of XML.

I suggest that in many cases where people consider Z39.50 for their application, it is an inhibiting factor that they feel forced to munge their existing data model into one that is suitable to Z39.50... either by mapping elements (more or less) to Bib-1 attributes, or by declaring new, flat sets of numerical USE attributes...
in either case, a process which introduces needless complexity in the documentation and maintenance of the domain-specific profile.

The attribute set architecture already provides a mechanism for expressing searches for hierarchically nested elements, which can be seen as a subset of XPATH. However, its primary drawbacks are that the mechanism is comparatively unfamiliar to people versed in XML techniques, and that it is less expressive than XPATH (for instance, constraints on attribute values, as above, are probably not uncommon, yet have no clear parallel in the AA or XD-1).

Of course there's nothing stopping a profile author from defining abstract access points which don't correspond directly to individual fields (such as an "ANY" attribute, or database record metadata). Similarly, there's nothing stopping a profile from defining a crosswalk to existing, conventional attribute sets and requiring support for these.

We have already embraced XML as a bona fide record syntax (e.g. in the Bath profile). I think the next logical step is to also allow searches to be expressed in terms comfortable to the wider community -- without sacrificing the power of the Type-1 query.

WHY BOTHER?

1) Because, by softening some of the library-traditional ways of using Z39.50 and showing how it can be used easily without bending your existing data model out of shape, we can help break down the misconception that it's not XML-friendly. Will it "sell" better? It still won't win the world, but you can't miss the fact that the W3C *still* doesn't have a suitable IR protocol. Maybe the slot is still open.

2) Because it's a natural progression from the move to support XML as a retrieval syntax and even from the thinking behind SRW.

3) Because it is happening anyway. I believe there are several ZIG members who make search engines which are natively oriented around XML-like data models, and who market their products to a broader market.
Speaking for ourselves, we *need* to do this, or else abandon Z39.50 completely, and I suspect others may be in the same position. Why not do it in an interoperable way?

FUTURE STEPS?

It'd be useful to consider extending the Type-1 query to allow the use of a URI instead of an OID to denote the "attribute set". You will most likely want to mix and match the two types of identifiers -- there's no clear reason to abandon the Utility attribute set for the non-USE attributes.

Cheers,

--Sebastian

--
Sebastian Hammer, Index Data <http://www.indexdata.dk/>
Ph: +45 3341 0100, Fax: +45 3341 0101
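
To make the EXAMPLE in the message concrete, here is a minimal Python sketch of how a server might evaluate a Path Expression used as an access point against the sample record. It is an illustration only, not part of the proposal: the function name `record_matches` and the word-matching rule (case-insensitive, whitespace-separated words) are assumptions, and the standard-library `xml.etree.ElementTree` module supports only a limited subset of XPath. It does not show how the expression would be carried in a Type-1 query.

```python
import xml.etree.ElementTree as ET

# The sample record from the EXAMPLE section of the message.
RECORD = """<book>
  <title>The catcher in the Rye</title>
  <author>J.D. Salinger</author>
  <subject vocabulary="LCSH">Fiction</subject>
</book>"""


def record_matches(record_xml: str, access_point: str, term: str) -> bool:
    """Return True if `term` occurs as a word in any element selected
    by `access_point`.

    `access_point` is a path relative to the record root, restricted to
    the XPath subset that ElementTree supports (hypothetical helper).
    """
    root = ET.fromstring(record_xml)
    wanted = term.lower()
    for element in root.findall(access_point):
        # Collect all text below the element and split it into words.
        words = "".join(element.itertext()).lower().split()
        if wanted in words:
            return True
    return False


if __name__ == "__main__":
    # Find the word "catcher" in the field matching "title".
    print(record_matches(RECORD, "title", "catcher"))
    # Find the word "fiction" in fields matching subject[@vocabulary='LCSH'].
    print(record_matches(RECORD, "subject[@vocabulary='LCSH']", "fiction"))
```

The second query relies on the attribute-value predicate, which is the kind of constraint the message says has no clear parallel in the existing attribute architecture.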
Received on Saturday, 20 April 2002 17:44:58 UTC