Re: Proposal: Searching XML

At 23:40 20-04-2002 +0100, Robert Sanderson wrote:

>Can you explain how one would say that the ComplexAttributeValue is to be
>treated as an XPATH as opposed to any other string?

I think the only reasonable answer at this point would be, "by the 
attribute set OID".

Assuming the present Type-1 query, you could imagine defining an attribute 
set based on [a subset of] the XPATH path expressions that would be valid 
for, say, the authoritative Dublin Core representation in XML, or whatever 
other data format you're looking at.
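
To make the "by the attribute set OID" answer concrete, here is a minimal 
sketch, not part of any profile. It assumes a simplified operand with the 
attribute value carried as a string. BIB1_OID is the registered Bib-1 
attribute set; XPATH_DC_OID, the Operand class and interpret() are 
hypothetical names invented for illustration.

```python
from dataclasses import dataclass

BIB1_OID = "1.2.840.10003.3.1"          # registered Bib-1 attribute set
XPATH_DC_OID = "1.2.840.10003.3.9999"   # hypothetical OID, NOT registered


@dataclass(frozen=True)
class Operand:
    """Simplified stand-in for one attributes-plus-term operand."""
    attribute_set: str    # OID naming the attribute set
    attribute_value: str  # attribute value in string form
    term: str


def interpret(op):
    """Decide how to read the attribute value from the attribute set OID."""
    if op.attribute_set == XPATH_DC_OID:
        # The set is defined in terms of XPATH, so the value is a path.
        return ("xpath", op.attribute_value)
    if op.attribute_set == BIB1_OID:
        # Bib-1 USE attributes are plain numbers (1003 is Author).
        return ("bib1-use", int(op.attribute_value))
    raise ValueError(f"unsupported attribute set: {op.attribute_set}")
```

The same string is never "an XPATH" on its own; only the attribute set it 
is sent under says so.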

We could also define a new attribute set OID, call it "XPATH-local", which 
is defined to correspond to whatever data model happens to be supported by 
the database. But that approach would offer no interoperability whatsoever 
and would not really be in the spirit of Z39.50 to begin with, so I think 
it would be a bad idea...

I do think the query should state what abstract model it presupposes, just 
as it does now. You may want to extend the query with a space for a URI, 
and maybe we should supplement the AttributeValue with a slot specifically 
for an XPATH, but neither of these is necessary to adopt the proposal 
right now.

> > model into one that is suitable to Z39.50.. either by mapping elements
> > (more or less) to Bib-1 attributes, or by declaring new, flat sets of
> > numerical USE attributes... in either case, a process which introduces
>
>This is a particular issue with document fragments. For example one might
>want to have records which comprise the text of an entire book. However
>retrieving the entire text at once is not useful.  Important functionality
>is to be able to retrieve matching document fragments, whilst maintaining
>the ability to search against the entire record.
>
>For example:
>
>Find pages matching 'Frodo' and ('Aragorn' or 'Strider') in books with the
>author 'Tolkien, J.R.R.'

Two things:

First, I'm assuming you're talking about varying the scope of the search, 
the "unit" of the search, in essence. I do think that's interesting. Of 
course I was the one who promised to set up a mailing list for the purpose 
two ZIGs back and then never did so... perhaps better late than never?

Second, I do think we need to be *incredibly* careful about messing with 
the boundary between search & retrieval. I'm not saying it shouldn't be 
done, just that it should be done with great deliberation.
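
As an illustration of what varying the "unit" of the search means for the 
Tolkien example quoted above, here is a rough sketch. The XML layout 
(library/book/author/page) and the function name matching_pages() are 
assumptions made up for the example, and it says nothing about how a 
Z39.50 server would expose this.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout: one record per book, pages as children.
SAMPLE = """
<library>
  <book>
    <author>Tolkien, J.R.R.</author>
    <page n="1">Frodo met Strider at Bree.</page>
    <page n="2">Frodo rested in Rivendell.</page>
    <page n="3">Aragorn rode to Edoras.</page>
  </book>
  <book>
    <author>Someone Else</author>
    <page n="1">Frodo and Aragorn are discussed here.</page>
  </book>
</library>
""".strip()


def matching_pages(root, author):
    """Return page numbers matching the term condition, per book author."""
    hits = []
    for book in root.iter("book"):
        # Record-level condition: evaluated against the whole book.
        if book.findtext("author") != author:
            continue
        for page in book.iter("page"):
            # Fragment-level condition: evaluated against a single page.
            text = page.text or ""
            if "Frodo" in text and ("Aragorn" in text or "Strider" in text):
                hits.append(page.get("n"))
    return hits


root = ET.fromstring(SAMPLE)
pages = matching_pages(root, "Tolkien, J.R.R.")
```

The author test applies to the record and the term test to the fragment, 
which is exactly where the search/retrieval boundary starts to blur.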

> > It'd be useful to consider extending the Type-1 query to allow the use of
> > an URI instead of an OID to denote the "attribute set". You will most
> > likely want to mix and match the two type of identifiers -- there's no
> > clear reason to abandon the Utility attribute set for the non-USE
> > attributes.
>
>I think that the recent decision to use URIs for schemas in
>elementsetnames is a good first step towards this.

Yep. I'd turn it around and say the logical next step after the schema 
extension is to look at the query -- or maybe they should be done at the 
same time (part of the reason I'm bringing this up right now).
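
Mixing the two kinds of identifier in one query could be as simple as 
telling them apart by shape. This is only a sketch under that assumption; 
identifier_kind() and the example URI are hypothetical, and an actual 
extension would more likely carry the distinction in the protocol encoding 
itself.

```python
import re
from urllib.parse import urlparse

# Dotted-decimal form of an OID, e.g. 1.2.840.10003.3.1 (Bib-1).
OID_RE = re.compile(r"\d+(\.\d+)+")


def identifier_kind(value):
    """Classify an attribute set identifier as 'oid' or 'uri'."""
    if OID_RE.fullmatch(value):
        return "oid"
    if urlparse(value).scheme:
        # Anything with a scheme (http:, urn:, ...) is treated as a URI.
        return "uri"
    raise ValueError(f"neither an OID nor a URI: {value!r}")
```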

>Perhaps we ought to rethink 12 months as the next ZIG and put it back to
>6?

I was assuming that the decision to go to 12 months also meant that we 
should perhaps be able to make more far-reaching decisions by email... 
otherwise I think 12 months is almost a death sentence to the standard.

--Sebastian
--
Sebastian Hammer, Index Data <http://www.indexdata.dk/>
Ph: +45 3341 0100, Fax: +45 3341 0101

Received on Saturday, 20 April 2002 19:10:17 UTC