Re: Attribute architecture and nested attributes (path queries?) from Sebastian Hammer on 2002-09-13 (www-zig@w3.org from September 2002)

From: Sebastian Hammer <quinn@indexdata.dk>
Date: Fri, 13 Sep 2002 09:08:26 +0200
To: Alan Kent <ajk@mds.rmit.edu.au>, ZIG <www-zig@w3.org>
Message-Id: <4.2.0.58.20020913084954.030d13d8@bagel.indexdata.dk>
At 12:07 13-09-2002 +1000, Alan Kent wrote:

>Interesting. A few questions if I may to tease things out a bit.
>You say above there were open issues on how to put it into Z39.50,
>then you have something implemented - how did you put it into the
>protocol? (I was curious to the level of functionality you thought
>you would need.)

As you suggested yourself, we use a string-valued (complex) attribute. The 
big difference from the model hinted at in the attribute architecture is that 
nested attributes are not used -- all of the path information is held in a 
single string-valued attribute, and the syntax used to pick apart elements 
is an XPath subset.
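
To make the "single string, picked apart with an XPath subset" idea concrete, 
here is a minimal Python sketch. The subset it accepts (absolute paths of 
element names, each with at most one [@attr='value'] predicate) and the names 
STEP and split_path are assumptions made for illustration; the message does 
not say which subset the server actually supports.

```python
import re

# Hypothetical subset: an element name with an optional single
# [@attr='value'] predicate. This is NOT the subset the server described
# in this message implements; that subset is not specified here.
STEP = re.compile(r"([A-Za-z_][\w.-]*)(?:\[@([A-Za-z_][\w.-]*)='([^']*)'\])?")


def split_path(path):
    """Split an absolute path string into (element, attr, value) steps.

    attr and value are None when the step carries no predicate.
    Limitation of this sketch: a '/' inside a predicate value is not handled.
    """
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    steps = []
    for raw in path[1:].split("/"):
        match = STEP.fullmatch(raw)
        if match is None:
            raise ValueError(f"unsupported step: {raw!r}")
        steps.append(match.groups())
    return steps
```

For example, split_path("/record/title[@lang='en']") yields one step for 
"record" and one for "title" restricted to lang='en', all from one attribute 
value rather than from nested attributes.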

>For example, I could imagine using a 'string' attribute value which
>was an XPath expression (you can use string values using 'complex'
>attribute values). So you define a special set with a single type
>(1 = Access Point?) where the value for the type is the XPath expression.

Precisely. In our server we take the liberty of cheating a little to make 
testing more comfortable: we have assigned an OID for the generic 
practice, in our own OID space, but if our server receives a string-valued 
attribute of type 1 it currently assumes that it's probably an 
XPath expression (that's the kind of thing that will come back to bite you 
if the concept takes off, but that seems like a luxury problem at the moment).
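
The shortcut described above could look roughly like the following Python 
sketch. The function name classify_attribute and the returned labels are 
hypothetical; the only behaviour taken from the message is that a 
string-valued attribute of type 1 is assumed to be an XPath expression. The 
attribute-set OID is deliberately not checked, which is exactly the "cheat" 
being described.

```python
def classify_attribute(attr_type, value):
    """Decide how to treat one attribute element of a search term.

    Sketch only: the attribute-set OID is ignored on purpose, mirroring
    the testing shortcut described in the message.
    """
    if attr_type != 1:
        # Not an access-point attribute; handled elsewhere.
        return ("other", value)
    if isinstance(value, str):
        # String-valued type 1: assume it is an XPath expression.
        return ("xpath", value)
    # Numeric type 1: a conventional USE attribute value.
    return ("use", value)
```

A stricter server would first confirm that the attribute element names the 
OID assigned for this practice before treating the string as a path.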

>But you mention another point above which is an XPath expression
>with namespaces requires scope information - namespace prefixes
>for use in the XPath expression itself. (I did not fully understand
>what you meant when you talked about XML schemas - you don't need a
>schema to evaluate an XPath expression - or is that the logical
>schema you want to search - separating logical schemas from whatever
>the physical representation of the data is).

Your parenthesized guess is correct. It's true you don't need the scope 
information, and we currently don't use it. I described it because it 
seems to me that if we want to retain the possibility of a split between 
the search and retrieval data models, then it would be useful to identify a 
(possibly abstract) set of elements at search time.

Our current implementation is based strictly on the contents of the 
internal record, though.

>Or is the idea that the 'XML' representation is just a semantic model
>mapped on to the real physical model (whatever it is, including MARC).
>That way, we can just say semantic models should not use namespaces.

I think so.

>I am not (yet!) completely convinced introducing XPath as a means
>of specifying access points into Z39.50 is a good thing. Most systems
>predefine indexes on particular access points etc. Having a completely
>dynamic scheme such as XPath for identifying access points may require
>a completely different indexing and query engine to existing systems.

It certainly makes *possible* a completely different indexing engine... our 
conclusion has been that if we want our engine to be relevant in IR 
applications outside of libraries, then we'd better think hard about how to 
allow people to express access points in language they find natural.

But it doesn't *require* a different indexing engine, because just as 
profiles mandate exact attribute combinations today, so they may well 
mandate specific element patterns in lieu of USE attributes, and simple 
servers can be implemented using string-matching against predefined 
patterns. The string patterns are more verbose than USE attributes, for 
sure, but you can make that argument against anything XML as a whole, and 
that's clearly not what people choose their information structuring 
framework by.
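
A minimal Python sketch of such a "simple server" follows: it never evaluates 
the path, it only compares the incoming string against a fixed list that a 
profile might mandate. The two patterns, the index names, and the names 
SUPPORTED_PATTERNS, index_for and UnsupportedAccessPoint are all invented for 
illustration.

```python
class UnsupportedAccessPoint(Exception):
    """Raised when a pattern is not in the profile's list."""


# Hypothetical profile: each mandated element pattern names an index
# that the server already maintains.
SUPPORTED_PATTERNS = {
    "/record/title": "title",
    "/record/author": "author",
}


def index_for(pattern):
    """Look the pattern up by plain string comparison; no XPath evaluation."""
    key = pattern.strip()
    if key not in SUPPORTED_PATTERNS:
        raise UnsupportedAccessPoint(pattern)
    return SUPPORTED_PATTERNS[key]
```

The lookup costs no more than resolving a numeric USE attribute does today; 
only the key is longer.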

>Hence it's unlikely any existing systems would move forward to use it.
>The alternative is to enumerate all the paths a system will support
>so you can use XPath expressions to identify a path, but it's really
>just a fancier naming scheme (attributes instead of a number have
>a really long string - but the Z39.50 server just treats the string
>effectively as a name - if the XPath expression is in the list of
>supported expressions, great, it's supported.)

It's a fancier naming scheme that's directly intuitive and accessible to 
anyone with a background in XML. In that respect, it's an attempt to open 
up Z39.50 for use by these groups without altering the fundamental model of 
the protocol. I'd say the cost-of-migration to this practice could easily 
be less than that required for SRW adoption... but then, migration is not 
required -- the purpose of this is chiefly to make the protocol more 
appealing to new communities, and to make Z39.50-based IR systems more 
versatile.

>Another option (not debating merit yet) is to *encode* requests using
>existing numeric attribute schemes. Client applications may allow users
>to enter an XPath like syntax. A new Explain category could always be
>introduced containing the information for how to convert XPath expressions
>into attribute lists etc. The main benefit is that it's not that radical a
>change to Z39.50 as is. Another thing is any namespace stuff can be
>done by the client when mapping to attribute values. (The namespaces
>are almost more like OIDs identifying the set they come out of.)

The problem is that you're superimposing a fairly complex mapping scheme on 
top of something which is very simple and intuitive to many people (XPath). 
I just don't think that will fly.
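
For comparison, the client-side mapping being proposed might be sketched as 
below in Python. The table contents and the names XPATH_TO_ATTRS and to_pqf 
are hypothetical; the Bib-1 USE values 4 (Title) and 1003 (Author) are the 
standard ones, and the output uses prefix query notation. How such a table 
would actually be published through Explain is not specified anywhere in 
this thread.

```python
# Hypothetical table of the kind a new Explain category might carry:
# path string -> list of (attribute type, numeric value) pairs.
XPATH_TO_ATTRS = {
    "/record/title": [(1, 4)],      # Bib-1 Use 4 = Title
    "/record/author": [(1, 1003)],  # Bib-1 Use 1003 = Author
}


def to_pqf(path, term):
    """Translate a path plus search term into a prefix-notation query.

    Raises KeyError for paths missing from the table. The term is
    quoted as-is; embedded double quotes are not escaped in this sketch.
    """
    attrs = XPATH_TO_ATTRS[path]
    prefix = " ".join(f"@attr {atype}={value}" for atype, value in attrs)
    return f'{prefix} "{term}"'
```

Every client needs the table before it can send a query, and every path 
outside the table is unreachable, which is the extra layer the paragraph 
above objects to.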

>So I guess the bottom line is how radical you are willing to get
>in terms of a model shift. Internally here, major changes are much
>less likely to get up.

I think the key may be to think about this as an incremental step towards 
adopting current industry standards in a way that doesn't have to be very 
intrusive to present systems, but which can potentially increase the value 
of Z39.50-based systems manifold by freeing them from certain conventions 
in Z39.50 that are (perhaps) even more obscure and unfriendly to outsiders 
than our use of BER.

>Interesting area though.

For sure. Btw., the current version of the PQN-decoder that comes with YAZ 
has experimental support for this. It'd be interesting to look at how it 
could fit into CCL-derived query languages as well... although my guess is 
that it's not complicated.

--Sebastian
--
Sebastian Hammer, Index Data <http://www.indexdata.dk/>
Ph: +45 3341 0100, Fax: +45 3341 0101
Received on Friday, 13 September 2002 03:07:11 UTC