Re: Attribute architecture and nested attributes (path queries?) from Sebastian Hammer on 2002-09-11 (www-zig@w3.org from September 2002)

From: Sebastian Hammer <quinn@indexdata.dk>
Date: Wed, 11 Sep 2002 16:06:14 +0200
To: Alan Kent <ajk@mds.rmit.edu.au>, ZIG <www-zig@w3.org>
Message-Id: <4.2.0.58.20020911154144.03177e38@bagel.indexdata.dk>
Hiya,

We took a long look at nested attributes as a more expressive, direct way 
to search structured data, such as data models that are abstractly viewed 
as being "XML-like" rather than "GRS-1-like".

Our conclusion so far has been that nested attributes are both too clunky 
and not powerful enough. If you consider them as a mechanism, say, to 
increase the appeal of Z39.50 (or more likely, SRW) to XML-oriented 
communities, the obvious question may be, "what's wrong with XPath?".

By introducing XML as a record syntax (rather than telling everybody to 
munge their XML into GRS-1), we began what I think is a healthy path 
towards adopting popular mechanisms from other communities rather than 
telling everybody that it's our way or the highway. Put another way, 
the adoption of XML only makes sense (to me) if we see it as part of a 
greater move where we look to increase the power (and market appeal) of 
Z39.50 by bringing in the best of the XML family of languages where they 
have something to offer.

We'd like to propose an alternative model to nested attributes in which 
profiled subsets of the XPath expression language are used to identify 
"parts" of abstract records for searching.

The benefit of this is partly that it allows people to deploy Z39.50/SRW 
without squeezing their data models into flat lists of USE attributes; 
partly that it gives even us old-timers a more powerful language. For 
instance, a search for Library of Congress subject headings might be 
represented as:

Access point = /bibliographic/subject[@scheme='LCSH'], term = "computer 
science"

and so forth. An open issue would be how you identify this in the search with 
an attribute set identifier. You could imagine per-schema OIDs allocated by 
communities who need this type of communication, or a single core OID 
identifying the practice. The ideal would be an extension of the Type-1 
query which allowed us to identify an XML schema (or namespace?).
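
As a rough illustration of the LCSH access point above, here is a small Python sketch that evaluates one XPath-style access point against an XML record using the standard xml.etree.ElementTree module. The record layout, the `search` helper and its case-insensitive substring matching are all assumptions made for the example; this is not how Zebra or any Z39.50 server does it, and ElementTree only understands a limited XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; element names follow the access point in the text.
RECORD = """
<bibliographic>
  <title>Introduction to Algorithms</title>
  <subject scheme="LCSH">Computer science</subject>
  <subject scheme="DDC">005.1</subject>
</bibliographic>
"""


def search(record_xml, access_point, term):
    """Return True if any element selected by access_point contains term.

    access_point is an absolute path such as
    /bibliographic/subject[@scheme='LCSH']. ElementTree only accepts
    relative paths, so the leading root step is checked by hand.
    Paths starting with // are not handled by this sketch.
    """
    root = ET.fromstring(record_xml)
    steps = access_point.lstrip("/").split("/", 1)
    if steps[0] != root.tag:
        return False
    if len(steps) == 1:
        elements = [root]
    else:
        elements = root.findall("./" + steps[1])
    needle = term.lower()
    return any(needle in "".join(e.itertext()).lower() for e in elements)
```

Calling `search(RECORD, "/bibliographic/subject[@scheme='LCSH']", "computer science")` matches the LCSH heading, while the same term against `[@scheme='DDC']` does not.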

I would much rather see a scheme like this introduced than a primitive 
mapping of an XPath subset into nested attributes. The richness/complexity 
of XPath doesn't have to be a factor in any given application because 
profiles will be free to, well, profile a specific, fixed subset of 
expressions (they then become equivalent to flat lists of numerical 
attributes, only in a somewhat more intuitive space than flat lists of 
integers).
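
To make the point about profiling concrete, here is a minimal Python sketch of a server that accepts only a fixed list of expressions and treats each one as an opaque key, exactly as it would a numeric USE attribute. The profile contents, the index names and the `resolve` helper are invented for the illustration; no real profile is being quoted.

```python
# Hypothetical profile: the only access points a server agrees to support,
# each mapped to an internal index name. All names here are invented.
PROFILE = {
    "/bibliographic/title": "idx_title",
    "/bibliographic/author": "idx_author",
    "/bibliographic/subject[@scheme='LCSH']": "idx_subject_lcsh",
}


class UnsupportedAccessPoint(Exception):
    """Raised when a query uses an expression outside the profile."""


def resolve(access_point):
    """Map a profiled expression to its index; no XPath evaluation needed."""
    try:
        return PROFILE[access_point]
    except KeyError:
        raise UnsupportedAccessPoint(access_point) from None
```

A server built this way never parses the expression at all: anything outside the profile, such as `//title`, is simply rejected, much as an unsupported attribute combination would be today.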

We have the XML records. Let's not butcher the XML model by trying to 
squeeze it through a hole which isn't really big enough and which we 
haven't really formed a tradition for using anyway.

We have server code which supports the practice described here, so if 
anyone would like to try interoperability, give me a holler or download Zebra.

--Sebastian

At 12:45 11-09-2002 +1000, Alan Kent wrote:

>I had a question regarding nested attributes in the new attribute
>architecture. I was trying to work out the maximal power they can
>deliver.
>
>Rather than use numeric values, I will use XPath like syntax with
>element names. (Values can be strings after all!)
>
>My understanding is nested attributes will allow me to do queries
>such as
>
>     Access Point: /head/title
>     Term: Lessons in Life
>
>The attribute list for the access point would list two values ('head'
>followed by 'title') for the one attribute type ("1" for Access Point).
>I can also do wild paths allowing
>
>     Access Point: //title
>     Term: Lessons in Life
>
>That is, search in *any* access point where the last attribute value is
>title.
>
>What does the following mean?
>
>     Access Point: //title
>     Format/Structure: All these words
>     Term: Lessons Life
>
>Do the two search terms have to appear under the same 'title' or can they
>appear in different 'title' attributes? (If 'Lessons' appears under
>/head/title and 'Life' appears under /body/title in the same record,
>should the record match?)
>
>Then, pushing things a bit further, can I say under the same 'author'
>access point the 'firstname' access point must equal "John" and the
>'lastname' access point must equal "Smith"?
>
>The following query does not mandate first name and last name be for
>the same author in the record (if there are multiple authors)
>
>         /author/firstname = John
>     AND
>         /author/lastname = Smith
>
>You need something like a PROX operator with an attribute list:
>
>         /author/firstname=John
>     Within-the-same /author
>         /author/lastname=Smith
>
>Maybe hijack the 'private' choice of 'proximityUnitCode' of the proximity
>operator to specify the leading path length that has to be the same...
>Ok, pretty yucky. The current KnownProximityUnits are actually not
>very useful as I really want to specify proximity with respect to
>the same attribute lists being specified in the query. How about
>a third CHOICE under proximityUnitCode being an AttributeList?
>
>Just wondering since nested attributes were put in how far someone had
>thought them through. Getting a simple path indexing scheme into Z39.50
>would certainly be a nice extension! And much more feasible to implement
>efficiently than full XPath or XML Query etc.
>
>Alan
>--
>Alan Kent (mailto:Alan.Kent@teratext.com.au, http://www.mds.rmit.edu.au/~ajk/)
>Project: TeraText Technical Director (http://teratext.com.au) InQuirion Pty Ltd
>Postal: Multimedia Database Systems, RMIT, GPO Box 2476V, Melbourne 3001.
>Where: RMIT MDS, Bld 91, Level 3, 110 Victoria St, Carlton 3053, VIC Australia.
>Phone: +61 3 9925 4114  Reception: +61 3 9925 4099  Fax: +61 3 9925 4098

--
Sebastian Hammer, Index Data <http://www.indexdata.dk/>
Ph: +45 3341 0100, Fax: +45 3341 0101
Received on Wednesday, 11 September 2002 10:05:00 UTC