- From: Alan Kent <ajk@mds.rmit.edu.au>
- Date: Mon, 22 Apr 2002 11:42:54 +1000
- To: Liam Quin <liam@w3.org>, www-zig@w3.org
On Sat, Apr 20, 2002 at 11:27:15PM +0100, Robert Sanderson wrote:
> Yep. We'd implement it for sure. On the other hand, the practical
> advantages of it aren't as high as you might expect. Unless you know what
> the data is like, you can't really send a sensible XPATH search.
> If without prior knowledge of the database, you can't send a useful XPATH
> search, then you might as well just configure enumerated access points.
> (Which is what we do now, mapped to XPATH (almost) in the configfile)

We have thought about how to include XPATH queries a number of times.
Here are our thoughts on it (which overlap with what other people have
posted).

We actually allow more than one lump of XML in a single record (you can
have multiple GRS-1 fields in one record holding XML). So one approach
proposed internally was to allow an XPATH expression to be specified
*within* an attribute. That is, have a special query term format
(EXTERNAL is in there after all as a query term) that searches an XPATH
expression within a single attribute. That way separate attributes can
be bound to the separate XML fields.

But I agree with the problems raised by others. XPATH has quite a
different querying model from Z39.50. You can do joins. You can specify
a top node for a query which is different from the root node of the
parse tree (so one document may be split into lots of little fragments).
One workaround was to say 'XPATH is only used to determine if the
record matches'. This kept it within the Z39.50 model.

But the real problem we had was a conceptual one. As others have also
said, Z39.50 abstracts the query model from the physical representation.
To me this is fundamental to the protocol. It's the differentiating
factor. As soon as you tie queries to the physical data format, it's a
big step away from this abstraction model.

But there is a way to maintain the current model. This is for there to
be an abstract XML structure that is defined as a part of the abstract
record structure.
This XML structure does not have to be the same as the physical
representation of the data. (In practical terms, you can think of an
XSLT stylesheet converting the underlying physical representation to
the publicly agreed logical representation for querying on.) Queries
are expressed on the logical XML structure. If a query matches, the
underlying record is returned. This keeps the current Z39.50 separation
intact. But we have not got around to implementing it yet.

I have not looked for a while, but when I last did there were no
standard text searching operators (proximity, stemming, etc) in XPATH.
Some people have gone off and defined their own. Z39.50 has lots of
features here. But there does not seem to be any good way to fit the
Z39.50 text query features into XPATH at the lowest level.

Another icky thing about XPATH in a way is that there is so much you
can do with it that it's hard to enumerate all the queries possible. By
this I mean it has all sorts of join capabilities. It's easy to write a
query that an index cannot support. Do you then say such queries are
not allowed? Or require that a server do a brute force false match
check? Or define a subset of XPATH for use with Z39.50? There seems to
be a lot of potential for interoperability woes with people
implementing different subsets (whatever is easy for them).

Alan
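
For illustration only, here is a minimal Python sketch of the logical
view idea described above: the XPATH expression is evaluated against an
agreed logical XML structure, and a match returns the underlying
physical record. Everything named here is hypothetical: the record
layout, TAG_TO_LOGICAL, to_logical() and search() are made up for the
example, a plain Python function stands in for the XSLT stylesheet, and
the standard library's ElementTree only supports a small subset of
XPATH.

```python
import xml.etree.ElementTree as ET

# Physical records in a made-up internal layout (not a real schema).
PHYSICAL_RECORDS = [
    '<rec id="1"><f tag="245">XML Retrieval</f><f tag="100">Kent</f></rec>',
    '<rec id="2"><f tag="245">Z39.50 Basics</f><f tag="100">Quin</f></rec>',
]

# Hypothetical mapping from physical field tags to logical element names.
TAG_TO_LOGICAL = {"245": "title", "100": "author"}


def to_logical(physical):
    """Stand-in for the XSLT step: build the agreed logical view."""
    logical = ET.Element("record")
    for field in physical.findall("f"):
        name = TAG_TO_LOGICAL.get(field.get("tag"))
        if name is not None:
            ET.SubElement(logical, name).text = field.text
    return logical


def search(xpath):
    """Return the physical records whose logical view matches xpath."""
    hits = []
    for raw in PHYSICAL_RECORDS:
        physical = ET.fromstring(raw)
        # XPATH is used only as a yes/no match test on the logical view.
        if to_logical(physical).find(xpath) is not None:
            hits.append(raw)
    return hits


# The query names logical elements; the physical record comes back.
result = search("./author[.='Kent']")
```

Note that a query written against the physical layout (for example
"./f") matches nothing in this sketch, because only the logical
structure is visible to the client.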
Received on Sunday, 21 April 2002 21:43:46 UTC