Re: Distributed searches in Z39.50? from Kevin Gamiel on 2004-04-02 (www-zig@w3.org from April 2004)

From: Kevin Gamiel <kgamiel@cnidr.org>
Date: Fri, 02 Apr 2004 01:06:44 -0500
To: Alan Kent <ajk@mds.rmit.edu.au>
Cc: ZIG <www-zig@w3.org>
Message-ID: <406D02F4.7010208@cnidr.org>

 > <warning>Blue sky idea following.</warning>

Hi Alan, thanks for the spark, this list is never boring:-)

> One of the problems in implementing a Z39.50 distributed search server
> is a search request has to return the exact number of hits in the
> search response packet.  The exact number of records in the final

Says who?  Rule number one: server choice.  Rule number two: profiles. 
Seems to me, the mechanics for doing what you want are all in place.  We 
did this with Isite, we had a "search engine" plugin that was really a 
distributed Z39.50 client.  I remember thinking about how to fold such 
functionality into the standard model, but never made much progress.  I 
*think* concurrent operations were invented for just such a case, but I 
could be wrong.  Otherwise, there are (at least) two remaining issues. 
First, do you want existing clients to work with this model and second, 
if you don't care about existing clients, what's the best way to expose 
this type of functionality.

I can't think of a clever server-side model that would fake a dumb 
client into understanding the increased complexity.  Again, maybe if you 
negotiated concurrent operations it would work.

Otherwise, at least at a superficial level, I think it involves using 
existing PDUs plus profiles and we can all think of a thousand ways to 
negotiate and implement it.  Would it be useful?  I think the answer is 
clearly "yes".  In the past, we usually just hung the search until 
either all results came back or a timeout occurred and we truncated 
results, etc. (yuck).
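
For illustration only, the "hang until everything returns or a timeout 
fires, then truncate" approach might look like the sketch below.  It is 
not Isite code.  The name distributed_search and the backends mapping 
(a name mapped to a callable that takes the query and returns a list of 
hits) are hypothetical stand-ins for remote Z39.50 targets.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def distributed_search(backends, query, timeout):
    """Fan a query out to every backend and return (hits, complete).

    `backends` is a hypothetical mapping of name -> callable(query) -> list.
    `complete` is False when any backend timed out or failed, i.e. the
    merged result set is truncated.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(backends)))
    futures = {pool.submit(fn, query): name for name, fn in backends.items()}

    # Block until every backend has answered or the timeout expires.
    done, not_done = wait(futures, timeout=timeout)

    hits = []
    for fut in futures:  # iterate in submission order for stable output
        if fut in done and fut.exception() is None:
            hits.extend(fut.result())

    # Do not wait for stragglers; whatever they return later is dropped.
    pool.shutdown(wait=False, cancel_futures=True)

    complete = not not_done and all(f.exception() is None for f in done)
    return hits, complete
```

The caller only learns whether the merged set is complete, not how many 
hits the slow targets would have added, which is exactly the hit-count 
problem under discussion.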

But, what would it take to do this correctly?  If you view the world as 
XML folks tend to, then everything is a tree and sending a query to a 
node will potentially branch to n nodes, ad nauseam.  As you mention, 
maybe this is ultimately an explain problem.  But, seems like in its 
pure form, it's more than that.  It requires dynamic feedback from each 
node, discovering the topology in realtime, possibly based on the query 
itself.  Then it becomes an old-school query routing problem, a whois++ 
delegated query problem, which becomes a management problem, etc, etc.
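
As a rough sketch of the tree view above, assuming each node can say at 
query time which other nodes it would delegate to: route_query, 
local_search and referrals are all hypothetical names, not part of 
Z39.50 or whois++.

```python
def route_query(node, query, local_search, referrals, seen=None):
    """Search `node`, then recursively every node it refers the query to.

    local_search(node, query) -> iterable of hits at that node (hypothetical)
    referrals(node, query)    -> iterable of nodes to delegate to, which may
                                 depend on the query itself (hypothetical)
    `seen` guards against cycles, since the topology is only discovered
    as the query travels.
    """
    seen = set() if seen is None else seen
    if node in seen:
        return []
    seen.add(node)

    hits = list(local_search(node, query))
    for child in referrals(node, query):
        hits.extend(route_query(child, query, local_search, referrals, seen))
    return hits
```

Even this toy version has to track visited nodes, and it says nothing 
about timeouts, partial results or who manages the referral data.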

Welcome to hell  :-)

Kevin

Received on Friday, 2 April 2004 01:14:18 UTC