Distributed searches in Z39.50? from Alan Kent on 2004-04-01 (www-zig@w3.org from April 2004)

From: Alan Kent <ajk@mds.rmit.edu.au>
Date: Fri, 2 Apr 2004 09:44:16 +1000
To: ZIG <www-zig@w3.org>
Message-ID: <20040401234415.GA24242@io.mds.rmit.edu.au>
Hi all,

<warning>Blue sky idea following.</warning>


One of the problems in implementing a Z39.50 distributed search server
is that a search request has to return the exact number of hits in the
search response packet.  The exact number of records in the final
result set has to be returned in the search response because there is
no other way to tell the client about a revised figure later.  This
means a server cannot respond until all of the searches it has
forwarded have returned.  The current solution is to put more
complexity into the client software by letting it send the query to
multiple servers and manage the responses as they come back
incrementally.
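To make the difference concrete, here is a toy sketch in Python (not real Z39.50 code; the target names, latencies and hit counts are invented for illustration).  The gateway has to gather every back-end reply before it can report an exact total, while a distributing client can act on each reply as it arrives:

```python
import asyncio

# Hypothetical back-end targets: (name, simulated latency in seconds, hit count).
TARGETS = [("fast", 0.01, 120), ("medium", 0.03, 45), ("slow", 0.06, 300)]


async def search_target(name, latency, hits):
    """Stand-in for forwarding the query to one remote server."""
    await asyncio.sleep(latency)
    return name, hits


async def gateway_search():
    """Gateway: cannot send its search response until *every* target has
    replied, because the hit count it returns must be the exact final total."""
    results = await asyncio.gather(*(search_target(*t) for t in TARGETS))
    return sum(hits for _, hits in results)


async def client_side_search(on_result):
    """Distributing client: handles each target's hits as soon as they arrive."""
    total = 0
    for next_done in asyncio.as_completed([search_target(*t) for t in TARGETS]):
        name, hits = await next_done
        total += hits
        on_result(name, hits, total)  # e.g. update the user's screen
    return total


if __name__ == "__main__":
    print(asyncio.run(gateway_search()))  # 465, but only after the slowest target
    asyncio.run(client_side_search(lambda n, h, t: print(f"{n}: +{h} (total {t})")))
```

The user-visible latency of the gateway is that of the slowest target; the client-side version can show partial totals much earlier.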

One solution is to extend Z39.50 with an extra option bit to negotiate
the capability for results to be made progressively available.  The
idea is that if a server responded, for example, with a search status
of failure and a result set status of in-progress (a new status not
currently defined), that would tell the client the server is
continuing the search in the background.  A new mechanism would also
be made available to retrieve details about existing result sets,
allowing at least the set status and size to be retrieved.  This could
be done, for example, as a new Explain category where you can query
for all sets or for a set with a specific name.  Or a completely new
request/response type could be introduced.  This would also allow
clients to ask what sets they currently have on a server.
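A minimal sketch of how that might look, assuming a toy in-memory server.  The field names loosely follow the existing SearchResponse (searchStatus, resultCount, resultSetStatus), but the "in-progress" value and the describe_result_set() lookup are hypothetical, exactly as in the proposal above:

```python
from dataclasses import dataclass

# Toy model of the proposed extension. "in-progress" is the hypothetical new
# result set status; describe_result_set() stands in for the new Explain
# category or request/response type.
IN_PROGRESS, COMPLETE = "in-progress", "complete"


@dataclass
class SearchResponse:
    searchStatus: bool
    resultCount: int
    resultSetStatus: str


class ProgressiveServer:
    """Hypothetical server that keeps merging back-end hits after it responds."""

    def __init__(self):
        self._sets = {}  # set name -> [status, size, hit counts still to arrive]

    def search(self, set_name, backend_hits):
        pending = list(backend_hits)
        status = IN_PROGRESS if pending else COMPLETE
        self._sets[set_name] = [status, 0, pending]
        # Respond at once: search "failed" for now, but the set is still growing.
        return SearchResponse(searchStatus=not pending, resultCount=0,
                              resultSetStatus=status)

    def background_step(self):
        """Simulate one more back-end reply arriving for every growing set."""
        for entry in self._sets.values():
            if entry[2]:
                entry[1] += entry[2].pop(0)
                if not entry[2]:
                    entry[0] = COMPLETE

    def describe_result_set(self, set_name=None):
        """Hypothetical Explain-style query: one named set, or all sets."""
        if set_name is None:
            return {name: (e[0], e[1]) for name, e in self._sets.items()}
        status, size, _ = self._sets[set_name]
        return status, size


if __name__ == "__main__":
    server = ProgressiveServer()
    print(server.search("default", [120, 45, 300]))
    while server.describe_result_set("default")[0] == IN_PROGRESS:
        server.background_step()
        print(server.describe_result_set("default"))
```

Calling describe_result_set() with no name also covers the "what sets do I currently have" question.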

An alternative is to use an Extended Service to submit such search
requests.  This gives control over canceling the search operation and
interrogating its current status.  When created, the extended search
request would contain a SearchRequest.  When fetched, the task package
would contain at least a SearchResponse.  (Reusing the current ASN.1
constructs would hopefully simplify changing search clients, which
already have to construct and pull apart these structures.)  The
client knows the search has finished once the Task Status is
completed.
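As a rough sketch of the client's side of this, here is a toy Python model.  The status values are loosely modelled on those of an Extended Services task package (pending, active, complete, aborted); the SearchTask class and its methods are hypothetical, since no such search service exists today:

```python
from enum import Enum


class TaskStatus(Enum):
    # Loosely modelled on the status values of an Extended Services task package.
    PENDING = 0
    ACTIVE = 1
    COMPLETE = 2
    ABORTED = 3


class SearchTask:
    """Hypothetical 'search' extended service: the task wraps a SearchRequest,
    and fetching the task package yields SearchResponse-like details."""

    def __init__(self, query, backend_hits):
        self.query = query
        self._pending = list(backend_hits)
        self.status = TaskStatus.PENDING
        self.result_count = 0

    def advance(self):
        """Simulate the server merging one more back-end reply."""
        if self.status in (TaskStatus.COMPLETE, TaskStatus.ABORTED):
            return
        self.status = TaskStatus.ACTIVE
        if self._pending:
            self.result_count += self._pending.pop(0)
        if not self._pending:
            self.status = TaskStatus.COMPLETE

    def fetch(self):
        """Return the current task package contents."""
        return {"taskStatus": self.status, "resultCount": self.result_count}

    def cancel(self):
        """Cancel the search unless it has already finished."""
        if self.status is not TaskStatus.COMPLETE:
            self.status = TaskStatus.ABORTED


if __name__ == "__main__":
    task = SearchTask("title=distributed", [120, 45, 300])
    # Poll the task package until it is no longer pending or active.
    while task.fetch()["taskStatus"] in (TaskStatus.PENDING, TaskStatus.ACTIVE):
        task.advance()
    print(task.fetch()["resultCount"])  # 465
```

The attraction of this route is that cancel and status inquiry come "for free" from the task package model.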

Yet another solution is to use, say, OtherInformation in a Present
request, allowing a 'revised result set size' to be returned when you
fetch records from an existing result set.
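A small sketch of that idea, again a toy model and not real Z39.50 code.  The 1-based start position mirrors the Present request's start point; the "revisedResultSetSize" key carried in the OtherInformation slot is a hypothetical name:

```python
from dataclasses import dataclass


@dataclass
class PresentResponse:
    records: list
    other_information: dict  # hypothetical: carries the revised result set size


class GrowingResultSet:
    """Hypothetical result set that grows as back-end servers report in."""

    def __init__(self, backend_batches):
        self._batches = [list(b) for b in backend_batches]
        self._records = []

    def _merge_one_batch(self):
        if self._batches:
            self._records.extend(self._batches.pop(0))

    def present(self, start, count):
        # start is 1-based, like the start point in a Present request.
        self._merge_one_batch()  # simulate background progress between requests
        records = self._records[start - 1:start - 1 + count]
        return PresentResponse(records, {"revisedResultSetSize": len(self._records)})


if __name__ == "__main__":
    rs = GrowingResultSet([["a", "b"], ["c"], ["d", "e", "f"]])
    for start in (1, 3, 4):
        resp = rs.present(start, 3)
        print(resp.records, resp.other_information)
```

Each Present response tells the client the current best figure, so the count it displays can be corrected as the user pages through records.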

There are lots of unresolved issues that one can think about.  However,
to me the first question is: would such a facility be of any use in
practice?  Maybe getting the client software to do the distribution is
the best solution.  It works today with no protocol change.  If a new
facility were made available, would any clients support it?  It
moves effort from the client writers to the server writers (which I
think is a good principle), but in practice, are people still doing
active Z39.50 client software development?

My personal belief is that if Z39.50 were being designed again, this
would be a logical thing to include (somehow).  But the reality today
is that some clients can already do this, so there is no real benefit
in trying to do it in Z39.50.  Does anyone have a different opinion?
Does anyone think there is anything more to this than a purely
intellectual exercise?


I actually started from the Explain database (we support Explain),
thinking about how useful it might be for applications to see which
result sets they currently have on the server.  In some of our
applications we have multiple independent libraries of code firing off
queries down a shared connection.  Being able to list all existing
sets would be useful for debugging or monitoring what was going on.
Implementing a distributed search Z39.50 gateway also sounded fun to
try, but I believe it would work badly at present: all remote servers
need to respond before the gateway can respond, and that would be too
slow to be acceptable to users.  Users want results shown
progressively on their screens, as current clients can do.

Also note, there is no reason why the searches being forwarded by the
server need to be Z39.50 requests.  I am focussing here on the protocol
between a client and a distributed search server.  The same issues
arise when putting a Z39.50-to-Google protocol converter in place:
the Google web interface revises how many records it says match a
search after you start fetching records.

Thanks
Alan
Received on Thursday, 1 April 2004 18:44:21 UTC