RE: Distributed searches in Z39.50? from Alex Khokhlov on 2004-04-07 (www-zig@w3.org from April 2004)

From: Alex Khokhlov <alex@lib.msu.ru>
Date: Wed, 7 Apr 2004 17:33:22 +0400
To: "'Alan Kent'" <ajk@mds.rmit.edu.au>, <www-zig@w3.org>
Message-Id: <04040717222725400@mail.lib.msu.ru>
Dear Alan!

> Thanks. It was interesting. Just to confirm my understanding, you have
> developed a Z39.50 distributed search client with a web interface.
> That is, users access a web interface and then your program does a
> distributed Z39.50 search (using lots of threads etc).

Yes, that's correct.
 
> My curiosity was a little different - I was wondering how to expose
> a Z39.50 interface instead of a web interface. That is, develop a
> server that allowed Z39.50 clients to access it where the server
> forwarded requests on to all the remote servers, did the query
> translations, record normalization, etc.

Yes, I see your point. That task is a little different; read on, I have
some information below that bears on exactly this kind of problem.

> The problem was that I was not sure how to use Z39.50 and get progressive
> results returned to a client - the client does not want to wait for
> all responses as that would be too slow. The answer is that there
> is a way to do it in Z39.50 using resource reports and concurrent
> operations (but no client exists that uses the capability).

My question is: is it really worth extending and tweaking Z39.50 into some
kind of metasearch protocol? It is already overcomplicated, even with just
the simple 'search' and 'present' operations; additional complexity would
not do any good.

I would suggest looking at other options that are under development and
should soon be available for you to use. I hope that the NISO MetaSearch
initiative will soon produce good specs and guidelines to follow
(http://www.niso.org/committees/MetaSearch-info.html).
 
> Is there a benefit over just having the clients do the distributed
> search directly (like what you have done)? It is not clear to me
> that there is a benefit. If it was easy to do within the protocol,
> then client writers might implement something. If its tricky, I think
> client writers would never bother (or if they did bother, they would
> go to the effort you have and do the distribution in the client).

I don't agree with you here. Look at the history of the Internet: are
HTTP/FTP/SMTP/POP3/HTML/XML and hundreds of other protocols complicated? No,
they are not. They are as simple as they could possibly be - that's why they
have spread so widely and flourished for many years. Sophistication comes
from the applications built on top of the protocols, just as it should: the
WWW, Web Services, e-mail clients and so on...

> I was wondering if a simpler protocol approach that does not introduce
> concurrent operations into the mix (for clients - I don't care about
> servers) existed. I think a simpler approach may be necessary to
> result in a simple ZOOM API. As soon as you require multiple threads,
> async operations, etc, I think one of the goals of ZOOM (simple API
> for programmers to use) will start to disappear. But maybe there is
> a way to use concurrent operations etc under a ZOOM API without exposing
> that complexity to the programmer using the API.

Have a look at a test metasearch implementation I've built to try out an
idea for simple metasearching: http://www.sigla.ru/sru-test.jsp.
It lets you query Sigla via an extended SRU protocol and search many
catalogs at once. The results available so far are returned immediately,
with an x-finished element at the end. If it is not set to 'true', you
should query Sigla again to get an updated view of the distributed search
(don't worry - connection pooling makes sure that the existing connections
are reused for that purpose). Then you can take one of the non-empty catalog
results, type its x-collection value into the form, select the
'searchRetrieve' operation and click 'search' - you will get records in
MarcXML.
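
A minimal sketch of the polling loop described above, in Python. Only the
x-finished element and its 'true' value come from the description; the
function names, the fetch callable, the poll limit and the delay are
illustrative assumptions, and the real response layout of the Sigla test
service is not specified here, so the code simply searches the whole
response for an element with that local name.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Tuple


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{urn:x}x-finished' -> 'x-finished'."""
    return tag.rsplit("}", 1)[-1]


def is_finished(xml_text: str) -> bool:
    """True if the response carries an x-finished element set to 'true'."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) == "x-finished":
            return (elem.text or "").strip().lower() == "true"
    # Assumption: a response without x-finished is treated as still running.
    return False


def poll_until_finished(
    fetch: Callable[[], str],  # hypothetical: issues the same query again
    max_polls: int = 10,
    delay: float = 1.0,
) -> Tuple[str, int]:
    """Repeat the query until x-finished is 'true' or max_polls is reached.

    Returns the last response text and the number of requests made.
    """
    last = ""
    for attempt in range(1, max_polls + 1):
        last = fetch()
        if is_finished(last):
            return last, attempt
        time.sleep(delay)  # give the remote catalogs time to answer
    return last, max_polls
```

The fetch callable would wrap whatever HTTP request the client already
makes; passing it in keeps the loop independent of any particular HTTP
library and of the exact query parameters.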

As far as I understand from your letter, this is exactly the thing you are
looking for, but it's not standard - just a test of an idea. Maybe it will
be standardized in some form in the future.

> But maybe its always going to be tricky - clients have to progressively
> display results and let users view them. The protocol aspect is only
> one part of the complete problem that the client writer has to address.

Well, sooner or later you'll have to deal with the nature of distributed
search. :) But with the solution I described above you can just reformat the
current XML result into an HTML page and not bother with progressive
display. If you want to update the status of the search, just click refresh
(of course it's not very efficient, but it's simple and it works).
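
One way the "reformat to HTML and refresh" idea might look, as a hedged
Python sketch. It assumes the per-catalog hit counts and the finished flag
have already been extracted from the XML response; the function name, the
(collection, hits) pairs and the page layout are hypothetical. The meta
refresh tag is an optional addition that automates the manual refresh click;
it is left out once the search has finished.

```python
from html import escape
from typing import Iterable, Tuple


def render_status_page(
    results: Iterable[Tuple[str, int]],  # hypothetical (x-collection, hits) pairs
    finished: bool,
    refresh_seconds: int = 5,
) -> str:
    """Build a static HTML snapshot of the current search state."""
    rows = "\n".join(
        f"<tr><td>{escape(name)}</td><td>{count}</td></tr>"
        for name, count in results
    )
    # Ask the browser to reload only while the search is still running.
    refresh = (
        ""
        if finished
        else f'<meta http-equiv="refresh" content="{refresh_seconds}">'
    )
    status = "Search finished" if finished else "Search still running"
    return (
        "<html><head>" + refresh + "<title>Search status</title></head>"
        "<body><p>" + status + "</p>"
        "<table>\n" + rows + "\n</table></body></html>"
    )
```

Each reload re-runs the query and redraws the whole page, which matches the
trade-off noted above: not efficient, but simple.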

> Thanks to everyone else who replied too. Interesting stuff.

I'm also very interested in any protocol developments concerning
metasearching across many heterogeneous sources. It was very interesting for
me to read your replies, thanks to everyone!

BR, Alex Khokhlov.
Received on Wednesday, 7 April 2004 09:33:05 UTC