- From: Alan Kent <ajk@mds.rmit.edu.au>
- Date: Fri, 2 Apr 2004 17:31:36 +1000
- To: Kevin Gamiel <kgamiel@cnidr.org>
- Cc: ZIG <www-zig@w3.org>
On Fri, Apr 02, 2004 at 01:06:44AM -0500, Kevin Gamiel wrote:
> >One of the problems in implementing a Z39.50 distributed search server
> >is a search request has to return the exact number of hits in the
> >search response packet. The exact number of records in the final
>
> Says who? Rule number one: server choice. Rule number two: profiles.
> Seems to me, the mechanics for doing what you want are all in place.

I am not 100% sure what you mean by the above, sorry. Do you mean that
you think Z39.50 has enough flexibility to do this functionality without
change? I do not see any logical way to do it with the current protocol.
Or are you instead saying there are some simple ways of defining another
EXTERNAL and dropping it into the existing protocol (at several possible
points) and then writing a profile that describes the EXTERNAL and when
to use it? (The latter I agree with.)

> We did this with Isite, we had a "search engine" plugin that was really
> a distributed Z39.50 client. I remember thinking about how to fold such
> functionality into the standard model, but never made much progress. I
> *think* concurrent operations were invented for just such a case, but I
> could be wrong.

Concurrent operations allow a client to send multiple requests down one
socket without waiting for responses. There are other side effects too,
related to what the server is allowed to send, and when.

I would like to expose the distributed collection as a single database
name that you search on (to keep clients simple). That is, have
'sub databases' or 'composite databases' (terminology from the Explain
record describing a database). For example, a server might expose a
single database name 'all-libraries-around-the-world'.

> Otherwise, there are (at least) two remaining issues.
> First, do you want existing clients to work with this model and second,
> if you don't care about existing clients, what's the best way to expose
> this type of functionality.
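To make the composite-database idea concrete, here is a minimal sketch of
the server-side mapping it implies: a single advertised database name
expands to a list of back-end targets to fan the search out to. All host
names, ports and database names below are invented for illustration; this
is not a real Z39.50 API.

```python
# Hypothetical mapping from a composite database name to the back-end
# (host, port, database) targets a distributed-search server would
# fan a query out to. Every name here is made up.
COMPOSITE_DATABASES = {
    "all-libraries-around-the-world": [
        ("z3950.lib-a.example", 210, "Default"),
        ("z3950.lib-b.example", 210, "Books"),
        ("z3950.lib-c.example", 2100, "Catalog"),
    ],
}

def expand_database(name):
    """Return the back-end targets for a composite database name,
    or the name itself (as a one-element list) if it is not composite."""
    return COMPOSITE_DATABASES.get(name, [name])

targets = expand_database("all-libraries-around-the-world")
print(len(targets))  # three back-end targets to search concurrently
```

The point of the mapping is that clients stay simple: they search one
database name and never need to know the fan-out exists.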
If you are willing to wait for all servers to respond, nothing needs to
change in the current protocol at all. The question is how to
incrementally tell the user the status of the search while it is still
in progress. Existing clients by definition cannot support displaying
incremental progress of searches, as there is no agreed way of doing
this in Z39.50. So, clients must change. Whatever was done would have to
be a profile, probably some new ASN.1 EXTERNAL of some kind, and then
both a server and a client would have to be extended to support it.

> Otherwise, at least at a superficial level, I think it involves using
> existing PDUs plus profiles and we can all think of a thousand ways to
> negotiate and implement it.

Yes. Several schemes roll off the tongue easily. But I will try to avoid
boring people with them (yet! ;-). The requirements are much more
interesting at this stage.

> Would it be useful? I think the answer is clearly "yes". In the past,
> we usually just hang the search until either all results came back or
> a timeout occurred and we truncated results, etc (yuck).

I agree, yuck. More than yuck actually, as every query will take as long
as the slowest server, and in a world-wide search, that can be SLOW!
This is one of the reasons I think successful distributed search
applications have always been done by clients.

> But, what would it take to do this correctly? If you view the world as
> XML folks tend to, then everything is a tree and sending a query to a
> node will potentially branch to n nodes, ad nauseam.

I had not thought of having a tree, but rather a flat list of 'database
X expands to A, B, C, D etc'. Turning it into a tree would not be hard,
I guess, as when 'A' replies, it can return details for P, Q, R, which
all get nested in the 'A' details. So an individual server does not need
to understand the full tree - it just understands its immediate children
and how to nest the responses it gets back. Interesting though.
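The nesting idea above can be sketched in a few lines: each server holds
only its flat list of immediate children, and when a child replies with
its own sub-details, those are grafted under the child's entry. The node
names (X, A, P, Q, R) and the dict layout are illustrative only, not any
actual protocol structure.

```python
# Sketch: a server knows only its immediate children; nesting a child's
# reply builds the tree without any node seeing the full topology.
def make_node(name, hits=None, children=None):
    return {"name": name, "hits": hits, "children": children or []}

# The composite server's flat view: X expands to A, B, C.
root = make_node("X", children=[make_node(n) for n in "ABC"])

def merge_child_reply(parent, child_name, reply):
    """Graft a child's reply (hits plus its own sub-tree) under the
    matching child entry of this server's flat list."""
    for child in parent["children"]:
        if child["name"] == child_name:
            child["hits"] = reply["hits"]
            child["children"] = reply["children"]

# When 'A' replies, it carries details for its own children P, Q, R,
# which simply nest under A's entry.
merge_child_reply(root, "A",
                  make_node("A", hits=42,
                            children=[make_node(n, hits=14) for n in "PQR"]))

print(root["children"][0]["hits"])           # 42
print(len(root["children"][0]["children"]))  # 3
```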
> It requires dynamic feedback from each node, discovering the topology
> in realtime, possibly based on the query itself. Then it becomes an
> old-school query routing problem, a whois++ delegated query problem,
> which becomes a management problem, etc, etc.

I was not thinking of dynamically changing what to search. I agree
that's a hard problem - but I think it's orthogonal to the problem I am
describing of incremental search progress notification. What the server
chooses to search is up to it.

Hmmm. Resource reports, resource controls, extended services,
other-info: I agree there is ample scope to add in something using an
EXTERNAL. I think the next question then is whether it matters if the
client has to poll for updates, or whether the server should squirt
async messages at the client whenever it discovers something new to tell
it. The answer to this question will determine which existing Z39.50
protocol features could be used to support a profile. I do think it's
important to be able to do Present requests before the search has
finished running.

I think async messages sound good, but I am not convinced they are
actually important - what is wrong with a client polling the server once
per second? (Hmmm. Unless you have 1000 users of course!)

Is this what you meant by concurrent operations perhaps? The client
sends a Search request. Whenever the server thinks it appropriate, it
sends a ResourceControl to the client with updated information.
Eventually the server sends a Search response when all distributed
searches have completed. The client however needs to be allowed to do a
Present request against the result set name before the search response
comes back, which requires support for concurrent operations.

Without concurrent operations, I suspect the client would have to poll
for updates on search progress no matter which Z39.50 features were
used. This seems a little CPU wasteful, but probably easier to implement
in practice. I am thinking of things like ZOOM APIs.
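The polling alternative discussed above can be sketched as a simple
client loop: issue the search, then poll for progress once per second,
doing Present requests against the partial result set as hits arrive.
The FakeDistributedSearch class below is a stand-in that simulates
back-ends completing one per poll - it is not a real Z39.50 or ZOOM API,
and the one-second interval is shortened so the sketch runs quickly.

```python
import time

class FakeDistributedSearch:
    """Stand-in for a composite search: each poll, one more simulated
    back-end finishes and contributes its hit count."""
    def __init__(self, backend_hits):
        self.pending = list(backend_hits)
        self.hits = 0

    def poll(self):
        if self.pending:
            self.hits += self.pending.pop(0)
        return {"hits_so_far": self.hits, "complete": not self.pending}

# Client polling loop: three hypothetical back-ends with 10, 25 and 7 hits.
search = FakeDistributedSearch([10, 25, 7])
while True:
    status = search.poll()
    # At this point a real client could Present records 1..hits_so_far,
    # even though the overall search has not yet completed.
    if status["complete"]:
        break
    time.sleep(0.01)  # once per second in practice; shortened here

print(status["hits_so_far"])  # 42
```

This is the strict request/response shape: no server-initiated messages,
at the cost of one status round trip per polling interval per client.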
Having strict client-request/server-response pairs makes general-purpose
APIs much easier to implement. It also avoids mandating 'event model'
style programming.

Alan
Received on Friday, 2 April 2004 02:33:22 UTC