communication between URC servers

Ronald E. Daniel (rdaniel@acl.lanl.gov)
Mon, 3 Jul 1995 14:55:53 -0600


From: "Ronald E. Daniel" <rdaniel@acl.lanl.gov>
Message-Id: <9507031455.ZM17669@idaknow.acl.lanl.gov>
Date: Mon, 3 Jul 1995 14:55:53 -0600
To: uri@bunyip.com
Subject: communication between URC servers

Hello again,

The first draft of the SGML-based URC spec concentrated on how to
develop a URC service that conforms to the fundamental assumptions of
no universal attributes and no universal syntax. It discussed only the
communications between the client and the server, and left the
server-server stuff for a later version of that spec. The
server-server communications will actually be very important if the URC
service is to meet its goal of reducing the number of broken links in
the WWW. This message indicates my current thinking on server-server
communication in order to elicit comments. It is considerably shorter
than the draft URC spec, so I hope it will generate more comments.

Server-server communications fall into at least two classes. The first
is replicating the same URC service onto multiple machines. In order to
overcome the limited fault-tolerance of the current WWW, this
replication will be very important. However, it is not really
appropriate for us to specify how it shall occur. We wish to encourage
the development of a wide range of URC servers, with a wide range of
capabilities. Therefore we do not presume to specify how organizations
must store their URC data. Some small sites may wish to store URCs in
files. Larger sites may use commercial database technology. Replicating
the former can be handled by mirroring programs. Replicating the latter
will depend upon the particular databases used. We prefer to leave the
replication of identical information to the choice of the implementors.

There is a second class of server-server communication. This is
communication between servers that have different URCs for the same
resource. It is easily illustrated by considering the resource
migration problem. Assume that a publisher and the Library of Congress
(or other third party URC provider) both have URCs for a resource.
Assume that the resource is moved from one URL to another. Since this
movement is handled by the publisher, we assume they can easily change
the location information in their URC. However, how are the third-party
URC providers to know how (and when) to update their URCs? Until their
URC is updated, they are handing out a bad link and we are back to the
dead links problem with which we are all familiar.

There are several possible solutions, with the inevitable trade-offs.

First, the third-party providers can poll the publisher's URC server
regularly with a query of the form "show me all the URC changes since
my last visit". There are several disadvantages to this approach. First,
the third party providers may be providing bad information for a period
of time. This can be alleviated by a protocol where browsers, when
attempting to access a resource and failing, send notification of the
problem to the URC server that told them where to get the resource. The
third party URC resolver can then try to rectify the problem. A second
problem is that such queries are computationally intensive for the
publisher to handle, therefore they will probably not allow just anyone
to issue them.
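
As an illustration only, the polling scheme can be sketched in Python.
Everything here is an assumption made for the sketch rather than part of
any URC spec: the class names PublisherServer and ThirdPartyServer, the
sequence-numbered change log, and the in-memory dictionaries standing in
for real URC storage.

```python
from dataclasses import dataclass, field


@dataclass
class PublisherServer:
    """Hypothetical publisher-side URC store that keeps a change log."""
    urls: dict = field(default_factory=dict)        # URN -> current URL
    change_log: list = field(default_factory=list)  # (sequence, URN, URL)

    def set_url(self, urn, url):
        # Record every change with an increasing sequence number.
        self.urls[urn] = url
        self.change_log.append((len(self.change_log) + 1, urn, url))

    def changes_since(self, last_seen):
        # Answer "show me all the URC changes since my last visit".
        return [entry for entry in self.change_log if entry[0] > last_seen]


@dataclass
class ThirdPartyServer:
    """Hypothetical third-party URC store that polls a publisher."""
    urls: dict = field(default_factory=dict)
    last_seen: int = 0

    def poll(self, publisher):
        # Apply only changes for URNs this server already describes.
        for seq, urn, url in publisher.changes_since(self.last_seen):
            if urn in self.urls:
                self.urls[urn] = url
            self.last_seen = seq
        return self.last_seen
```

Between a call to set_url on the publisher and the next call to poll,
the third-party copy still holds the old URL, which is the window of bad
information described above. Scanning the whole change log on every
query also hints at why a publisher might restrict who may ask.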

A second solution is for the naming authority to have a list of other
URCs for the same resource. When the NA changes its URC, a message can
go out to all the URC servers on the list. In principle it would be
possible for these updates to occur automatically, although the amount
of trust to place in these update messages will depend upon the
strength of the cryptography employed and the business relations in
place between the two organizations. One disadvantage of this approach
is that third parties must register with the publisher, something they
may not wish to do. Another is that, if recent proposals for rating
services come to fruition, this list could get rather long.
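
A rough Python sketch of this registration-and-notify scheme follows.
It rests on assumptions not made in this message: a shared secret key
per registered server, an HMAC-SHA256 tag standing in for "the
cryptography employed", a plain "URN URL" message body, and the
hypothetical class names NamingAuthority and SubscribedServer.

```python
import hashlib
import hmac


class NamingAuthority:
    """Hypothetical naming authority that notifies registered URC servers."""

    def __init__(self):
        self.subscribers = {}  # server name -> (shared key, delivery callback)

    def register(self, name, key, deliver):
        # Third parties must register before they receive any updates.
        self.subscribers[name] = (key, deliver)

    def announce(self, urn, new_url):
        # Send the same update to every registered server, tagged per server.
        message = f"{urn} {new_url}".encode()
        for key, deliver in self.subscribers.values():
            tag = hmac.new(key, message, hashlib.sha256).hexdigest()
            deliver(message, tag)


class SubscribedServer:
    """Hypothetical third-party server that accepts only authentic updates."""

    def __init__(self, key):
        self.key = key
        self.urls = {}  # URN -> URL

    def receive(self, message, tag):
        expected = hmac.new(self.key, message, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, tag):
            return False  # reject: not tagged with the agreed key
        urn, url = message.decode().split(" ", 1)
        self.urls[urn] = url
        return True
```

The loop in announce makes the cost visible: one message per registered
server for every change, so the work grows with the length of the list.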

A third solution would be a flooding approach to distributing changes,
a la NNTP. Yecchh! Relatively few sites are actually likely to care about
this information. A restricted multicast approach might be
investigated. I would appreciate comments by people on the feasibility
of this.
 

Note that nothing prevents these solutions from being deployed
simultaneously to serve different communities. I am very interested
in people's opinions on how this notification should proceed.


In addition to the issue of how notifications could be distributed,
there is also the issue of the format of the notifications. Here, the
URC's generality points us to an answer. A special attribute set can be
defined. As an example, an update message might look like:

<!DOCTYPE URC SYSTEM "urc:x-dns-2:uri.acl.lanl.gov:update-fmt.dtd">
<URC>
 <URN>whatever</>
 <INSERTIONS>
  <URL>new URL for resource goes here</>
 </INSERTIONS>
 <DELETIONS>
  <URL>old URL goes here</>
 </DELETIONS>
</URC>

Patterns can be matched to find out where to insert new material and
which old entries to delete.
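
One possible reading of that matching step, sketched in Python. The
sketch assumes a receiving server keeps, for each URN, a simple list of
URLs, and that exact string comparison is an acceptable "pattern"; a
real server would use an SGML parser rather than regular expressions.
The function names parse_update and apply_update are hypothetical.

```python
import re


def parse_update(message):
    """Extract the URN and the inserted and deleted URLs from an update."""
    def urls_in(section):
        # Find the named section, then every <URL>...</> element inside it.
        block = re.search(rf"<{section}>(.*?)</{section}>", message, re.S)
        return re.findall(r"<URL>(.*?)</>", block.group(1), re.S) if block else []

    urn = re.search(r"<URN>(.*?)</>", message, re.S)
    return (urn.group(1) if urn else None,
            urls_in("INSERTIONS"), urls_in("DELETIONS"))


def apply_update(record, message):
    """Apply an update to a record mapping URN -> list of URLs."""
    urn, insertions, deletions = parse_update(message)
    if urn is None:
        raise ValueError("update message has no URN")
    # Drop the deleted URLs, then append any inserted URL not already kept.
    kept = [url for url in record.get(urn, []) if url not in deletions]
    record[urn] = kept + [url for url in insertions if url not in kept]
    return record
```

Applied to a message of the form shown above, apply_update removes the
old URL from the record for that URN and appends the new one, leaving
any other URLs for the resource untouched.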


Your comments please!

Regards,


-- 
Ron Daniel Jr.                email: rdaniel@acl.lanl.gov
Advanced Computing Lab        voice: (505) 665-0597
MS B-287  TA-3  Bldg. 2011      fax: (505) 665-4939
Los Alamos National Lab        http://www.acl.lanl.gov/~rdaniel/
Los Alamos, NM,  87545    tautology: "Conformity is very popular"