Re: Proposal: New Anchor attributes

Daniel W. Connolly (connolly@w3.org)
Tue, 28 May 1996 23:46:47 -0400


Message-Id: <199605290346.XAA23152@anansi.w3.org>
To: marc@pele.ckm.ucsf.edu (Marc Salomon)
Cc: www-html@w3.org
Subject: Re: Proposal: New Anchor attributes 
In-Reply-To: Your message of "Fri, 24 May 1996 08:46:57 PDT."
             <199605241546.IAA24083@pele.ckm.ucsf.edu.UCSF-LIBRARY> 
Date: Tue, 28 May 1996 23:46:47 -0400
From: "Daniel W. Connolly" <connolly@w3.org>


I have long thought that the ability to specify multiple addresses
for the target of a link is a Good Idea.

In message <199605241546.IAA24083@pele.ckm.ucsf.edu.UCSF-LIBRARY>,
Marc Salomon writes:
>2.  SGML doesn't allow multiple occurrences of the same attribute in an
>element.  I.e., you can't do <A HREF="1" HREF="2">.

True.

>4.  Overloading ALT is problematic.  I had suggested that a convention of
>space-separated URI's in an HREF (or SRC) like: <A HREF="1 2 3"> eenie meenie
>minie </A> might work, but only for short URI's.  This doesn't break current 
>implementations (they retrieve URI 1), but can look ugly in the URI box.

That appeals to me in some ways, but I wonder whether it's
consistent with the existing specs. That is, you might test with
one or two browsers and find that your idea doesn't cause too much
trouble, but somebody might have coded to the letter of the spec
and get burned by this. We don't want that.

In fact, I brought this up at a lunch discussion at the Danvers
IETF with TimBL and some other folks at the table. Tim shot this
sort of thing down as a hack, and suggested that the Right Thing
is to use additional typed links to represent replica information.

What we want is a syntax that captures:

	"The resource at X is also available at Y"

which is a special case of the general link semantics:

	"The resource at X has a link of type T to Y"

which is usually written (inside the content of X):

	<a rel=T href=Y>...</a>
or
	<link rel=T href=Y>

In order to be able to give information about many links,
including links where neither the source nor the target is
the containing document, TimBL has suggested the <resource> element.
See:

	http://www.w3.org/pub/WWW/MarkUp/Resource/Specification


So consider the typical software release announcement. Let's use
the link type "replica" as in:

	"The resource at X is a replica of the resource at Y"

If the software is canonically at:

	http://www.foo.com/foo.tar.gz
and mirrored at:
	http://www.foo.com/foo.tar.gz
	http://www.bar.com/mirror/foo.tar.gz
	http://www.baz.com/net-stuff/foo1.9.tar.gz

then the release announcement might look like:

	<p>We're proud to release
	<a href="http://www.foo.com/foo.tar.gz">CluesForYou v1.9</a>,
	with a host of clues for all occasions.

	<p>We've arranged for the following mirrors:
	<ul>
	<li><a href="http://www.foo.com/foo.tar.gz">australia</a>
	<li><a href="http://www.bar.com/mirror/foo.tar.gz">africa</a>
	<li><a href="http://www.baz.com/net-stuff/foo1.9.tar.gz">US</a>
	</ul>

	<resource href="http://www.foo.com/foo.tar.gz">
		<meta name="content-length" content="1235234">
		<meta name="content-md5" content="23l4kj23l4kj23">
		<link rel=replica href="http://www.foo.com/foo.tar.gz">
		<link rel=replica href="http://www.bar.com/mirror/foo.tar.gz">
		<link rel=replica href="http://www.baz.com/net-stuff/foo1.9.tar.gz">
	</resource>


So a <resource>-savvy browser, when requested to follow the link
to http://www.foo.com/foo.tar.gz would have enough information
to try the replicas automatically, if www.foo.com didn't seem
to be responding.

I haven't considered markup to represent heuristics about timeouts.
And about that: even if the browser were given a timeout, I wouldn't
expect it to give up on one request when starting another: it should
only give up on one request when _completing_ another (or at least
getting some indication that another request is likely to succeed).

The algorithm for selecting a replica -- and for fail-over -- needn't
be the same for all browsers. It should be network-friendly (i.e.
don't always start all TCP connections at once). Something like:

	* select one at random (give the canonical URL a higher
		probability than the others, perhaps)
	* start trying a TCP connection to that server
	* if after 4 seconds, there is no response, choose
		another and start that TCP connection
	* wait for one of the connections to get established
	* ...
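
The steps above could be sketched roughly as follows in Python. This is
only an illustration under stated assumptions, not a proposed browser
implementation: pick_order, fetch_with_failover, the connect callable
and the canonical weight of 2.0 are all hypothetical names and values
chosen for the example. Only the 4-second stagger and the idea of
favouring the canonical URL come from the list above. Earlier attempts
are left running while later ones start, so a request is dropped only
once another one has actually succeeded.

```python
import concurrent.futures as cf
import random


def pick_order(canonical, replicas, canonical_weight=2.0):
    """Weighted-random ordering that favours the canonical URL (assumed weight)."""
    candidates = [canonical] + [r for r in replicas if r != canonical]
    weights = [canonical_weight] + [1.0] * (len(candidates) - 1)
    order = []
    while candidates:
        i = random.choices(range(len(candidates)), weights=weights)[0]
        order.append(candidates.pop(i))
        weights.pop(i)
    return order


def fetch_with_failover(canonical, replicas, connect, stagger=4.0):
    """Start one attempt, add another every `stagger` seconds, return the first success.

    `connect` is a hypothetical callable: it takes a URL and either returns
    a connection object or raises an exception.
    """
    order = pick_order(canonical, replicas)
    pool = cf.ThreadPoolExecutor(max_workers=len(order))
    pending = set()
    try:
        while order or pending:
            if order:
                # Network-friendly: only one new attempt per round.
                pending.add(pool.submit(connect, order.pop(0)))
            # Wait up to `stagger` seconds if there is still another replica to try;
            # otherwise wait for the outstanding attempts to finish.
            timeout = stagger if order else None
            done, pending = cf.wait(
                pending, timeout=timeout, return_when=cf.FIRST_COMPLETED
            )
            for attempt in done:
                if attempt.exception() is None:
                    return attempt.result()
        raise ConnectionError("no replica responded")
    finally:
        # Do not block on attempts that are still hanging.
        pool.shutdown(wait=False)
```

In practice connect might wrap something like socket.create_connection;
here it is left as a parameter so the selection logic stays separate
from the transport.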

Note the bits about content-length and content-MD5. That allows
a client to test if the replica is up-to-date w.r.t. the announcement.
The browser might download the stuff and check the length and MD5.
On finding a discrepancy, it might say

	"Hey! The checksum doesn't match! Want me to try another site,
		or do you think maybe the data is OK as is?"
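
A minimal sketch of that check in Python, assuming the content-md5
value is the base64 encoding of the MD5 digest, as in the Content-MD5
header of RFC 1864 (the value in the announcement above is only a
placeholder, so that encoding is an assumption). The function name
check_replica is hypothetical.

```python
import base64
import hashlib


def check_replica(data, content_length, content_md5):
    """Compare downloaded bytes against the announced length and MD5.

    Returns a list of human-readable discrepancies; an empty list means
    the replica matches the announcement.
    """
    problems = []
    if len(data) != content_length:
        problems.append(
            "length is %d, announcement says %d" % (len(data), content_length)
        )
    # Assumption: content-md5 is base64 of the raw 16-byte MD5 digest.
    digest = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    if digest != content_md5:
        problems.append("MD5 checksum does not match the announcement")
    return problems
```

A browser could prompt the user, or quietly move on to the next
replica, whenever the returned list is non-empty.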


Note that this replica link type is also useful for eliminating
duplicates in search-engines.

Suppose a document is available at /foo/ and at /foo/index.html.
It could contain markup a la:

	<link rev=replica href="/foo/">

i.e. "the address you used to get this document is a replica
	of /foo/"
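
As an illustration of how an indexer might use this, here is a rough
Python sketch that maps each fetched address to the address named by
a <link rev=replica> in the document, so both copies collapse to one
index entry. ReplicaLinkFinder and canonical_address are hypothetical
names, and taking the first such link as the preferred address is an
assumption made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReplicaLinkFinder(HTMLParser):
    """Collect href values of <link rev=replica ...> elements."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rev = (attributes.get("rev") or "").lower()
        if rev == "replica" and attributes.get("href"):
            self.targets.append(attributes["href"])


def canonical_address(fetched_url, html):
    """Return the address to index this document under."""
    finder = ReplicaLinkFinder()
    finder.feed(html)
    if finder.targets:
        # Assumption: the first rev=replica link names the preferred address.
        return urljoin(fetched_url, finder.targets[0])
    return fetched_url
```

Fetching the same document via /foo/ and via /foo/index.html then
yields the same key, which is enough to drop the duplicate.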

Typed links are cool.

Dan