
RE: Resource discovery: limits of URIs

From: Mitch Denny <mitch.denny@warbyte.com>
Date: Tue, 18 Dec 2001 21:06:23 +1100
To: <www-tag@w3.org>
Message-ID: <014f01c187ab$a5e94490$f500a8c0@mel.ibk.com.au>
Peter/all,

I can only sympathise with the problems you face with
Internet-based resources disappearing or moving. The
source of the problem, I'm afraid, is probably the exact
same thing that makes the Internet such a valuable tool.

It's fair to say that those driving the evolution of
all the supporting technology have a lot of vision, but
unfortunately design prowess is really only displayed
within working groups with a narrow focus.

Don't get me wrong, this approach has been largely
successful; in terms of human achievement I think the
Internet and related technologies rate right up there.

Today we have some fantastic infrastructure with only
a few minor problems. The problem that you have expressed
crosses a few technology boundaries and I understand
that this group exists to solve just such a problem.

I'd like to see further exploration of your UCI idea:
what are the problems it would solve? One roadblock
that I see is that Internet hosts are definitely
autonomous entities, so any prescriptive solution
that rigidly defines a static structure would
almost certainly not be accepted.

As an alternative, how about a relatively simple
extension to the HTTP protocol via a header that
defines the volatility of the requested URI:

	Volatility: Dynamic 2592000

The above would suggest to the user-agent that the
page is dynamic and could change on each request,
but that the URI used to request it remains valid
for thirty days (2,592,000 seconds).
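As a sketch, the header value could be assembled server-side by something as simple as the following (the Volatility header and its "mode + lifetime" syntax are the proposal above, not an existing HTTP header, and the function name is mine):

```python
def volatility_header(mode, ttl_seconds):
    """Build the value for the proposed (hypothetical) Volatility header.

    mode: "Static" or "Dynamic" -- whether the content itself may
          change between requests.
    ttl_seconds: how long the URI used to fetch it should stay valid.
    """
    if mode not in ("Static", "Dynamic"):
        raise ValueError("mode must be 'Static' or 'Dynamic'")
    return "Volatility: %s %d" % (mode, ttl_seconds)

# Thirty days expressed in seconds, for a dynamically generated page:
print(volatility_header("Dynamic", 30 * 24 * 60 * 60))
# -> Volatility: Dynamic 2592000
```

Any server-side framework that can set response headers (JSP, PHP, ASP, etc.) could emit this today without touching the HTTP implementation itself.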

The user-agent could analyse that header and, when a
bookmark is requested, determine whether it needs to
pull the page down and archive it. Of course, this
behaviour would be based on user preferences.
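The user-agent side of that decision might look roughly like this (a hypothetical sketch: the header semantics are the proposal above, and the helper names are invented for illustration):

```python
def parse_volatility(value):
    """Split a (hypothetical) Volatility header value such as
    'Dynamic 2592000' into its mode and lifetime-in-seconds parts."""
    mode, ttl = value.split()
    return mode, int(ttl)

def should_archive(header_value, bookmark_age_seconds, archive_dynamic=True):
    """Decide whether a bookmarked page should be pulled down and archived.

    Archive when the URI's declared lifetime has already elapsed, or
    (subject to the user preference archive_dynamic) when the content
    is flagged as dynamic and so may differ on the next visit.
    """
    mode, ttl = parse_volatility(header_value)
    if bookmark_age_seconds >= ttl:
        return True  # the URI itself may no longer resolve
    return archive_dynamic and mode == "Dynamic"

print(should_archive("Dynamic 2592000", bookmark_age_seconds=0))  # True
print(should_archive("Static 2592000", bookmark_age_seconds=0))   # False
```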

From my perspective there are several key
benefits to this approach:

	- Doesn't break existing user-agent and
	  server implementations.

	- Can be implemented on the server-side now
	  by systems that support dynamic content
	  like JSP, PHP, ASP, ASP.NET etc.

	- Gets around the problem of host independent
	  resource description today by encouraging
	  an "archive it if you need it" mindset.

There are also a few issues that would need
to be solved:

	- User-agents would need to be updated to
	  support this extension.

	- This works for HTTP, similar mechanisms
	  would need to be built for other content
	  delivery protocols.

	- Static content would need the support of the
	  server to correctly flag it as volatile or
	  not - more management overhead.

	- Archiving content from sites might introduce
	  some legal issues - but they were always there.

From a user's perspective, I think this could become
quite an intuitive process - I do something similar
today: for example, when I purchase something online
I archive the page, because I know that if I just
bookmark the site it's not going to be there when I
go back.

Anyway, that's my proposal. I'm not even sure if this
is the right place to bring it up, but it's out there
now; feel free to critique it constructively.

----------------------------------------
- Mitch Denny
- mitch.denny@warbyte.com
- +61 (414) 610-141
-----Original Message-----
From: www-tag-request@w3.org [mailto:www-tag-request@w3.org] On Behalf
Of Peter Pappamikail
Sent: Monday, 17 December 2001 9:23 AM
To: www-tag@w3.org
Subject: Resource discovery: limits of URIs


I'm flagging up an issue as a private individual rather than in my
official capacity of Head of Information Resources Management in the
European Parliament, although the issue I address has been considered in
my professional work as well as my private research and work on XML
implementation issues.

My concern is the mechanisms available to "translate" information on a
uniquely identifiable artefact to an addressable URI. Please accept my
apologies in advance if the issue is not appropriate for this list.

As an "ordinary user" I can "identify" or name a particular information
artefact, a book, a document, etc. With a URL, I can address it. The URL
will give me an address that usually combines details of an originating
authority, a content identifier, sometimes a language version and an
application format (MIME extension).

However, with the exception of the language version - which might,
depending on the server infrastructure, serve up a version according to
my indicated preferences set in the browser - the "discovery" of the
full URL cannot be deduced algorithmically from the content identifier.
A couple of examples to demonstrate my concern more clearly:

- "bookmark rot": I mark a set of resources from a particular site, only
to find a year later that all the references are rotten as the .htm
extension has been replaced by .php throughout the site, although no
single item of content has changed;
- I reference an item found via a WAP service, knowing that a more
complete version of the same content is available in HTML on a parallel
web site: the 'URLs' however are completely different despite referring
to the same artefact;
- I copy a URL in a site, only to discover that the URL is
attributed not only dynamically but is session-specific and sometimes
personalised, and thus not re-usable;
- I'm listening to a voice-synthesised web page that contains links to
resources that are available in audio and text, but the link takes me to
the text file via the hypertext link;

In architectural terms, my concern is that more and more sites, in the
absence of any clear mechanisms for resolving addresses from
identifiers, have increasingly complex interfaces with proprietary
resolution mechanisms that practically render resource discovery
impossible, except indirectly. A user should be able to indicate the
minimum information that distinguishes a particular artefact uniquely
(I'm not sure the URN does this, because it is still only a URI with a
commitment to persistence) and not be bothered with whether it is the
most recent version, which languages are available, whether it is in
PDF, HTML, XML, or WML, but that the server will resolve this in a
context-sensitive manner. The issue will become critical when XPointer
starts to be used to identify resource fragments: in fact the XPointer's
potential weakness is precisely that the containing document may itself
be poorly addressable.

My "ideal scenario" would be the replacement, in the hyperlink target
data, of a URI - pointing as it does to a specific file - by a "UCI"
(a "Uniform Content Identifier") that resolves to the specific
components:
- a DNS entry or other service locator;
- on the server side, a URI appropriate to the client context, made
up of the content identifier 'wrapped' with language, version, format
and other context-specific data;
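The server-side half of that resolution could be sketched as a catalogue lookup, assuming the server holds a mapping from a stable content identifier plus client context (ranked language and format preferences) to concrete URIs; every identifier and path below is invented for illustration:

```python
# Toy catalogue: one stable content identifier maps, per
# (language, format) pair, to the concrete URI the server would serve.
CATALOGUE = {
    "doc-1234": {
        ("en", "html"): "/en/reports/annual.html",
        ("en", "pdf"):  "/en/reports/annual.pdf",
        ("fr", "html"): "/fr/rapports/annuel.html",
    },
}

def resolve_uci(uci, languages, formats):
    """Return the URI for the first (language, format) combination,
    in the client's order of preference, that the server holds."""
    variants = CATALOGUE.get(uci, {})
    for lang in languages:
        for fmt in formats:
            if (lang, fmt) in variants:
                return variants[(lang, fmt)]
    return None  # unresolvable: no variant matches this context

print(resolve_uci("doc-1234", ["fr", "en"], ["pdf", "html"]))
# -> /fr/rapports/annuel.html (no French PDF exists, so the
#    resolver falls back to the French HTML variant)
```

Note the design choice that language outranks format here; a real negotiation scheme would need to make that ranking explicit, much as HTTP quality values do.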

If this sort of issue is handled elsewhere, I'd be happy to be pointed
the way, but I feel the issue goes beyond the scope of current W3C
activity on addressing and is too "instance specific" to be in the realm
of RDF or other semantic resource discovery issues: I believe the issue
is analogous to HTTP language negotiation, and warrants similar
treatment.

Peter
Received on Tuesday, 18 December 2001 05:03:41 GMT
