Re: Resource discovery: limits of URIs

----- Original Message -----
> From: Peter Pappamikail
> To: www-tag@w3.org
> Sent: Sunday, December 16, 2001 5:23 PM
> Subject: Resource discovery: limits of URIs


> I'm flagging up an issue as a private individual rather than in my
> official capacity of Head of Information Resources Management in the
> European Parliament, although the issue I address has been considered in
> my professional work as well as my private research and work on XML
> implementation issues.

> My concern is the mechanisms available to "translate" information on a
> uniquely identifiable artefact to an addressable URI. Please accept my
> apologies in advance if the issue is not appropriate for this list.

Your question makes an assumption that URIs can be divided into names and
addresses.  This is, I believe, not a useful assumption.  In brief, the
differences between "names" and "addresses" are many and varied, and social
rather than technical.  In longer form, see
http://www.w3.org/DesignIssues/NameMyth.html for my take on it.  This is an
old issue, and not everyone takes my position.  There have indeed been many
systems designed for names, which almost invariably end up with the same
authority:local-id structure, and then (I simplify) a reconstruction of the
entire domain name system in order to be able to look them up.
> As an "ordinary user" I can "identify" or name a particular information
> artefact, a book, a document, etc. With a URL, I can address it. The URL
> will give me an address that usually combines details of an originating
> authority, a content identifier, sometimes a language version and an
> application format (MIME extension).

No, just an authority and a content identifier (your terms).  There are no
semantics to the right-hand part, and you certainly should never assume
anything about the MIME type of something from its URI.
Other http://www.w3.org/DesignIssues/ notes deal with some of this.
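To make the point concrete, here is a minimal sketch in Python: a resource
whose path carries no extension at all, whose type is known only from the
response headers. (The path "/report" and the local test server are invented
for illustration.)

```python
import http.server
import threading
import urllib.request

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # The path "/report" says nothing about the format;
        # only the Content-Type header does.
        body = b"<p>hello</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/report" % server.server_address[1]
with urllib.request.urlopen(url) as resp:
    media_type = resp.headers.get_content_type()
server.shutdown()
print(media_type)  # the type comes from the response, not the URI
```

The client learns "text/html" from the protocol exchange; nothing in the URI
told it that.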

> However, with the exception of the language version - that might,
> depending on the server infrastructure, serve up a version according to
> my indicated preferences set in the browser - the "discovery" of the full
> URL cannot be deduced algorithmically from the content identifier. A
> couple of examples to demonstrate my concern more clearly:

> - "bookmark rot": I mark a set of resources from a particular site, only
> to find a year later that all the references are rotten as the .htm
> extension has been replaced by .php throughout the site, although no
> single item of content has changed;

That is entirely the fault of the site management, and should be brought up
at their company's shareholder meeting IMHO.   See
http://www.w3.org/Provider/Style/URI  "Cool URIs don't change".
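The cleaner cure is to leave implementation details out of the URI in the
first place; failing that, a site which changes technology can at least keep
its old URIs alive with permanent redirects. A minimal sketch (the paths are
invented for illustration):

```python
# Map of legacy URIs to their new locations after a .htm -> .php
# migration; the old bookmarks then redirect instead of breaking.
LEGACY_REDIRECTS = {
    "/about.htm": "/about.php",   # invented paths, illustration only
    "/news.htm": "/news.php",
}

def resolve(path):
    """Return (HTTP status, location) for a requested path."""
    if path in LEGACY_REDIRECTS:
        # 301 Moved Permanently: clients and caches may update links.
        return 301, LEGACY_REDIRECTS[path]
    return 200, path

print(resolve("/about.htm"))  # (301, '/about.php')
```

A few lines of server configuration achieve the same thing; the point is
that content which has not changed should never become unreachable.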


> I reference an item found via a WAP service, knowing that a more complete
> version of the same content is available in HTML on a parallel web site:
> the 'URLs' however are completely different despite referring to the same
> artefact;

That is a problem with the WAP architecture.  It is not a problem with
HTTP, which allows you to distinguish between a generic URI and a specific
one.  (http://www.w3.org/DesignIssues/Generic)
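A sketch of what I mean, with invented URIs: one generic URI stands for the
artefact, and the server selects a specific variant for each client from
the Accept header.

```python
# One generic URI serves both the WAP and the full HTML audience;
# the specific variant URIs are an internal matter for the server.
VARIANTS = {
    "http://example.org/weather": {
        "text/html": "http://example.org/weather.html",
        "text/vnd.wap.wml": "http://example.org/weather.wml",
    },
}

def select_variant(generic_uri, accept):
    """Pick the specific URI whose media type appears in Accept."""
    for media_type, specific_uri in VARIANTS[generic_uri].items():
        if media_type in accept:
            return specific_uri
    # Fall back to the first variant if nothing matches.
    return next(iter(VARIANTS[generic_uri].values()))

print(select_variant("http://example.org/weather", "text/vnd.wap.wml"))
```

Both clients bookmark the same generic URI, so the reference survives a
change of device.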

> - I copy a URL in a site, only to discover that the URL is attributed not
> only dynamically but is session specific and sometimes personalised, and
> thus not re-usable;

That is also a server problem, unless there is real session-specific
information (such as search results) which you may want to bookmark.
Cookies, when used instead, prevent this problem.
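A sketch of the cookie approach (the catalogue path and session-id scheme
are invented; the Set-Cookie header is standard HTTP): the session travels
in a header, so the URI itself stays stable and re-usable.

```python
import secrets

def handle_request(path, cookie=None):
    """Return (canonical URI, extra response headers) for a request."""
    headers = {}
    if cookie is None:
        # First visit: issue a session cookie rather than rewriting
        # the URI to embed a session token.
        headers["Set-Cookie"] = "session=%s; Path=/" % secrets.token_hex(8)
    # The URI handed back never embeds the session.
    return path, headers

uri, headers = handle_request("/catalogue/item42")
uri2, headers2 = handle_request("/catalogue/item42",
                                cookie=headers["Set-Cookie"])
print(uri == uri2)  # True: the same bookmarkable URI either way
```

Copy the URI out of such a site and it still works a year later, for you or
for anyone else.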

> - I'm listening to a voice-synthesised web page that contains links to
> resources that are available in audio and text, but the link takes me to
> the text file via the hypertext link;

The server is not doing format negotiation properly, or not trusting the
client to do so.  Some sites *do* get this right - natural languages, for
example.
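The language case runs on the Accept-Language header and its q-values; the
format case is the same logic with the Accept header. A sketch of the
selection, under that standard mechanism:

```python
def parse_accept(header):
    """Parse e.g. 'fr;q=0.9, en' into [(tag, q), ...], best first."""
    prefs = []
    for part in header.split(","):
        fields = part.strip().split(";")
        tag = fields[0].strip()
        q = 1.0  # q defaults to 1.0 when absent
        for field in fields[1:]:
            name, _, value = field.strip().partition("=")
            if name == "q":
                q = float(value)
        prefs.append((tag, q))
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def negotiate(header, available):
    """Return the client's most-preferred tag the server can serve."""
    for tag, q in parse_accept(header):
        if q > 0 and tag in available:
            return tag
    return None

print(negotiate("fr;q=0.9, en;q=1.0", ["fr", "de"]))  # 'fr'
```

A server which applies this to formats as well as languages sends the audio
listener the audio variant without any change to the link.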

> In architectural terms, my concern is that more and more sites, in the
> absence of any clear mechanisms for resolving addresses from identifiers,
> have increasingly complex interfaces with proprietary resolution
> mechanisms that practically render resource discovery impossible, except
> indirectly.

I agree that is a problem.  I would strongly promote content negotiation.

> A user should be able to indicate the minimum information that
> distinguishes a particular artefact uniquely (I'm not sure the URN does
> this, because it is still only a URI with a commitment to persistence)
> and not be bothered with whether it is the most recent version, which
> languages are available, whether it is in pdf, html, xml, wml, but that
> the server will resolve this in a context-sensitive manner. The issue
> will become critical when XPointer starts to be used to identify resource
> fragments: in fact the XPointer's potential weakness is precisely that
> the containing document may itself be poorly addressable.

> My "ideal scenario" would be the replacement, in the hyperlink target
> data, of a URI - pointing as it does to a specific file - by a "UCI" (a
> "Uniform Content Identifier") that resolves to the specific components:
> - a DNS entry or other service locator;
> - on the server side, a URI appropriate to the client context, made up of
> the content identifier 'wrapped' with language, version, format and other
> context-specific data;

The way this is done is that:
1. You use an HTTP URI.
2. You use content negotiation.
3. You use the "Location:" field to point to the specific URI.

You can of course embed metadata in a resource to give a more sophisticated
map of the resources, generic and otherwise, involved.
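The three steps can be sketched as one response builder. The variant URIs
are invented for illustration; the Content-Location and Vary headers are
standard HTTP for pointing at the specific variant that was served.

```python
# Variants of the generic URI /report: (specific URI, media type, language)
VARIANTS = [
    ("/report.en.html", "text/html", "en"),
    ("/report.fr.html", "text/html", "fr"),
    ("/report.en.pdf", "application/pdf", "en"),
]

def respond(accept, accept_language):
    """Negotiate a specific variant of the generic URI /report."""
    for uri, media_type, lang in VARIANTS:
        if media_type in accept and lang in accept_language:
            return {
                "Status": "200 OK",
                "Content-Type": media_type,
                # Step 3: tell the client which specific URI it received.
                "Content-Location": uri,
                # Caches must key on the negotiated request headers.
                "Vary": "Accept, Accept-Language",
            }
    return {"Status": "406 Not Acceptable"}

print(respond("application/pdf", "en")["Content-Location"])
```

The bookmark, the citation, and the hyperlink all use the one generic URI;
the specifics of language and format are resolved at request time.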

> If this sort of issue is handled elsewhere, I'd be happy to be pointed
> the way, but I feel the issue goes beyond the scope of current W3C
> activity on addressing and is too "instance specific" to be in the realm
> of RDF or other semantic resource discovery issues: I believe the issue
> is analogous to HTTP language negotiation, and warrants similar
> treatment.

I hope you can indeed use content negotiation.  It might be worth a look
at CC/PP for more precise client profiling.

tim

> Peter

Received on Friday, 1 February 2002 17:24:30 UTC