Re: URIs / URLs from Lee Jonas on 2001-04-06 (www-rdf-interest@w3.org from April 2001)

From: Lee Jonas <ljonas@acm.org>
Date: Fri, 6 Apr 2001 21:44:06 +0100
To: <www-rdf-interest@w3.org>
Message-ID: <PJEHIGCEAMAHEAPOOFBEIEHGCBAA.ljonas@acm.org>
Yet the task of keeping your own links current is a real problem,
particularly for sites with large numbers of them such as search engines.
URLs are not without their problems.

I dread to think how many man hours and CPU cycles have been spent around
the globe ensuring links to other pages still work.  Additionally, the web
is always 'current'.  There is no formal concept of version; the W3C and
most other self-respecting organisations implement the necessary measures
themselves (for representing Working Draft versions) in an ad hoc fashion.

The notion of URLs identifying representations seems a little trite to me.
It indicates the nature of the true problem, without fully addressing it: a
resource at the end of a location is not consistent over time.

This is for at least two good reasons: resources evolve, and resources move,
disappear or, worse, are ousted from a particular location by a second
resource.

The first issue could have been addressed more formally (and hence
consistently) with a simple versioning scheme.  This would have alleviated
the problem of instantly breaking third party links (or invalidating
metadata semantics) when you change a resource.  Yes, your links must change
to reflect new versions of things you reference, but these changes could be
a graceful migration, not an abrupt crash.
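
As a rough illustration of the kind of versioning scheme meant here, the
Python sketch below keeps every published version of a resource, so that a
link pinned to an old version keeps working while new links can ask for the
latest one.  The class and method names are hypothetical, invented for this
example; nothing like it is part of HTTP or any W3C specification.

```python
from typing import Dict, Optional


class VersionedResource:
    """Hypothetical store holding every published version of one resource."""

    def __init__(self) -> None:
        self._versions: Dict[int, str] = {}

    def publish(self, content: str) -> int:
        """Store content as the next version and return its version number."""
        number = len(self._versions) + 1
        self._versions[number] = content
        return number

    def get(self, version: Optional[int] = None) -> str:
        """Return a pinned version, or the newest one if no version is given."""
        if version is None:
            version = max(self._versions)
        return self._versions[version]


spec = VersionedResource()
first = spec.publish("Working Draft, first edition")
second = spec.publish("Working Draft, second edition")

# A third-party link pinned to the first version is not broken by the update.
print(spec.get(first))
# An unpinned link follows the resource as it evolves.
print(spec.get())
```

A referrer can then move from the first version to the second at its own
pace, which is the graceful migration described above.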

The second is the main bugbear of using a resource's location to identify
it.  This phenomenon is well known in distributed object technology.
Superior solutions leave the actual resolution of an object's location to
some distributed service when a client wants to interact with it.
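
A minimal sketch of that indirection, assuming a single in-memory table in
place of a real distributed service: clients hold a stable name and ask a
resolver for the current location each time they need it.  The resolver
class, the urn:example: name and the example.org locations are all
hypothetical.

```python
from typing import Dict


class LocationResolver:
    """Hypothetical lookup service mapping stable names to current locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, name: str, location: str) -> None:
        """Record (or update) where the named resource currently lives."""
        self._locations[name] = location

    def resolve(self, name: str) -> str:
        """Return the current location for a name; raises KeyError if unknown."""
        return self._locations[name]


resolver = LocationResolver()
resolver.register("urn:example:annual-report", "http://old.example.org/report")

# The resource moves; only the resolver entry changes, not the clients' names.
resolver.register("urn:example:annual-report", "http://new.example.org/report")
print(resolver.resolve("urn:example:annual-report"))
```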

These problems are compounded by the fact that a resource can be in one of
many formats and there is no clear way to tell which from the URL itself.  A
resource such as http://mydomain/mypic.png may safely be assumed to be a png
graphic, but what about the resource at the end of http://mydomain/mydir/ ?
MIME types have become pervasive for identifying a resource's type, yet URLs
predate MIME by years.  If you want to know its type you have to make a
request to some server process.
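
The difference can be sketched with Python's standard library.  Guessing
from the URL text works only when there is a recognisable file extension;
otherwise the type has to come from the server's Content-Type header.  The
two helper function names are made up for this example, and the second one
is defined but not called, since mydomain is a placeholder host.

```python
import mimetypes
import urllib.request
from typing import Optional


def type_from_url(url: str) -> Optional[str]:
    """Guess a media type from the URL text alone (no network access)."""
    media_type, _encoding = mimetypes.guess_type(url)
    return media_type


def type_from_server(url: str) -> Optional[str]:
    """Ask the server with a HEAD request and read its Content-Type header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Content-Type")


print(type_from_url("http://mydomain/mypic.png"))  # image/png
print(type_from_url("http://mydomain/mydir/"))     # None
```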

It may be that URLs have been sufficient so far because the Internet is
predominantly for human consumption, and identifying abstract resources that
cannot be retrieved electronically has been pretty pointless to date.
However, aspects of the semantic web vision (plus other technologies like
Web Services, e.g. .NET?) could add a couple of new issues to the pot:

1) It may become more common to reason about abstract resources whose
identifiers may not be readily representable as a location.  It would be
better to identify these with a URN.  Hence URNs may be more widely used
than at present.

2) Encouraging reuse of RDF schema resources is hampered by the need for
schema owners to update them.  Currently the proposed RDFS solution is very
MS COM-esque - every schema update is a new resource that SHOULD be constant
and available in perpetuity.  IMHO this approach is inferior to specifying a
common versioning scheme, which when coupled with a standard change
management process, allows related schemas to migrate and evolve in tandem.
After all, that's exactly what happens with dependencies between software
components now.

3) Data quality will be poorer if it is hard for software to detect a
resource change.  Transience is bad news if you are going to store facts
about something that subsequently changes.
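
For point 3, one possible way for software to notice a change is to store a
fingerprint of the content alongside any facts recorded about it, and to
compare fingerprints later.  This is only a sketch of that idea; the
function names are hypothetical and SHA-256 is an arbitrary choice of hash.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact content."""
    return hashlib.sha256(content).hexdigest()


def has_changed(recorded_fingerprint: str, current_content: bytes) -> bool:
    """True if the content no longer matches the fingerprint stored earlier."""
    return fingerprint(current_content) != recorded_fingerprint


# Fingerprint taken at the time the facts about the resource were stored.
recorded = fingerprint(b"<schema>version one</schema>")

print(has_changed(recorded, b"<schema>version one</schema>"))  # False
print(has_changed(recorded, b"<schema>version two</schema>"))  # True
```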

What the solution to all this is, I don't know.  I just can't help feeling
that as the semantic web progresses things are about to get a lot more
complicated unless these issues are addressed.


Regards

Lee



Dan Connolly <connolly@w3.org> wrote:

Pierre-Antoine CHAMPIN wrote:
[...]
> HTML and PDF version available at
>   http://www710.univ-lyon1.fr/~champin/urls/

This document promulgates a number of myths about
Web Architecture...

  "However, other W3C recommendations use URLs to identify
  namespaces [6,8,11]. Those URLs do not locate the
  corresponding namespaces (which are abstract things and
  hence not locatable with the HTTP protocol), "

I don't see any justification for the claim
that namespaces are disjoint from HTTP resources.
One of the primary motivations for the
XML namespaces recommendation is to make the
Web self-describing: each document carries the
identifiers of the vocabularies/namespaces it's written in;
the system works best when you can use such
an identifier to GET a specification/description
of the vocabulary.

  "For example, the URI of the type property of RDF is
  http://www.w3.org/1999/02/22-rdf-syntax-ns#type. As a
  matter of fact, the property itself is not located by that
  URL: its description is. "

Again, I see no justification for the claim that this
identifier doesn't identify a property.
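
One mechanical detail behind this exchange: an HTTP client does not send the
fragment (the part after '#') to the server, so a GET on the type URI
retrieves the namespace document, and the fragment then picks out a term
within it.  A minimal Python sketch of that split follows; it illustrates
the mechanics only and does not settle what the full URI identifies.

```python
from urllib.parse import urldefrag

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# urldefrag separates the part a client would request from the fragment.
document_uri, term = urldefrag(RDF_TYPE)

print(document_uri)  # http://www.w3.org/1999/02/22-rdf-syntax-ns
print(term)          # type
```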

  "URLs are transient

  That means they may become invalid after a certain period
  of time [9]. "

That's a fact of life in a distributed system. URNs may
become invalid after a period of time too. It's true
of all URIs. URIs mean what we all agree that they mean.
Agreement is facilitated by a lookup service like HTTP.
In practice, URIs are quite reliable:
6% linkrot according to http://www.useit.com/alertbox/980614.html
and I think the numbers get better when you measure
per-request rather than per-link,
since popular pages are maintained
more actively than average.
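
The per-link versus per-request distinction can be shown with a small
calculation.  The traffic figures below are invented purely for
illustration and are not taken from the cited study; the point is only that
weighting by requests lowers the rate when broken links are the rarely
visited ones.

```python
from typing import List, TypedDict


class Link(TypedDict):
    requests: int   # how often the link is followed (made-up numbers)
    broken: bool    # whether the link target no longer resolves


def rot_per_link(links: List[Link]) -> float:
    """Fraction of links that are broken, each link counted once."""
    return sum(1 for link in links if link["broken"]) / len(links)


def rot_per_request(links: List[Link]) -> float:
    """Fraction of requests that hit a broken link."""
    total = sum(link["requests"] for link in links)
    failed = sum(link["requests"] for link in links if link["broken"])
    return failed / total


sample: List[Link] = [
    {"requests": 900, "broken": False},
    {"requests": 90, "broken": False},
    {"requests": 8, "broken": False},
    {"requests": 2, "broken": True},
]

print(rot_per_link(sample))     # 0.25
print(rot_per_request(sample))  # 0.002
```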

Unless urn: URIs provide value
that http: URIs do not, they won't be deployed.
I think the fact that

 (a) urn:'s have been standardized
 (IETF Proposed Standard, that is) since 1995
 (b) support for them is available in popular browsers
 and has been for several generations
and yet
 (c) still their use is negligible

speaks for itself. They don't provide any value.
Naming is a social contract, and the http: contract
works and the urn: contract doesn't.


  "In the immediate interpretation, a URL identifies the
  resource retrieved through it."

to be precise: it identifies the resource accessed
thru it. In the general case, you can't retrieve
a resource, but only a representation of one.
Other text that makes this error includes:

  "... the retrieved resource ..."

Another falsehood:

  "Contrarily to URLs, URNs (Uniform Resource Names) are
  designed to persistently identify a given resource."

URIs in general are designed to persistently identify
a given resource. Especially HTTP URIs.


I recommend a series of articles by the designer
of URIs, HTTP, and HTML to clarify a number of
these myths:

  World Wide Web Design Issues
  http://www.w3.org/DesignIssues/

esp

  The Web Model: Information hiding and URI syntax (Jan 98)
  http://www.w3.org/DesignIssues/Model

  The Myth of Names and Addresses
  http://www.w3.org/DesignIssues/NameMyth

  Persistent Domains - an idea for persistence of URIs (2000/10)
  http://www.w3.org/DesignIssues/PersistentDomains

  (Hmm... this one is an interesting idea, but I think freenet:
   might be easier to deploy.)

and regarding the intent of the design of namespaces, see:

  Web Architecture: Extensible Languages
  W3C Note 10 Feb 1998
  http://www.w3.org/TR/NOTE-webarch-extlang
Received on Friday, 6 April 2001 16:42:10 UTC