RE: URIs / URLs

Aaron Swartz [mailto:aswartz@swartzfam.com] wrote:

>Lee Jonas <lee.jonas@cakehouse.co.uk> wrote:
>
>> The point is that "retrieval" is not an endemic aspect of URLs.  URLs merely
>> identify by location.  Software agents do something with that identifier
>> (which just happens to be retrieval of the identified resource, mostly).
>
>Yup! That's partly why URIs are "identifiers", I think.
>
>> In terms of resources changing, two examples you cited sprang out at me:
>> 1) Two people with the name Champin at the same university at different
>> points in time.
>> 2) Different versions of a W3C Working Draft.
>> 
>> It seems that 1) is clear cut: two different resources identified by the
>> same URL because they occupy the same location at different points in time.
>
>Actually, if we are clear from the start, that's not true.
>
>http://example.edu/~champin/ could be said to represent the person named
>Champin currently enrolled at Example University. Thus, the resource would
>stay the same, although the entity (and more specific resource) it referred
>to would change.
>

From the RFC you quoted in another mail:

<q cite="http://www.ietf.org/rfc/rfc2396.txt">
   ... The
   term "Uniform Resource Locator" (URL) refers to the subset of URI
   that identify resources via a representation of their primary access
   mechanism (e.g., their network "location"), rather than identifying
   the resource by name or by some other attribute(s) of that resource.
</q>

My understanding (which could, of course, be wrong ;-) is that:

1) the resource could be regarded as Pierre-Antoine Champin, or his
conceptual Home Page, etc.  We have no way of knowing with URLs and
shouldn't need to care.  
2) a representation is the actual html doc (or whatever) accessible through
the http protocol with the (http specific) network location
'//example.edu/~champin/'.
3) the physical person Pierre-Antoine Champin, or his conceptual home page,
etc. is identified via the location of the representation,
'http://example.edu/~champin'.

(As an aside, identifying a resource that is not intended to be
electronically accessed, such as Pierre-Antoine, without having to contrive
redundant representations is where URNs are useful).

If Pierre-Antoine leaves example.edu and is replaced by Jean-Claude Champin,
who puts his home page in the same place at 'http://example.edu/~champin'
then not only has the representation (html doc) changed, but the same
location ('http://example.edu/~champin') now identifies Jean-Claude (or
Jean-Claude's conceptual home page, etc).

I.e. the same URI has changed from identifying Pierre-Antoine to
identifying Jean-Claude (or their respective home pages, etc).

Taking 'http://example.edu/~champin/' to identify "the person named Champin
currently enrolled at Example University" as you said above implies that
URLs do not identify these resources through representations, but identify
mappings-to-resources through representations instead.  This would have very
serious ramifications for RDF.

Regardless of the interpretation of Resource, doesn't it strike you as a
major blow to writing metadata statements about URLs that remain correct
over time?  Consider making assertions about Pierre-Antoine, only to find three
months later that those assertions are actually (probably false)
statements about Jean-Claude instead.
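To make the stale-metadata problem concrete, here is a minimal Python sketch. It assumes a hypothetical, time-indexed record of what 'http://example.edu/~champin/' identifies; the dates, the "worksOn" predicate and the helper names are illustrative only, not part of RDF or any real system.

```python
from datetime import date

# Hypothetical history of what a single URL identifies over time.
# Dates are illustrative assumptions, not real data.
URL = "http://example.edu/~champin/"
REFERENTS = {
    URL: [
        (date(1999, 1, 1), "Pierre-Antoine Champin"),
        (date(2001, 3, 1), "Jean-Claude Champin"),
    ],
}


def referent(url, when):
    """Return whoever `url` identified on date `when` (None if nobody yet).

    Assumes each history list is sorted by start date.
    """
    current = None
    for since, who in REFERENTS[url]:
        if since <= when:
            current = who
    return current


# A statement recorded against the URL, meant to describe Pierre-Antoine.
# "worksOn" is a hypothetical predicate.
statement = (URL, "worksOn", "RDF annotation")

print(referent(statement[0], date(2001, 1, 15)))  # Pierre-Antoine Champin
print(referent(statement[0], date(2001, 4, 15)))  # Jean-Claude Champin
```

Nothing in the statement itself changes; only the URL's referent does, which is why the assertion silently ends up describing Jean-Claude.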

>> This is a problem in general in terms of the transient nature of the
>> Internet.
>
>But yes, changing the meaning of URIs is bad. However, not specifying the
>exact meaning of URIs is also bad (as the above example shows).
>

If you intend 'the exact meaning' to be specified via RDF, then no URL
can be processed correctly by software unless the RDF can be
interpreted first.  That elevates RDF to a fundamental part of
the Internet; IMHO, it should be an optional adjunct that provides extra
richness.  Also, RDF is built on top of URLs (and URNs), so insisting that use
of URLs relies on RDF could lead to cyclic dependencies.

I believe the "exact meaning" of URLs should be constant, simple and well
known.  Judging from discussions on RDF-IG so far, 1 out of 3 isn't so great
;-)

>> However, 2) is not so straightforward.  The W3C use URLs that identify
>> different versions of a document as different resources (by incorporating
>> the publish date).  Yet the latest version is also identified by a URL that
>> does not contain any distinguishing date information.  The resource
>> retrieved by this URL changes to always retrieve the "latest version".
>
>This is very similar to the example above. Just as our hypothetical homepage
>represented the current Champin, the W3C URL represents the current version
>of the document. You'll probably be interested in TimBL's writing on Generic
>resources:
>
>http://www.w3.org/DesignIssues/Generic

This was sloppy terminology on my part.  It should have read:

The W3C use URLs that identify different versions of a document (resource)
as different representations in different network locations (by
incorporating the publish date).  Yet a further representation of the latest
document (resource) version is also identified by a URL describing a network
location that does not contain any distinguishing date information.  The
representation retrieved by this URL changes to always retrieve the "latest
version" of the document (resource).

E.g. a Working Draft is published on 2 February 1999, then a subsequent version
of the same resource is published on 4 December 1999.  The W3C would publish three
representations at the following network locations:
http://www.w3.org/TR/1999/WD-xxxx-19990202,
http://www.w3.org/TR/1999/WD-xxxx-19991204, and http://www.w3.org/TR/xxxx.
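The W3C layout just described can be modelled in a few lines of Python. This is a sketch under assumptions: 'xxxx' is the placeholder from the example, the helper names are hypothetical, and the third version mentioned after the code is invented for illustration.

```python
# Dated locations from the example; "xxxx" is the placeholder short name.
DATED = {
    "19990202": "http://www.w3.org/TR/1999/WD-xxxx-19990202",
    "19991204": "http://www.w3.org/TR/1999/WD-xxxx-19991204",
}
# The undated location that always serves the latest version.
LATEST = "http://www.w3.org/TR/xxxx"


def publish(stamp, url):
    """Record a new dated version (stamp is YYYYMMDD)."""
    DATED[stamp] = url


def resolve(url):
    """Return the dated location whose content `url` currently serves."""
    if url == LATEST:
        # YYYYMMDD strings sort chronologically, so max() is the newest.
        return DATED[max(DATED)]
    # Dated URLs never change what they serve.
    return url
```

After publish("20000315", "http://www.w3.org/TR/2000/WD-xxxx-20000315") (a hypothetical third draft), resolve(LATEST) returns the new dated URL, while the two 1999 URLs still resolve to themselves.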

Although both cases can lead to a representation at the same network
location changing (the latter in the W3C Working Draft example), I wanted to
draw a distinction because I believe that:

* case 1 is due to there being a totally different resource identified than
before.  This is very bad news for RDF specifically and for the Internet in
general, and hence should be frowned upon.

* case 2 is due to the natural evolution of the same resource.  I went on in
further mails to wish that this specific case was catered for more formally
as part of the URI mechanism, allowing representations of the different
versions to exist in the same network location with access to specific
versions left to the protocol.  This would allow RDF to describe either
version-independent metadata or version-dependent metadata - the choice
being up to the publisher of the metadata.
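The versioning idea in case 2 could look roughly like the following Python sketch. It assumes a hypothetical store in which one network location holds every version of a resource and the protocol carries an optional version selector; how such a selector would travel over HTTP is deliberately left open, and fetch and the metadata table are illustrative names only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical store: one network location, all versions, keyed by
# YYYYMMDD date stamps (which sort chronologically).
STORE: Dict[str, Dict[str, str]] = {
    "http://www.w3.org/TR/xxxx": {
        "19990202": "<html>first draft</html>",
        "19991204": "<html>second draft</html>",
    },
}


def fetch(location: str, version: Optional[str] = None) -> str:
    """Return a representation of the resource at `location`.

    With no version selector the latest version is returned; otherwise
    the requested version is returned (KeyError if it does not exist).
    """
    versions = STORE[location]
    if version is None:
        version = max(versions)
    return versions[version]


# Metadata keyed with or without a version, at the publisher's choice.
# (The property names are hypothetical.)
Key = Tuple[str, Optional[str]]
METADATA: Dict[Key, Dict[str, str]] = {
    ("http://www.w3.org/TR/xxxx", None): {"title": "XXXX Working Draft"},
    ("http://www.w3.org/TR/xxxx", "19990202"): {"status": "superseded"},
}
```

A key with version None plays the role of version-independent metadata; a key naming a version plays the role of version-dependent metadata.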

>
>> I would suggest that a new URN scheme could directly represent the notion of
>
>Why do we need a URN scheme for this? Can't we just use RDF? A URI
>scheme might be nice, but I can't see where it would be useful. And how
>would we represent versions? With numbers? dates?
>
>> 1) Firstly, make the services processes (i.e. daemons).
>> The extra constraints of making the URNs in documents conform to the http
>> protocol for N2L mappings disappear (hooray!).  You also don't have to
>> specify L2N mappings within documents, avoiding unnecessary clutter.
>
>I don't see what you mean by this.
>

This was in response to Pierre-Antoine Champin's document, which suggested
using a specific urn scheme with two ways to translate from a URN to a
corresponding URL to locate a representation of the resource identified (if
any).  These were:

1) URN-to-URL (N2L)
The suggestion was (AFAIK) to replace urn: with http:, e.g.
urn://mydomain/myresource.xml -> http://mydomain/myresource.xml
I suggested a different scheme using a distributed 'resolution' service to
map, say urn://mydomain/unique-id-snmeooxnsadfoij ->
http://mydomain/myresource.xml.  It seems there is work in progress to do
just this sort of thing with URNs.
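The two N2L approaches can be contrasted in a short Python sketch. Both use the hypothetical 'urn://domain/...' form from this thread (not the RFC 2141 'urn:NID:NSS' syntax), and the distributed resolution service is stood in for by an in-memory dictionary; function names are illustrative.

```python
# Approach 1 (as summarised above): swap the scheme.
def n2l_substitute(urn: str) -> str:
    prefix = "urn://"
    if not urn.startswith(prefix):
        raise ValueError(f"not a urn:// identifier: {urn!r}")
    return "http://" + urn[len(prefix):]


# Approach 2: ask a resolution service. A dict stands in for the
# hypothetical per-domain service here.
RESOLUTION_SERVICE = {
    "urn://mydomain/unique-id-snmeooxnsadfoij": "http://mydomain/myresource.xml",
}


def n2l_resolve(urn: str) -> str:
    """Look up the current location for a URN (KeyError if unknown)."""
    return RESOLUTION_SERVICE[urn]
```

The first approach ties the URN to one location forever; the second lets a publisher move the document by updating a single service entry while the URN stays the same.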

2) URL-to-URN (L2N)
The suggestion was to embed RDF statements to show which URNs identified the
resource represented at a specific network location (i.e. via a URL).  I
suggested a generic service, the equivalent of reverse DNS, for this
task (i.e. outside the RDF sphere), though of course it could also be done
as embedded RDF.
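An L2N service is essentially the inverse of an N2L table, much as reverse DNS inverts forward DNS. The Python sketch below builds that inverse from hypothetical entries (the second URN is invented for illustration); since several URNs may identify what is represented at one location, each URL maps to a set.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical N2L entries; the second URN is invented for illustration.
N2L: Dict[str, str] = {
    "urn://mydomain/unique-id-snmeooxnsadfoij": "http://mydomain/myresource.xml",
    "urn://mydomain/unique-id-alias": "http://mydomain/myresource.xml",
}


def build_l2n(n2l: Dict[str, str]) -> Dict[str, Set[str]]:
    """Invert a URN -> URL table into URL -> {URNs}."""
    l2n: Dict[str, Set[str]] = defaultdict(set)
    for urn, url in n2l.items():
        l2n[url].add(urn)
    return dict(l2n)


L2N = build_l2n(N2L)
```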

IMHO, it would be a bonus for any such scheme to reflect versioning at a
more fundamental level for reasons given above, though I am not suggesting
it be done solely for this purpose.

>> (i.e. just do a DNS lookup on a URN /
>> URL to identify the server with the relevant N2L / L2N daemons for that
>> domain).
>
>How can you do a DNS lookup on a URN or a URL? And what type of DNS record
>would we use?
>
>> Anyone wanting N2L & L2N capabilities for mapping urns within
>> their own domain space could simply run these services alongside DNS.
>
>URNs have domain spaces?
>

As you well know, the process of resolving a URL involves first resolving
the domain part to an IP address.  This is true of URLs and could also be true
of any URN scheme that includes domain information.  The URN scheme I was
suggesting would include domain information.
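Finding the server for a domain's N2L / L2N services starts, as with URLs, by pulling out the domain part. Python's standard urllib.parse.urlsplit does this for any identifier written with a '//' authority, including the hypothetical urn:// form. Which DNS record type the services would be published under is left open here; the sketch simply does an address lookup, and the lookup parameter is an assumption added so the network call can be swapped out.

```python
import socket
from urllib.parse import urlsplit


def domain_of(identifier: str) -> str:
    """Return the domain part of a URL or urn://-style URN."""
    host = urlsplit(identifier).hostname
    if not host:
        raise ValueError(f"no domain part in {identifier!r}")
    return host


def resolver_address(identifier: str, lookup=socket.gethostbyname) -> str:
    """Resolve the identifier's domain to an IP address via DNS.

    Assumes the N2L / L2N daemons run on the host the domain resolves to.
    """
    return lookup(domain_of(identifier))
```

For example, domain_of("urn://mydomain/unique-id-snmeooxnsadfoij") returns "mydomain", and domain_of("http://example.edu/~champin/") returns "example.edu".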

>-- 
>[ Aaron Swartz | me@aaronsw.com | http://www.aaronsw.com ]
>

Please note, my original post was intended as feedback on Pierre-Antoine's
document, not as a suggestion for radical changes to the way the Internet
currently works.

Regards

Lee

Received on Thursday, 12 April 2001 08:23:37 UTC