Re: URIs / URLs from Aaron Swartz on 2001-04-12 (www-rdf-interest@w3.org from April 2001)

From: Aaron Swartz <aswartz@swartzfam.com>
Date: Thu, 12 Apr 2001 16:33:12 -0500
To: Lee Jonas <lee.jonas@cakehouse.co.uk>, RDF Interest <www-rdf-interest@w3.org>, RDF Logic <www-rdf-logic@w3.org>
Message-ID: <B6FB8727.910E%aswartz@swartzfam.com>

Lee Jonas <lee.jonas@cakehouse.co.uk> wrote:

> My understanding (which could, of course, be wrong ;-) is that:
> 
> 1) the resource could be regarded as Pierre-Antoine Champin, or his
> conceptual Home Page, etc.  We have no way of knowing with URLs and
> shouldn't need to care.

Sure it could be, but understand that it need not be either of these two
things. 

> 2) a representation is the actual html doc (or whatever) accessible through
> the http protocol with the (http specific) network location
> '//example.edu/~champin/'.

Yep, that would be a representation of the resource, also called a
rendering, or a (network) entity.

> 3) the physical person Pierre-Antoine Champin, or his conceptual home page,
> etc. is identified via the location of the representation,
> 'http://example.edu/~champin'.

Yes.

But before we move on, let me clarify something:

    A resource can correspond to zero, one or many physical entities.
    "Everything I've ever said about doughnuts" is a resource, as is "the
    computer I currently use", "Tim Berners-Lee's house" or "today's weather
    report". 

> (As an aside, identifying a resource that is not intended to be
> electronically accessed, such as Pierre-Antoine, without having to contrive
> redundant representations is where URNs are useful).

I don't see how URNs are any more useful, nor do I see how representations
are redundant. When I see the URI
"http://elephants.org/monkeyBrothers/qXp7", I have no idea what it
identifies. Luckily, I can often type it into my web browser and find out
what it means. Getting the same result out of a URN is significantly more
difficult. (Unless there's some easy way to do it that I'm missing.)
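
As a rough illustration of the "type it into my web browser" point, here is
a minimal Python sketch that checks whether a URI's scheme is one a generic
web client could fetch as-is. The helper name can_fetch_directly, the set of
schemes, and the example URN are assumptions made for illustration; nothing
here is defined by any spec discussed in this thread.

```python
from urllib.parse import urlsplit

# Schemes a generic web client can retrieve without extra resolution
# machinery. This set is an illustrative assumption, not an exhaustive list.
FETCHABLE_SCHEMES = {"http", "https"}

def can_fetch_directly(uri: str) -> bool:
    """Return True if the URI's scheme is one we can dereference as-is."""
    return urlsplit(uri).scheme.lower() in FETCHABLE_SCHEMES

print(can_fetch_directly("http://elephants.org/monkeyBrothers/qXp7"))  # True
# Hypothetical URN, used only to show the contrast.
print(can_fetch_directly("urn:example:monkeyBrothers:qXp7"))           # False
```

The check says nothing about whether a fetch would succeed or be
informative, only whether an ordinary client knows how to try.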

> If Pierre-Antoine leaves example.edu and is replaced by Jean-Claude Champin,
> who puts his home page in the same place at 'http://example.edu/~champin'
> then not only has the representation (html doc) changed, but the same
> location ('http://example.edu/~champin') now identifies Jean-Claude (or
> Jean-Claude's conceptual home page, etc).

Then this is a grave mistake on the part of Example University. They should
not have done this. If they have defined that address to identify
Pierre-Antoine, then they should not change it. However, as I originally
suggested, Example University's resource policy seems to be that the URI
identifies "the Champin currently attending Example University". In that
case the resource has not changed; only the representation has. I agree
that if the University does not make the true nature of the resource clear,
this is misleading and should not be done.
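
One way to picture "the resource stays the same while the representation
changes" is the following minimal Python sketch. The Resource class, its
method names, and the publication dates are all hypothetical, chosen only to
make the idea concrete.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Resource:
    """A URI plus the fixed description of what it identifies."""
    uri: str
    definition: str
    # (start date, representation) pairs, kept in chronological order
    history: list = field(default_factory=list)

    def publish(self, start: date, representation: str) -> None:
        """Record a new representation that is served from `start` onward."""
        self.history.append((start, representation))
        self.history.sort(key=lambda pair: pair[0])

    def representation_on(self, day: date) -> Optional[str]:
        """Return the latest representation published on or before `day`."""
        current = None
        for start, representation in self.history:
            if start <= day:
                current = representation
        return current

champin = Resource(
    uri="http://example.edu/~champin",
    definition="the Champin currently attending Example University",
)
# Hypothetical dates, for illustration only.
champin.publish(date(2000, 9, 1), "Pierre-Antoine's home page")
champin.publish(date(2001, 9, 1), "Jean-Claude's home page")

print(champin.representation_on(date(2001, 4, 12)))  # Pierre-Antoine's home page
print(champin.representation_on(date(2002, 1, 1)))   # Jean-Claude's home page
```

The definition field is never reassigned; only the list of representations
grows, which is the distinction being argued for here.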

> Taking 'http://example.edu/~champin/' to identify "the person named Champin
> currently enrolled at Example University" as you said above implies that
> URLs do not identify these resources through representations, but identify
> mappings-to-resources through representations instead.  This would have very
> serious ramifications for RDF.

I don't follow the difference between identifying resources through
representations and identifying mappings-to-resources. Could you clarify the
difference? Fundamentally, the resource represented by a URI should not
change, and in the majority of cases it does not. Unfortunately, it is
often not made clear which resource that is.

(I'd also like to make clear that these issues are vague, and not clarified
by any standard, to my knowledge. These are merely my views and
interpretations.)

> Regardless of the interpretation of Resource, doesn't it strike you as a
> major blow to writing metadata statements about URLs that remain correct
> over time?  Consider making assertions about Pierre-Antoine only to find 3
> months later that those assertions are actually making (probably false)
> statements about Jean-Claude instead.

Exactly! This demonstrates the point that not only should resources never
change, but the resource itself must be made clear. For example, it is
widely assumed that the following URI identifies the resource of the top
"hits" on Google for the text "foo". However, to my knowledge, Google has
not said this for sure.

http://www.google.com/search?q=foo

The line between changing a resource and not properly defining it is
blurry. The two may be treated as much the same thing when the undefined
resource turns out to be incompatible with what it is widely expected to be.

>>> This is a problem in general in terms of the transient nature of the
>>> Internet.
>> But yes, changing the meaning of URIs is bad. However, not specifying the
>> exact meaning of URIs is also bad (as the above example shows).
> Assuming you intend 'the exact meaning' to be specified via RDF, then no URL
> can be dealt with in a correct manner programmatically unless you can
> interpret the RDF first.

Well, no. The meaning of a URI probably won't be specified by RDF for a
while. As our examples with Example University and Google show, the majority
of pages don't have their underlying resources defined, by RDF or any other
means.

RDF is not the only way to ascertain the resource of a URI, but I think you
are correct: URIs cannot be treated programmatically (at least for most
things) without some knowledge about what they mean. That knowledge can be
found using RDF, using natural language, or in many other ways. But I don't
think computers can just guess or simply dereference the URI.
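
A minimal Python sketch of "some knowledge about what they mean" might look
like the following. The DECLARED_MEANINGS table and both function names are
hypothetical, and the entry for Example University is only the reading
suggested earlier in this mail, not something the University has stated.

```python
from typing import Optional

# Hypothetical record of what publishers (or third parties) have said their
# URIs identify. How it gets filled (RDF, prose on a page, a mail like this
# one) is deliberately left open.
DECLARED_MEANINGS = {
    "http://example.edu/~champin":
        "the Champin currently attending Example University",
}

def meaning_of(uri: str) -> Optional[str]:
    """Return the stated meaning of a URI, or None if nobody has stated one."""
    return DECLARED_MEANINGS.get(uri)

def can_process(uri: str) -> bool:
    """Only treat a URI programmatically once its meaning is known."""
    return meaning_of(uri) is not None

print(can_process("http://example.edu/~champin"))         # True
print(can_process("http://www.google.com/search?q=foo"))  # False
```

The second call returns False because no meaning has been recorded for that
URI, not because the URI cannot be fetched.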

> Insisting that use
> of URLs relies on RDF could lead to cyclic dependencies.

I don't insist that, for reasons you point out and others.
 
> I believe the "exact meaning" of URLs should be constant, simple and well
> known.  Judging from discussions on RDF-IG so far, 1 out of 3 isn't so great
> ;-)

I think even one of those would be an achievement. ;-)

> * case 2 is due to the natural evolution of the same resource.

I think this is what's happening.

> I went on in
> further mails to wish that this specific case was catered for more formally
> as part of the URI mechanism, allowing representations of the different
> versions to exist in the same network location with access to specific
> versions left to the protocol.  This would allow RDF to describe either
> version-independent metadata or version-dependent metadata - the choice
> being up to the publisher of the metadata.

I don't think this needs to be part of the URI mechanism. Versioning is a
concept that only makes sense for certain sorts of resources (mostly
documents). Do you come in multiple versions? Does the weather report? Since
the concept is so tricky, and not very general, I think it's best to leave
things as they are: different resources for different versions. Modifying the
URI spec would merely make things more complicated and difficult.
Furthermore, even when the publisher does not define a resource for a
version of their document, you can do so yourself in RDF, or another
language.
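
To make "different resources for different versions" concrete, here is a
minimal Python sketch that stores statements as plain (subject, predicate,
object) tuples rather than using any RDF library. Every URI and property
name in it is hypothetical, invented only for this illustration.

```python
# Statements as (subject, predicate, object) tuples.
# All URIs and property names below are hypothetical.
DOC = "http://example.edu/~champin/paper"
DOC_V1 = "http://example.org/my-names/champin-paper-2001-04-12"
VERSION_OF = "http://example.org/terms#versionOf"
RETRIEVED_ON = "http://example.org/terms#retrievedOn"
AUTHOR = "http://example.org/terms#author"

triples = [
    # Version-independent metadata hangs off the document's own URI.
    (DOC, AUTHOR, "Pierre-Antoine Champin"),
    # A third party names one specific version with a URI of its own ...
    (DOC_V1, VERSION_OF, DOC),
    # ... and attaches version-dependent metadata to that new URI.
    (DOC_V1, RETRIEVED_ON, "2001-04-12"),
]

def statements_about(subject, store):
    """Return the (predicate, object) pairs recorded for one subject."""
    return [(p, o) for s, p, o in store if s == subject]

print(statements_about(DOC_V1, triples))
```

Nothing about the URI mechanism had to change: the version is simply one
more resource with its own identifier.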

> Please note, my original post was intended as feedback on Pierre-Antoine's
> document.  Not to suggest radical changes to the way the Internet currently
> works.

Of course. At this point, I don't think we need radical changes to the way
the Internet works...maybe for RDF 2.0. ;-)

-- 
[ Aaron Swartz | me@aaronsw.com | http://www.aaronsw.com ]
Received on Thursday, 12 April 2001 17:34:04 UTC