Re: Clarifying what a URL identifies (Four Uses of a URL) from Roy T. Fielding on 2003-01-22 (www-tag@w3.org from January 2003)

From: Roy T. Fielding <fielding@apache.org>
Date: Wed, 22 Jan 2003 01:32:28 -0800
To: Sandro Hawke <sandro@w3.org>
Cc: www-tag@w3.org
Message-Id: <6C739EDA-2DEC-11D7-9535-000393753936@apache.org>

Why is it that the newcomers at w3.org have an opinion on this subject
that differs from W3C's own libwww implementation?  I think you guys
should spend a little more time studying how the Web works instead of
inventing new ways to disparage other people's designs.

>> I suggest you read RFC2396 and the Webarch draft.  When I say a
>> formalism I mean formalism.   A resource is per RFC2396 "anything that
>> has identity" and a URI is that which identifies a resource.
>
> As Mike Mealling puts it -- a "platonic ideal".  There is exactly one
> resource per URI by definition.  (Or, roughly, until you start getting
> 301 responses.)  We can't know what that resource "is"; it's just an
> unknowable mental construct RFC2396 defines as existing.

301 is a redirect to a different resource.

>> A resource, thus defined, has access mechanisms whereby you can retrieve
>> and update representations.  This formalism is complete, consistent, and
>> highly robust in practice, underlying the construction of the most
>> successful information system in history.
>
> In fairness, I think this only applies to HTTP 1.1, not the entire web.

No.  Go look at the code and see how it handles all URIs.  HTTP is
an extension of that interface across the Internet.

>> I admire your chutzpah in charging here and making claims about the
>> undefinedness of the term "Resource" but that doesn't mean you're
>> anything but hopelessly wrong.
>
> You could take David's message as a sign that a whole raft of
> professional software developers think this notion of Resource while
> workable is somewhere between poorly explained and imperfect.

I don't.  I take it as a sign that someone at W3C suggested the team
should look into this problem and make noise.  I suggest that you look
into the problem a little deeper and without assuming that the
Semantic Web is somehow broken because of it.

> Working *perfectly* for HTTP is not evidence that it works anywhere
> else.   (other people have cited the parable of the blind men and the
> elephant.)   And the success of the Web is of course due to many, many
> factors.

I have seen no evidence that it doesn't work, anywhere.  Some SW folks
*claim* that if you allow the RDF producer to make ambiguous statements
about both representations and the resource using only the URI as the
target subject, then it results in ambiguity.  Well, of course that
would cause ambiguity, which is why they are NEVER THE SAME THING on
the Web itself.  The answer is: DON'T DO THAT.

> Once you step outside the formalism, not only do you want to know what
> kind of thing a specific Resource is, but you notice that everyone is
> using each URI to identify several distinct things.   So the
> fundamental premise of 2396 breaks as soon as you step outside the
> formalism.

Nothing in 2396 breaks because of that. 2396 defines the syntax for
identification.  It doesn't define how URIs are used.  It doesn't even
define how they are used on the Web.  What it does define is that they
are identifiers and they identify resources and they do so using a
uniform syntax.  Resources in RFC 2396 are not even limited to
information objects, since they are specifically intended to include
the naming of physical things and do so quite well.  The scope of the
REST model, for example, is more restricted than the scope of 2396.

Regardless, how people use URIs (how a URI can be used to identify
something indirectly, including those things other than the resource)
is an entirely separate issue from the identity of a resource.  If the
Semantic Web is only interested in identity, then it doesn't matter
how many other ways that the URI is being used.  Likewise, regardless
of how many new terms are invented to redefine the holy grail, there
is no way to stop people on the Web from using a URI (any URI,
regardless of scheme) in ways that the originator did not intend,
and thus indirectly identifying things other than the originally
intended resource.

The problem occurs when we face up to the fact that the Semantic Web
is not just a generic KMS, and in fact is very interested in the Web
and what people identify when they create anchors.  Once there, we must
accept the fact that the Web defines URIs and methods as two separate
protocol elements, and therefore it would be incorrect to define
resources other than how they are defined in 2396 and used on the
existing Web interfaces described by HTTP and implemented in dozens
of independent open source projects that you are free to inspect.
URIs alone are not sufficient to target assertions about content on
the Web, even if we restrict our discussion to resources that act
like information repositories.
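
A minimal sketch of that last point, in Python. Every name here
(Representation, Resource, Target, describe) is hypothetical and is not
taken from libwww, RFC 2396, or RDF. It only illustrates that an
assertion aimed at Web content has to say whether it targets the
resource or one of its representations, because the URI by itself
names only the resource.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Representation:
    """One retrievable rendition of a resource's state (hypothetical model)."""
    media_type: str
    body: bytes


@dataclass
class Resource:
    """What the URI identifies; its representations may vary or change."""
    uri: str
    representations: Dict[str, Representation] = field(default_factory=dict)

    def get(self, media_type: str) -> Representation:
        # Stand-in for applying a retrieval method to the URI.
        return self.representations[media_type]


@dataclass(frozen=True)
class Target:
    """Subject of an assertion: the URI plus, optionally, which representation."""
    uri: str
    media_type: Optional[str] = None  # None means "the resource itself"


def describe(target: Target) -> str:
    if target.media_type is None:
        return f"resource identified by {target.uri}"
    return f"{target.media_type} representation of {target.uri}"


love = Resource("http://x.org/love")
love.representations["image/jpeg"] = Representation("image/jpeg", b"\xff\xd8")

about_resource = Target(love.uri)                # statement about the resource
about_picture = Target(love.uri, "image/jpeg")   # statement about one representation

print(describe(about_resource))
print(describe(about_picture))
```

The two targets share a URI yet compare unequal, which is the
distinction an assertion would have to carry in addition to the URI.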

It therefore behooves the Semantic Web to adapt to the Web as it
exists and works in practice, not try to force new definitions on
it that are misdirected and impotent.

>> In the Web Architecture formalism, http://x.org/love identifies only one
>> resource.  In the real world, I can learn about that resource by
>> retrieving representations of it (if any are available), and more by
>> processing RDF assertions about it (if any are available).  The Web
>> architecture doesn't talk about meanings, it talks about resources and
>> representations.  There's nothing wrong with talking about meaning, and
>> I look forward to the day when I can reliably retrieve some RDF
>> assertions and learn that this particular URI identifies nothing but a
>> JPG of a cute cat, and this other one identifies the inner thought of a
>> drug-addled conceptual artist.  This would be good and useful.
>
> And if you http GET a representation of the artist, what will the
> Last-Modified field mean?  It doesn't mean when the representation was
> last modified, or when the resource (the artist) was last modified.

Read the definition of Entity Header Fields in RFC 2616.  It means
the representation.  It would say that explicitly if it were not for
a pointless difference of opinion on terminology amongst 2616 editors
that is completely unrelated to this topic.
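
As a rough illustration in Python (the header values and the helper
name representation_last_modified are made up for the example), a
client reading Last-Modified from a response learns when the
representation it received last changed. It learns nothing about when
the thing identified by the URI changed.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def representation_last_modified(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the Last-Modified time of the representation, if present.

    Assumes header names are stored with this exact capitalization;
    a real client would match them case-insensitively.
    """
    value = headers.get("Last-Modified")
    if value is None:
        return None
    # Parses an RFC 1123 style HTTP date into an aware datetime.
    return parsedate_to_datetime(value)


# Hypothetical response headers for a GET on some URI.
response_headers = {
    "Content-Type": "text/html",
    "Last-Modified": "Tue, 21 Jan 2003 18:00:00 GMT",
}

stamp = representation_last_modified(response_headers)
print(stamp.isoformat())
```

The timestamp travels with the entity in the response, so it describes
that entity and not the artist.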

> So maybe the a-Resource-is-anything-with-Identity idea doesn't even
> really hold for HTTP 1.1.   [ Sandro continues his argument that the
> word "Resource" masks an underlying disagreement and confusion even in
> the design of HTTP 1.1.   It's not so bad that an expert human
> implementor can't sort it out and know when it refers to the object in
> the domain of discourse and when it refers to the computer's
> information about that object, ... but it is bad. ]

Sandro, you aren't qualified to have that argument.  AFAIK, none of
the editors have ever quibbled over the definition of a resource.
TimBL and I only disagree about the scope of http identifiers and
whether or not RDF needs to differentiate between resources and
representations, and even then I think we'd agree if it were not
for this never-ending stream of disinformation coming from people
who have never worked on a libwww.

....Roy
Received on Wednesday, 22 January 2003 04:32:02 UTC