yet another sidetrack on what a URI identifies from Roy T. Fielding on 2003-01-15 (www-tag@w3.org from January 2003)

From: Roy T. Fielding <fielding@apache.org>
Date: Tue, 14 Jan 2003 16:17:17 -0800
To: Sandro Hawke <sandro@w3.org>
Cc: www-tag@w3.org
Message-Id: <B454CFF8-281E-11D7-92BC-000393753936@apache.org>
> I totally agree.  I think we got into this mess because people started
> to think there was a single mapping function from each URI-syntax
> string to the-one-thing-it-identifies.  But people can and do use one
> http URI to identify many different things, most crucially including a
> web site AND the main thing described on that web site (what I think
> you would call the thing whose representation is retrieved via the
> URI).

I just call it a resource and a representation.  There is no need to
come up with any other, longer terms.  URIs identify resources.  GET
(or its semantic equivalent in non-HTTP protocols) can be used to
obtain representations.  A web page is a representation at time t,
often composed of other representations.  There is no magic involved.

> This dual use is fine (and has wonderful benefits) as long as we don't
> get confused and think there is somehow one URI->thing mapping
> function.  If there were only one such function (and it was a
> "function" in the mathematical sense of being one-to-one or
> many-to-one), that would mean the web page was identical to the
> encryption algorithm.  And that's just silly.

No, it would not.  Any identifier can be used to identify anything
via indirect identity.  A URI that directly identifies a pathname
within an http authority's namespace can also indirectly identify
any concept whatsoever provided that it does so consistently.
The only time "http" identifiers are used to identify the listener
itself is when they are used to configure proxies -- otherwise, all
such uses are indirect, and people are fooling themselves if they
think a URI identifies a web page just because that happens to be
what is returned by GET.  A wais URL does not identify a web page,
and yet that is exactly what gets returned if GET is applied to one
via an internal or external gateway (libwww or proxy).  What matters
is that the author of the link they are following expects that
future traversals of that link (or uses of that name) will have
the same set of semantics that caused them to create the link
in the first place, and hence the URI identifies a resource
which happens to result in a web page upon GET.

The identifier indirectly identifies a concept and, in response to
GET, an HTTP server maps that identifier to a representation that
has been assigned (somehow) by the authority.  A normal user doesn't
care about this because they never see the method -- all they see is
the link -- but their limits of perception do not define how the Web
works.  The method is still required for Web software to make
interoperable decisions and complete the request, and therefore it
is part of the system whether or not the average user is aware of it.
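
As a rough illustration of that point, here is a minimal Python sketch
in which the same identifier gives different outcomes depending on the
method applied to it.  The names Authority, assign, and handle are
hypothetical, invented for this example; they are not part of HTTP or
of any library, and the status codes are used only loosely.

```python
from typing import Callable, Dict, Tuple


class Authority:
    """Hypothetical stand-in for an http authority's namespace."""

    def __init__(self) -> None:
        # path -> function producing the representation assigned right now
        self._assigned: Dict[str, Callable[[], str]] = {}

    def assign(self, path: str, source: Callable[[], str]) -> None:
        """The authority decides (somehow) what a path maps to."""
        self._assigned[path] = source

    def handle(self, method: str, path: str) -> Tuple[int, str]:
        """Map (method, identifier) to a (status, body) pair."""
        if path not in self._assigned:
            return 404, ""
        if method == "GET":
            # GET yields a representation of the resource at this moment.
            return 200, self._assigned[path]()
        if method == "HEAD":
            # Same identifier, different method: no body is returned.
            return 200, ""
        # Methods this sketch does not model.
        return 405, ""


authority = Authority()
authority.assign("/2000/09/xmldsig/RSA", lambda: "<html>RSA signature</html>")

for method in ("GET", "HEAD", "LOCK"):
    print(method, authority.handle(method, "/2000/09/xmldsig/RSA"))
```

The user who clicks a link only ever exercises the GET branch, but the
software cannot complete the request without knowing which branch was
asked for.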

> I think you address this by ignoring the web page in the middle, and
> saying the URI identifies the encryption algorithm, and what you see
> when you put that address in your browser is a representation of the
> algorithm (or maybe a rendering of that representation).

That is hardly "ignoring" the issue.  It is a fundamental aspect of
how HTTP works.  It is what makes caching possible, and defines why
the "freshness" of a web page matters.  It is also the only model of
URI that takes into account all of the other HTTP methods.

> But that's just not how people think.  People think there is a web
> page at http://www.w3.org/.

No, they think that when they traverse that link they get a Web page
of the W3C.  They don't care how W3C implements its server.

>   Every published use of a URI I've seen
> (away from W3C) in the past few weeks (since I started watching
> carefully) frames the URI like "visit us at <web address>" or "my website
> is <web address>" or "I read it at <web address>" or "<web address>
> has some great stuff".  All of those forms demonstrate that people use
> URIs to denote web pages, web locations, web sites, etc (*information*
> resources), not the abstract entities (the weather, some car, some
> dog, the moon) which we might learn about from those sites.

They use such forms to denote what they are saying:

    "visit us" == traverse.
    "my website" == this is where I am the authority.
    "I read it at" == what I got from URI at time t'.
    "URI has some great stuff" == future GETs of this resource will
                                  have useful representations.

All of those statements are consistent with the REST model, but
simply use more common (less precise) terminology.  All resources
are information resources when and ONLY when the appropriate method
is applied to them -- that is where the W3C got lost.

> So the W3C tried to find a middle ground -- recognizing that web pages
> are real, and still identifying things like algorithms with the same
> URI -- by using fragment syntax.  This was a big mistake.  The right
> thing is to simply distinguish between the-web-page with the URI
> "http://www.w3.org/2000/09/xmldsig/RSA" and the-algorithm with the URI
> "http://www.w3.org/2000/09/xmldsig/RSA", using whatever mechanisms are
> appropriate to the language at hand.   (I've suggested how to do it in
> RDF.  In most other situations I've seen, it's fine to let the user
> figure out which way it was meant.)

You have to do that even if the resource is static by nature,
because no access at time t is equivalent to the resource itself.
A resource is, by definition, a source for FUTURE accesses, not a
reflection of some past access.  It is therefore impossible for a
resource to be a web page, because the fact that it is accessed at
that other site is part of what makes it a resource.  A web page is
the result of an access as delivered to your browser.

That does not prevent people from identifying the access at time t
in terms of the URI -- it just requires a conscious acknowledgment
of the time and method used in order to formally complete the indirect
identification.  Therefore, declarative semantics must differentiate
between "there exists a time t at which GET(URI, t) has the following
properties" (a statement about one representation) and "for all time t,
GET(URI, t) has the following properties" (a statement about the
resource that is reflected in all of its representations), not to
mention "for all time t, LOCK(URI, t) has the following properties".
The URI alone is just a name, not a "web page", "information resource",
or anything else that is simply derived from actions using that name.
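
The difference between the two quantified statements can be sketched
in Python.  This is only an approximation under stated assumptions: a
resource is modelled as a hypothetical function from a time t to the
representation GET would return, and "for all time t" is checked over
a finite sample of times, which can refute such a claim but never
prove it.  The function and field names are invented for the example.

```python
from typing import Callable, Dict, Iterable

Representation = Dict[str, str]
# Hypothetical model: GET(URI, t) for one fixed URI, as a function of t.
GetAtTime = Callable[[int], Representation]
Property = Callable[[Representation], bool]


def holds_at_some_time(get: GetAtTime, times: Iterable[int], prop: Property) -> bool:
    """There exists a sampled t at which GET(URI, t) has the property:
    a statement about one representation."""
    return any(prop(get(t)) for t in times)


def holds_at_all_times(get: GetAtTime, times: Iterable[int], prop: Property) -> bool:
    """For every sampled t, GET(URI, t) has the property:
    evidence for (not proof of) a statement about the resource."""
    return all(prop(get(t)) for t in times)


def get_front_page(t: int) -> Representation:
    # Assumed behaviour: the title is stable, the body changes over time.
    return {"title": "W3C", "body": f"news for day {t}"}


sample = range(5)
print(holds_at_some_time(get_front_page, sample, lambda r: r["body"] == "news for day 3"))
print(holds_at_all_times(get_front_page, sample, lambda r: r["body"] == "news for day 3"))
print(holds_at_all_times(get_front_page, sample, lambda r: r["title"] == "W3C"))
```

"I read it at URI" is a claim of the first kind about the body at one
time; "URI is the W3C's page" is a claim of the second kind about a
property shared by every representation.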

Understanding that distinction is absolutely necessary to reasoning
about the Web.  It isn't a matter of opinion -- it is a direct result
of how the technology uses methods and URIs with late binding, and
required a great deal of thought for HTTP/1.1 caching.


Cheers,

Roy T. Fielding, Chief Scientist, Day Software
                  2 Corporate Plaza, Suite 150
                  Newport Beach, CA 92660-7929   fax:+1.949.644.5064
                  (roy.fielding@day.com) <http://www.day.com/>

                  Co-founder, The Apache Software Foundation
                  (fielding@apache.org)  <http://www.apache.org/>
Received on Tuesday, 14 January 2003 19:18:17 UTC