Re: Clarifying what a URL identifies (Four Uses of a URL) from Roy T. Fielding on 2003-01-23 (www-tag@w3.org from January 2003)

From: Roy T. Fielding <fielding@apache.org>
Date: Wed, 22 Jan 2003 23:20:02 -0800
To: Tim Berners-Lee <timbl@w3.org>
Cc: Sandro Hawke <sandro@w3.org>, www-tag@w3.org
Message-Id: <16D814C4-2EA3-11D7-BEAC-000393753936@apache.org>

> Perhaps a little explanation is in order.

Thank you!  Sorry our posts passed each other on the net.

> Roy and I agree on how HTTP works.  (Roy, forgive
> me if I misrepresent you.)
> HTTP relates URIs to representations which are returned.
> While the spec mentions resources, the protocol itself
> does not actually constrain what they actually are.
>
> The issue only arises when, in the semantic web, we
> extend the formal system from network objects
> and TCP streams to arbitrary concepts. Then,
> in a formal system where one has to choose one,
> we ask ourselves what exactly is the thing
> we should say is identified by some http URI - the
> picture of the car, or the car?  Either is consistent with HTTP.

Yes.

> We agree that with HTTP there can be a number of different
> representations of the thing identified by the URI.
> I want to use the URI to identify the picture.
> Roy has always felt it identifies the car.

+[if that is what the authority intends it to identify]

If the authority intends it to represent the picture, I would
not disagree with that at all.  Most URIs do identify a virtual
document of informational state, and always do so at least
indirectly.  But most != all.

> Either system is self-consistent.

I still don't understand how that system explains a POST
of a message to an HTTP-to-SMS gateway that is identified by
an http URI.  I'd like to understand that.

The car analogy is too convenient -- it relies on a person using
one URI ambiguously rather than any aspect of the URI itself.
Replace it with a urn URI and you still have the same problem.
A web browser (assuming it implements urn) will still use that
identifier to identify both the resource and whatever is returned
by GET on that URI.

> I use "representation" to refer to the relationship between
> the picture and the bits. Roy uses it to refer to the
> relationship between the car and the bits.
> We are using the same English word for different
> technical relations.

Hmm, I thought we were both using it to mean a package of bits
that represent the current state of the resource, and I think
that remains true whether the resource is a virtual document
or something else.

> There are a number of reasons why I strongly prefer
> the URI to identify the web page, and I have gone into
> them elsewhere, for example in
> http://www.w3.org/DesignIssues/HTTP-URI.html
>
> This is the crux of the HTTP range issue 14.
> (There are other different issues related to fragment identifiers
> and content negotiation.)

I agree that you have very good reasons for preferring to think
of the resource as the web page rather than its subject.  However,
that isn't a sufficient model to talk about the state-changes that
can occur on the subject as a result of methods applied to the
resource.  As long as it is desirable to use HTTP as a window into
the universe of real objects, whether they be temperature control
systems, robots, or gateways to other information systems, it isn't
possible to limit http URIs to identifying virtual documents. Those
things do exist and are both functional and desirable, and most
importantly are identifiable via http URIs today.  It is therefore
evident that an http URI does not *always* identify a Web page,
regardless of anyone's preference.  We need to find some other
solution to the issue.
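
To make the temperature control case concrete, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the
Thermostat class, the "setpoint=" text format, and the example.org URI are
hypothetical and do not come from any real system. It only shows that GET
yields a representation of state while PUT changes the state of the device
itself, which a pure "web page" model has no way to describe.

```python
from dataclasses import dataclass


@dataclass
class Representation:
    """A package of bits describing a resource's state at one moment."""
    media_type: str
    body: bytes


class Thermostat:
    """Hypothetical non-document resource: a temperature control system."""

    def __init__(self, setpoint_celsius: float) -> None:
        self.setpoint_celsius = setpoint_celsius

    def get(self) -> Representation:
        # GET returns a representation of the current state, not the device.
        text = f"setpoint={self.setpoint_celsius:.1f}C"
        return Representation("text/plain", text.encode("ascii"))

    def put(self, representation: Representation) -> None:
        # PUT changes the state of the thing itself, not of any page.
        text = representation.body.decode("ascii")
        value = text.removeprefix("setpoint=").removesuffix("C")
        self.setpoint_celsius = float(value)


# Hypothetical URI-to-resource binding chosen by the authority.
resources = {"http://example.org/building/thermostat": Thermostat(20.0)}

device = resources["http://example.org/building/thermostat"]
before = device.get()
device.put(Representation("text/plain", b"setpoint=22.5C"))
after = device.get()
print(before.body, after.body)
```

The two representations differ, yet the URI and the resource it identifies
stay the same throughout.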

> One can't argue it by arguing about the meaning of English words:
> "representation", "document".   One can't just argue it based on
> appeal to the way humans use URIs to refer to things.
> These aren't the formal system. They resolve ambiguities
> all the time with great alacrity.
> One *can* introduce a new system with a different design
> and argue its merits. Sandro has designed an alternative
> system http://www.w3.org/2002/12/rdf-identifiers/
> which seems consistent and I haven't finished thinking
> about - there are things I like about it and things I don't.
> But it does address all the questions, I think.

The main reason that http is defined as identifying resources
is because nothing HTTP does is scheme-dependent.  HTTP treats
wais, gopher, foobar, urn, and http URIs all the same and responds
to GET on any of them with representations that look like web pages.
That is how the implementations like libwww work.  To treat http
differently just because it is http is more than a bit awkward -- it
suggests something which doesn't hold true for the implementations.
The implementations intentionally hide that nature from the client.
Thus, it makes the most sense to say that none of those necessarily
identify a web page, even though the response to GET is a web page,
and if we focus on the response to GET being a web page then it
allows HTTP to become an interface to anything having state,
which is exactly what has been achieved.
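
The scheme-independence described above can be sketched roughly as follows.
This is not how libwww is written; the back-end functions, the BACKENDS table,
and the example URIs are all hypothetical stand-ins. The only point is that
the GET interface gives the same kind of answer whatever the scheme, so the
client cannot infer the nature of the resource from the response.

```python
from urllib.parse import urlsplit


# Hypothetical per-scheme back ends; each returns the current state as text.
def fetch_gopher(uri: str) -> str:
    return f"gopher menu for {uri}"


def fetch_urn(uri: str) -> str:
    return f"catalogue entry for {uri}"


def fetch_http(uri: str) -> str:
    return f"origin response for {uri}"


BACKENDS = {"gopher": fetch_gopher, "urn": fetch_urn, "http": fetch_http}


def get(uri: str) -> tuple[str, str]:
    """Answer GET on any supported URI with (media type, body)."""
    scheme = urlsplit(uri).scheme
    state = BACKENDS[scheme](uri)
    # Whatever the scheme, the client sees something shaped like a web page.
    return "text/html", f"<html><body>{state}</body></html>"


for uri in (
    "gopher://example.org/1/menu",
    "urn:isbn:0451450523",
    "http://example.org/car",
):
    media_type, body = get(uri)
    print(media_type, body)
```

Every response carries the same media type, which is why the response to GET
says little about what the URI identifies.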

I am a little disappointed that Sandro introduces another set of terms
that are just as open to disagreement.  An information resource may
consist of multiple subjects, each of which might be misinterpreted
by an author as "the" subject. In any case, I certainly do not advocate
that all URIs identify the subject of the information within web pages,
though I would claim it is possible to construct a URI that does.
A resource is defined in English not only as Sandro describes -- a
source of future information -- but also as an available means.
Some http URIs only exist to be a sink of information.

I would like to stick with resource as defined, and representation as
one set of bits representing the state of a resource.  If we need to
invent a new term for the notion of a virtual web page independent of
bits, then I suggest "view". However, before we do that I'd like to know
what additional expressiveness we get in the architecture by this
definition.  In particular, what assertions can be made about a view
that are not better assigned to either the resource or its
representations, keeping in mind that the authority is the only entity
that can distinguish between a resource and a view of that resource?

In my opinion, a resource remains anything that can have identity,
and a view is the mapping GET(resource) over time, since otherwise
I don't understand why it would be called a web page.  I think it
would be better to have a set of predicates (or syntax) for saying
that in RDF, rather than inventing a new term.
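
One way to picture "the mapping GET(resource) over time" is the sketch below.
It is only an illustration under stated assumptions: the Resource, View, and
Representation classes, the integer timestamps, and the example.org URI are
hypothetical names introduced here, not an existing vocabulary or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Representation:
    """One set of bits representing the state of a resource."""
    media_type: str
    body: str


@dataclass
class Resource:
    """Anything that can have identity; here it merely carries some state."""
    uri: str
    state: str

    def get(self) -> Representation:
        return Representation("text/plain", self.state)


@dataclass
class View:
    """Hypothetical 'view': the record of GET(resource) at successive times."""
    resource: Resource
    history: dict[int, Representation] = field(default_factory=dict)

    def observe(self, time: int) -> Representation:
        # Record what GET returned at this point in time.
        representation = self.resource.get()
        self.history[time] = representation
        return representation


car = Resource("http://example.org/car", "parked")
view = View(car)
view.observe(1)
car.state = "on the road"
view.observe(2)
print(view.history[1].body, "->", view.history[2].body)
```

The view is fully derived from the resource and its representations, which is
the reason to ask what a separate term would add.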

In any case, I think something like Sandro's proposal is necessary
even if we were to assume all http identifiers denoted web pages.
People want to be able to make assertions about a web page and
assertions about the subjects described by web pages, and quite
frequently they want to make assertions about subjects that are
only peripheral to the main subject of the web page.  People should
also be able to make assertions about fragments of a web page without
the client assuming that it is some special class of resource.

All of those should be possible in RDF, as should the ability to
make assertions about the resource (the identified sameness) and
individual or collective representations of its state over time.
Creating a defined vocabulary for that is a fine idea, but I don't
see why it can't be done within the definitions already used by
the REST model.

....Roy
Received on Thursday, 23 January 2003 02:19:32 UTC