Re: Clarifying what a URL identifies (Four Uses of a URL) from Roy T. Fielding on 2003-01-25 (www-tag@w3.org from January 2003)

From: Roy T. Fielding <fielding@apache.org>
Date: Fri, 24 Jan 2003 19:28:15 -0800
To: Tim Berners-Lee <timbl@w3.org>
Cc: Sandro Hawke <sandro@w3.org>, www-tag@w3.org
Message-Id: <0A16457A-3015-11D7-ACF9-000393753936@apache.org>
> Ok, here is one hook to a difference in the model you and I have,
> Roy.  You point out that the API in libwww basically provides
> the functionality of HTTP, and at the same time gives access
> to FTP and so on.  You use this as an illustration of a theory that
> all URIs have the same interface as HTTP, that HTTP
> extends over the web the interface of libwww in a quite generic
> way, while other protocols only support some of the features.
> Hence the ability of HTTP proxies to provide access to FTP and
> Gopher.
>
> Which is logical. However, it does not address the range of all
> URI schemes, and of course as HTTP basically doesn't play with
> the fragid, it doesn't involve that at all.

It only needs to address those schemes for which a representation
is a useful and desirable thing, and in those cases it does so.
It doesn't play with the fragid because the fragid is not a first-class
identifier in the system -- it is impossible to do anything other
than GET or name-equivalence on a fragment.  AFAIK, that is true
within the Semantic Web as well, so I don't know where you are
going with this.

> It is a reasonable bit of software design for libwww to generalize
> where generalization can be done, and it is not surprising that
> HTTP, as a later design, "embraces and extends"  FTP.
> And HTTP is in fact a good model for the Web, and the category of
> URIs for which this model holds (http, https, ftp, gopher)
> are important, because they form a web of network information
> objects.  (I'm happy to call that the Web, and exclude "Web" Services,
> by the way. We can call them "Internet Services" if you like.
> I think this so far is what you call the REST model.)

Information-providing resources, yes, but anything with state has
information it can provide.

> But other URIs don't fall into that scheme.  mailto: URIs
> identify mailboxes, and to say that you can make an HTTP proxy
> represent a mailbox is a kludge.

That is your interpretation of what mailto identifies, which
unfortunately isn't supported by the specification.  In any case, that
would not be a kludge.  HTTP is an interface protocol, not a web page.
What interactions can you have with a mailbox that you cannot have with
an HTTP resource?  None.  A mailbox is a subset, and therefore a trivial
interface to build via HTTP.  That doesn't mean it's a good idea to do
so: the HTTP write and append mechanisms are necessarily more abstract,
less efficient, and more generically defined than those in SMTP and
IMAPv4, but it is possible!

> A web site can have various
> pages which give various sorts of information related to a
> mailbox, but conceptually a mailbox is a delivery point
> not an information object.

Conceptually, a mailbox is both a delivery point and an identifier
that is used in various ways by information systems to allow storage
and organization of received and sent mail.  How is that different
from a collaborative weblog?  The only real difference, aside from
protocol syntax, is how the default access control is defined on
mailboxes.  It is still a mailbox and mail is still delivered by
way of an SMTP interface, neither of which prevents the HTTP
interface from correctly interpreting and processing requests from
those applications that, for whatever reason, wanted an HTTP proxy
to their mailbox.

> You could map HTTP's POST to it but not HTTP's GET.

I would only map those methods for which a use exists.  The point is not
to make every system HTTP (that would be absurd), but rather to make the
information within every system accessible via HTTP.  Web browsers took
the name literally and, for security reasons, interpret a GET on a
mailto URI as a request to open a mail application in composition mode
with that mailbox pre-filled in the To: field (and others if you
implement according to the RFC).  Whether or not you agree that those
semantics were originally intended, they are a reasonable interpretation
of the identifier's mapping to a representation.  It isn't the mailbox
itself, of course, but that isn't the implementers' fault.  They
think it literally identifies "initiate mail to".

> Similarly, telnet: URIs are end points for interactive sessions.
> You can connect to one by a Java object in a web page, but
> that doesn't mean they are like web pages any more than
> a flower pressed in a book is a piece of paper.

telnet URIs are initiations of telnet sessions, not complete telnet
sessions.  I didn't say it was like a web page.  HTTP interaction via
proxies is obviously not limited to web pages, so why would I have the
proxy represent it as a web page?  Even if it were limited to web pages,
an applet fits within your definition of a web page and is a perfectly
valid response to a GET on a telnet URI because GET is requesting that
the session be initiated.  Likewise, using a mailcap handler to find a
helper program and invoke it with instructions to initiate a telnet
session is performing a GET on a telnet URI using the generic interface,
and it was implemented on all of the Unix-based Web browsers in 1993.
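
A minimal sketch of that generic interface, assuming a hypothetical
handler table keyed by URI scheme.  The names generic_get, get_telnet,
get_mailto, and HANDLERS are illustrative only; the handlers return a
description of what the GET initiates instead of launching a helper.

```python
from urllib.parse import urlsplit

# Hypothetical handlers, one per scheme.  Each returns a description of
# what a GET on that URI initiates rather than touching the network.
def get_telnet(uri):
    parts = urlsplit(uri)
    return {
        "action": "initiate-session",
        "host": parts.hostname,
        "port": parts.port or 23,  # 23 is the default telnet port
    }

def get_mailto(uri):
    parts = urlsplit(uri)
    return {"action": "compose-mail", "to": parts.path}

# The table stands in for the lookup that finds a helper program.
HANDLERS = {"telnet": get_telnet, "mailto": get_mailto}

def generic_get(uri):
    """Perform GET through one interface, whatever the scheme."""
    scheme = urlsplit(uri).scheme
    try:
        handler = HANDLERS[scheme]
    except KeyError:
        raise ValueError(f"no handler registered for scheme {scheme!r}") from None
    return handler(uri)
```

Supporting another scheme in this sketch means adding one entry to the
table; the caller's side of the interface does not change.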

> So that is I think one way in which our formalizations of URIs
> differ.

I cannot emphasize this enough:  if the conceptual work formalism
cannot accommodate established best practice on the Web, then it is not
a valid formalism of the Web.  It may be a formalism of something else,
perhaps something even better than the Web, but it is not formalizing
the Web as we have implemented it.

[...]
> RDF people do not in my experience use a URI to represent
> both the resource and a representation. Well, I don't.
> (Cwm has, for example, a relationship -- a built-in function --
> log:semantics which relates a resource to what you get from retrieving
> a representation and parsing it, and another, log:contents which
> relates a resource to the bits of any representation of it)
>
> If you assumed that is what people are doing, it may be because you
> are mapping their words onto your concepts, not theirs. You maybe
> forget that for me, for example, the car and the picture of the car are
> distinct. It is the confusion between those which causes a problem.

This is the point where I have repeatedly said to you during our
meetings that neither I nor the REST model ever confuse those two
things.  The resource is what is identified.  The representations are
what the client receives.  They are always distinct.

> Now, you don't write RDF so I am not sure how I discuss this with you.
> I've written a lot of http://www.w3.org/DesignIssues/HTTP-URI
> specifically about this and I don't know where to start.

I can read RDF.  I don't write it because I am more likely to get the
semantics wrong due to syntax error than due to English errors.

> I think you must agree that once my program accesses the web page
> which we will say is a picture of a car, then it has a representation
> of a picture on bits. It has therefore a concept of the picture.
> The picture itself has important properties such as who owns it
> and made it, and what its copyright information is.
> You say that that is information about the representation, but I would
> point out that a picture can have many representations, in JPG PNG
> and GIF at various levels of resolution. They share owner,
> copyright, date of creation, creator, focal length, genre, exposure,
> orientation, and so on, because they are all what I would call
> representations of the same picture, the same conceptual work.

They each independently hold the same relationships with a target.
No problem.  Hopefully they will tell you so using metadata.

> This commonality is very strong, and points to the value of
> being able to identify the thing they have in common: the picture.
> And normally, when I want to make  a hypertext link to that
> it is to the picture, not to a representation, that I want to make
> the link.  So the argument that we are "just talking about
> representations" doesn't fit the bill. It doesn't meet the
> requirements to be able to talk about the picture as a conceptual
> work.
>
> Now, you say the owner of the HTTP URL can declare that it actually
> identifies the car. I say that messes things up.  Suppose the owner does
> that -- suppose they mark up the JPEG with a comment field indicating
> that.  Now my client program has no ID for the picture.

You are right.  By doing so, the authority has specifically said that
it is not guaranteeing future representations will be a picture.  Perhaps
it will replace it with an MPEG movie, or a text description, or maybe
it will always be a picture and the authority simply doesn't want you
to have the picture's ID.  If they DO want you to have that ID, then
they would have supplied a link to an ID that did identify a picture.

In fact, the only thing the authority has done is said that you can
identify the car with that URI.  Basically, they are saying that there
is a permanent, N:1 relationship between the URI and the car
that is a valid ID both inside and outside the Web discourse.

The reality is that you never did know that the URI was identifying a
picture, because the identity mapping for an http URI remains hidden
inside the server until they supply some other information out-of-band
that might explain the semantics to the person making the link.

That is the crux of the issue -- who gets to decide why a link gets
broken?  Is it the authority or the people who made links based on a
mistaken belief that the authority shares their conception of the URI's
meaning?  "Cool URIs don't change" is an explicit recognition that
the naming authority controls that meaning.

> Now here's the rub. When the URI was for the picture, then I
> can indirectly identify the car with it, as "x, where <car.jpg> is a
> picture of x".
> In N3 that looks like  "That which has picture car.jpg".
>
>  [ has  :picture  <car.jpg> ].

No you couldn't.  You could only say "That which is represented in
this picture from <car.jpg>."  You only assumed it identified a picture.

> That's cool.  It's what we do all the time to identify things, for
> example people by SSN. "The car whose picture hangs above your mother's
> fireplace" and stuff.  KR systems thrive on it.  What doesn't work is if
> we say that <car.jpg> actually is an identifier for the car.
> Because "the picture of the car" doesn't identify the picture  - it
> identifies any picture of the car.
>
> [ is :picture of <car.jpg> ]
>
> You can write it but it doesn't work.  It's not a bug in RDF. It is a
> fundamental problem with the URI system we assume that you don't have
> an identifier for the conceptual work.

It is completely invalid to assume that any representation on the Web
will maintain the same format over time just because the recipient has
once observed it and made an assertion about it.  Your assertions are
therefore entirely dependent on the authority's willingness (or ability)
to maintain that mapping as a picture over time.  The only thing the
client can validly assert is that, during some range of time T,
GET(me, <car.jpg>, T) consistently results in a representation with
form=picture and subject=car.  You don't have the ability to assume
the identity of <car.jpg> is either a picture or the physical car,
for the same reason you don't have the ability to assume it is in
JPEG format.
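
A minimal sketch of such a time-scoped assertion, assuming a
hypothetical log of GET observations.  The Observation record and the
assertion_holds function are illustrative names; the point is only that
the claim is bounded by the observations inside the window and says
nothing about the resource outside it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    """What one GET on a URI returned at a given time (hypothetical record)."""
    uri: str
    time: int
    form: str      # e.g. "picture", "movie", "text"
    subject: str   # e.g. "car"

def assertion_holds(observations, uri, start, end, form, subject):
    """True only if the URI was observed within [start, end] and every
    observation in that window had the given form and subject."""
    window = [o for o in observations
              if o.uri == uri and start <= o.time <= end]
    return bool(window) and all(
        o.form == form and o.subject == subject for o in window
    )

log = [
    Observation("car.jpg", 1, "picture", "car"),
    Observation("car.jpg", 2, "picture", "car"),
    Observation("car.jpg", 3, "movie", "car"),  # the authority changed the form
]
```

With this log, the assertion form=picture, subject=car holds for the
window 1..2, fails for 1..3, and cannot be made at all for a window in
which nothing was observed.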

You can hope that it is a picture (or hope that the representation
will always be a picture), and you can make assertions based on the
probability that your hope will hold true for at least as long as
that assertion is used, but you cannot make guarantees on behalf
of the naming authority.  That is indirect identification of the
picture, not direct identification, and is one of the more common
ways that links are known to become semantically broken on the Web.
I wrote about that way back in my MOMspider paper.

That's an interesting question in itself: how does the conceptual
work model describe the cause of link rot due to changes in format
or content that were not anticipated by the link author?

It would be better to know all of the URIs that are associated with
the generation of a representation so that the client could then choose
which semantic they wish to capture for future reference.  That's what
I was hoping RDF could do as metadata within (or linked from) the
representation.

> An example you give often is a robot.  To an RDF system, a robot
> which can be driven by the control panel at <robot.html>
> can be formally referred to in just the same way
> as  [ :controlPanel <robot.html> ].    (That which has control panel 
> <robot.html>)
> This works.

It needs a time qualifier, but that's okay.  How would you formally
describe that POST on a given URI turns the robot to the left?

Actually, I would identify the robot by <robot>, and its
representation would include some type of control-form that would
target another URI identifying the robot's control panel, which
could be as complex or simple as desired by the interface
(e.g., it could vary from relative thrust/direction controls to
a simple query of what the next target should be).  Either way,
we can model the flow of information directly from client to
robot without any mention of how the interface is implemented.
And the Semantic Web can still use <robot> to unambiguously
identify the robot, because the URI is an N:1 relationship with
the robot, not with the robot's current representation.
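
A minimal sketch of that arrangement, using an in-memory stand-in for
the HTTP interface.  The URIs "/robot" and "/robot/control", the
heading field, and the "turn" parameter are all assumptions made for
illustration, not part of any real robot interface.

```python
class Robot:
    """Stand-in for the physical robot; only its heading is modeled."""
    def __init__(self):
        self.heading = 0  # degrees, 0..359

robot = Robot()

def get(uri):
    # "/robot" identifies the robot.  Its representation reports current
    # state and links to a second URI that identifies the control panel.
    if uri == "/robot":
        return {"heading": robot.heading, "control": "/robot/control"}
    raise KeyError(uri)

def post(uri, body):
    # A POST to the control panel URI changes the robot's state; the
    # client never needs to know how the panel is implemented.
    if uri == "/robot/control":
        delta = {"left": -90, "right": 90}[body["turn"]]
        robot.heading = (robot.heading + delta) % 360
        return get("/robot")
    raise KeyError(uri)
```

A client here would GET "/robot", follow the "control" link from the
representation, and POST {"turn": "left"} to it; "/robot" keeps
identifying the robot however its representation changes.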

> Let me summarize
>
> - Web software needs to be able to express things about conceptual
>   works.  They are a big part of the web system and of our society.
> - When you identify a conceptual work, you can retrieve representations
>   and you can indirectly identify abstract things.
> - If you say that the URI identifies an abstract thing you cannot
>   refer to the conceptual work.

The last is not true.  If the URI identifies an abstract thing, then you
cannot use the same URI to directly identify the conceptual work.  You
can, however, discover the relationship between the two and obtain a
different URI for that conceptual work, if the provider of the
conceptual work is willing to provide that separate URI.

This is no different than content negotiation, and the Web does not
force an information provider to supply the individual URIs for variants
of a negotiated resource.
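
A minimal sketch of the content negotiation case, assuming a
hypothetical variant table held privately by the server.  The
negotiate function and the placeholder bodies are illustrative; real
HTTP negotiation also weighs quality values, which this sketch omits.

```python
# Variants of one negotiated resource.  The server knows them by media
# type; it publishes no separate URI for any of them.
VARIANTS = {
    "image/jpeg": b"<jpeg bytes>",  # placeholder body
    "image/png": b"<png bytes>",    # placeholder body
}

def negotiate(accept):
    """Return (media_type, body) for the first acceptable type that the
    server has, or None.  `accept` is a list in client preference order."""
    for media_type in accept:
        if media_type in VARIANTS:
            return media_type, VARIANTS[media_type]
    return None
```

The client in this sketch receives a representation of the one
resource it asked for, and has no identifier for the individual variant
unless the provider chooses to supply one.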

The benefit of all this, BTW, is that we don't need to special-case
the fragment identifier in RDF.  All of that complexity is completely
unnecessary if we don't assume anything about the target of an http
identifier aside from it being accessible via an HTTP interface, which
makes perfect sense given that we don't allow the client to assume
anything else about an http identifier, regardless of which model is
used.  Likewise, http as a scheme for xmlns URIs becomes a dead issue.

> I think that you will find that the REST model is not harmed in any way
> by introducing an extra concept of the conceptual work between
> "representations" and what you used to call the resource.
> I think you will find it has a nice consistency and solidity.

I'm afraid not.  You just claimed that telnet and mailto are not
implementable as a conceptual work.  I know they are under REST and
on the Web itself.  How can I not consider it harm to give that up?
Furthermore, I can add any URI scheme to the Web and the REST model
does not have to be changed to accommodate it, whereas placing
scheme-specific semantics on the Web interface means that a new
scheme cannot be used until all software is changed to embed the
scheme-specific semantics into the client.  In my case, all I
have to do is provide HTTP proxies, at least until such time as
the scheme becomes manifestly useful to implement directly by
each client.  The direct implementation may provide a richer
interface, but an information-window interface like HTTP
is sufficient to enable deployment prior to popularity.

I also believe that the fragid indirection actively harms a
namespace that is large or hierarchical in nature, and Mark has
already given examples of how that harm manifests itself, so
it's not as if the conceptual work model doesn't have its own
drawbacks.

....Roy
Received on Friday, 24 January 2003 22:27:49 UTC