Re: How do you POST to a "document"? from Roy T. Fielding on 2003-01-23 (www-tag@w3.org from January 2003)

From: Roy T. Fielding <fielding@apache.org>
Date: Thu, 23 Jan 2003 15:52:06 -0800
To: Sandro Hawke <sandro@w3.org>
Cc: Tim Berners-Lee <timbl@w3.org>, www-tag@w3.org
Message-Id: <ADCCD6EF-2F2D-11D7-BFE5-000393753936@apache.org>

>> I still don't understand how that system explains a POST
>> of a message to an HTTP-to-SMS gateway that is identified by
>> an http URI.  I'd like to understand that.

[...]

> So I think of the web as mediated shared memory.  Each web address
> (URI) points to a storage location.  GET means to read the contents of
> a location, PUT means to store replacement contents in a location.
> Sometimes I think of the locations as individual whiteboards, bulletin
> boards, shelves, slots, or parts of a landscape where a signboard
> could be placed.

That doesn't solve the issue that TimBL mentioned, because it simply
replaces resource (a concept independent of any implementation)
with a conceptual definition of one particular implementation.  We
still have the issue that the identifier is being used to identify
both the shared memory and what lies behind that shared memory.
Worse, we've broken the consistency of the REST model for non-http
URIs when introduced into the same interface -- it isn't reasonable
to claim that those URIs identify shared memory, but it is quite
reasonable for us to produce representations of them on demand.

[...]

> But now I'm handwaving.  Are you nodding in understanding or scowling?

I am trying to understand why it is necessary for the architecture
to recognize a complex implementation model behind the interface for
"http" URIs when it clearly does not do so for non-"http" URIs.
I could understand it if you simply declared that an http URI
identifies an HTTP interface itself, rather than a conceptual model
of the implementation behind the interface, since that would solve
part of the problem that TimBL is talking about (direct identification
being ambiguous).  That is why I said that what Tim and I actually
disagree on is not the same as what people (not just you) have been
saying we disagree on. The other part, however, is the RDF issue that
you have been working on, which I'll try to cover as I go along.

People don't use URIs to refer to the interface, but rather to the
consistent sameness found through interacting with that interface.
That consistent sameness could be thought of as a virtual web page,
but thinking so is no different than thinking of it as a concept
with a degree of sameness that matches a web page.  It means
that we either agree that we know nothing about the resource
aside from it identifying one sameness of concept (hence the
definition), or we agree that we will allow indirect identification
within the system and that indirection requires additional context
to disambiguate between differing indirect targets.  Note that
this does not change either TimBL's or my position that each http
URI only directly identifies one thing.

Please understand that the old-Web's perspective of a resource is
always through the generic interface.  It doesn't matter what the
scheme is or what the URI identifies, the Web interacts with it
through a generic interface that already makes it look like a
shared memory system (via message passing).  RFC 2396's definition,
however, encompasses all of the uses of URIs, including for such
things as inventory control of real objects.  The identity type
of a resource has no impact on how it can be viewed via the Web,
even if it does have impact on those other systems.

People are familiar with this idiom -- they enter an
identifier into an information system and the system responds
with information about the thing identified by that identifier.
I don't know of any system where a car sales organization types
a VIN into a computer and expects the car to pop out of the speakers.
They are just names; expectations will depend on what actions are
being applied to those names, not based on how the name is
directly bound.  HTTP places no requirements on the binding of
http names to resources other than the syntax for interpretation
and access to the authority.  It does not even require that a
representation be available for a bound name.  VIN places one
requirement: that it be indelibly stamped on several places
within a single manufactured car and not be reused throughout
the expected lifetime of that car.

If I were to tattoo an http URI on Mark's forehead and forbid
anyone else from tattooing the same URI, then I can reasonably
claim that it directly identifies him every bit as much as the
VIN directly identifies the car.  The validity of that binding
is a social problem, not a technical one distinguished by the
naming syntax.  Likewise, if I perform a GET request on that URI,
I must accept the fact that what I get back is not Mark -- it
isn't even necessarily a picture of Mark.  What I get back is
only a representation as he defines it, and whether or not the
result is a useful resource will depend on his ability to
maintain the accuracy of its apparent state over time.  Of course,
I would never do that -- I'd just tag him with a URI that has
no representations and claim he isn't a useful resource *when
on the Web*, even though he usually is in real life.

Anyway, I hate extreme analogies like that because they really
don't illuminate anything that we would implement.

Now, let's consider the needs of the Semantic Web.  Like other
systems that use URIs outside of the Web, the SW does not interact
with resources through the generic interface.  That's fine.
However, when the SW makes assertions about *behavior* on the Web,
then it must take into account the fact that clients of the Web
interact with resources through that generic interface.  The SW
cannot make assertions about the potential result of an interaction
on the Web without specifying time, method, URI, and perhaps a
few other things depending on the nature of the assertion.
That is because Web behavior is defined by those elements as
much as it is by the URI.
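
A minimal sketch of that point, in Python, assuming a hypothetical
observation log: an assertion about Web behavior is keyed by time,
method, and URI together, so two interactions with the same URI are
still distinct subjects.  The `Interaction` type, the `record` helper,
the example.org URI, and the outcome strings are all illustrative
assumptions, not part of any RDF vocabulary or of HTTP itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Interaction:
    """Hypothetical subject of an assertion about Web behavior."""
    time: datetime
    method: str
    uri: str


# Hypothetical log: each assertion is attached to a whole interaction,
# never to the bare URI.
observations = {}


def record(interaction, outcome):
    """Store an asserted outcome for one (time, method, URI) triple."""
    observations[interaction] = outcome


when = datetime(2003, 1, 23, 23, 52, 6, tzinfo=timezone.utc)
gateway = "http://example.org/sms-gateway"  # illustrative URI only

get = Interaction(when, "GET", gateway)
post = Interaction(when, "POST", gateway)

record(get, "representation of the gateway returned")
record(post, "message accepted for delivery")

# Same URI, same time, different method: two separate assertions.
for interaction, outcome in observations.items():
    print(interaction.method, interaction.uri, "->", outcome)
```

Keying on the URI alone would have let the second assertion overwrite
the first, which is the ambiguity described above.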

Getting back to the problem that TimBL described, he would
like to define the URI as identifying the virtual Web page -- the
sameness that is perceived from all responses to GET over time.
What I can't seem to get across is that the resource in REST is
the sameness that is perceived from all responses to all methods
over time.  They are the same model, though I have so far failed
to convince everyone that the web page is just how the interface
is presented on the Web rather than the object of interaction.
REST only knows about information resources (see my dissertation)
because that is all the Web knows about.  2396, on the other hand,
and resources in general, are not limited to information resources.

As you say, changes on a shared memory can cause changes on the
associated backing "reality".  When you make those changes, are
you thinking to yourself that you really want to change that
shared memory, or that you really want to change the state of
that object to which it is only acting as an interface?
I am firmly convinced that users of a web interface to a
microwave oven are not thinking about its shared memory when
they select "five minutes", "high power", and then "start".

Does the URI identify the control or just an interface to
the controller?  I just don't care -- the perceived interactions
are identical, and therefore it is the same
resource, whether you imagine it to be the interface or the
control itself.  The only way to distinguish the two is to exit
the system of discourse entirely, at which point we can no
longer use the same identifiers as used within the system
without additionally defining context.

I'll try to illustrate with prose rather than attempt an RDF
description (and risk incorrect syntax):

Let's say I make a set of assertions like "<http://www.w3.org/> is
presented in a way that is clear and easy to read through use of a
three-column format".

I hope that all of us agree that, for this context, the URI is being
used to (in)directly identify the Web page.  But is the target of that
assertion the resource?

A Web page, as observed by the user, is actually a coordinated set
of responses to GET requests on multiple resources that eventually
results in an application steady-state known as the completely
rendered page.  In this case, Navigator makes the following requests
of separate resources in order to form the Web page:

    http://www.w3.org/
    http://www.w3.org/StyleSheets/home.css
    http://www.w3.org/Icons/w3c_main
    http://www.w3.org/Icons/right
    http://www.w3.org/Icons/Logo_25wht.gif
    http://www.w3.org/Icons/valid-xhtml10
    http://www.w3.org/Icons/valid-css
    http://www.w3.org/WAI/wcag1AA

and the result is something that I agree is a clear and easy to read
source of information in three-column format.  If, however, I switch
to the "links" browser, then I get a Web page consisting of one
representation derived from

    http://www.w3.org/

and I am happy to say that it also is clear and easy to read. However,
it is not in three-column format.  That is because the Web page is not
just a product of the first resource, but a product of the capabilities
and behavior of the browser in interpreting a sequence of related
actions.  "Render this" is not equivalent to "this".
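
The difference between the two browsers can be sketched as follows, in
Python, assuming a heavily simplified stand-in for the page markup
(`SAMPLE_HTML` is hypothetical, not the real W3C home page) and a
hypothetical `requests_for_page` helper.  It performs no network
access; it only lists which URIs each kind of client would go on to
request after the first GET.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical, simplified markup used only for illustration.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="/StyleSheets/home.css">
</head><body>
<img src="/Icons/w3c_main" alt="W3C">
<img src="/Icons/right" alt="">
<p>Sample text.</p>
</body></html>
"""


class SubresourceCollector(HTMLParser):
    """Collect stylesheet and image URIs referenced by a page."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.uris = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.uris.append(urljoin(self.base_uri, attrs["src"]))
        elif (tag == "link" and attrs.get("rel") == "stylesheet"
              and attrs.get("href")):
            self.uris.append(urljoin(self.base_uri, attrs["href"]))


def requests_for_page(base_uri, html, fetch_subresources):
    """Return the URIs a client would GET to render the page."""
    requests = [base_uri]
    if fetch_subresources:
        collector = SubresourceCollector(base_uri)
        collector.feed(html)
        requests.extend(collector.uris)
    return requests


# A graphical browser follows the subsidiary references ...
graphical = requests_for_page("http://www.w3.org/", SAMPLE_HTML, True)
# ... while a text-mode browser renders the first representation alone.
text_only = requests_for_page("http://www.w3.org/", SAMPLE_HTML, False)

print(graphical)
print(text_only)
```

Both lists start from the same URI, yet the rendered results differ,
so an assertion about the rendered page needs more than that URI to
identify its target.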

Does that mean the identifier is ambiguous?  No, it means that
the URI alone is insufficient to target the assertion.

I can just hear people thinking: "Well, that's a silly example,
everyone knows that presentation should be separated from content."

Okay, let's claim for a second that the URI actually identifies
the virtual notion of it being a Web page, which holds true regardless
of the subsidiary presentation resources.  Fine, but then consider
that the reason it is called content negotiation in HTTP, rather
than simply format negotiation, is because the server can deliver
different content based on aspects of the request *other* than
the method and URI.  So it isn't a virtual web page that is being
identified, but rather a set of potential web pages, one of which
the server will select for a given request.  To what degree then
can these individual virtual web pages differ before they are no
longer considered to be "the same resource"?  The answer is:
to whatever degree that the authority considers sufficient to
maintain the sameness of representation that characterizes it
as being a resource.  In other words, the URI identifies a
conceptual mapping to a set of entities, and because it is a
Uniform Resource Identifier, it follows that this must be our
definition of resource on the Web.
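
As a rough sketch of that mapping, in Python, assume a hypothetical
table of variants that an authority has bound to a single URI.  The
`VARIANTS` table, the example.org URI, and `select_representation`
are illustrative names only, and the selection ignores q-values and
wildcards, so it is far simpler than real HTTP content negotiation.

```python
# Hypothetical set of entities that one URI maps to.
VARIANTS = {
    "http://example.org/greeting": {
        ("text/html", "en"): "<p>Hello</p>",
        ("text/html", "fr"): "<p>Bonjour</p>",
        ("text/plain", "en"): "Hello",
    }
}


def select_representation(uri, accept, accept_language):
    """Pick one entity for a request, using preferences carried by
    the request in addition to the method and URI.

    `accept` and `accept_language` are lists in preference order;
    this is a simplification of the Accept and Accept-Language
    header fields.
    """
    candidates = VARIANTS[uri]
    for media_type in accept:
        for language in accept_language:
            if (media_type, language) in candidates:
                body = candidates[(media_type, language)]
                return media_type, language, body
    return None  # no acceptable variant in this sketch


uri = "http://example.org/greeting"
print(select_representation(uri, ["text/html"], ["fr", "en"]))
print(select_representation(uri, ["text/plain"], ["fr", "en"]))
```

The two requests name the same resource and receive different
entities; how far those entities may differ is decided by whoever
maintains the table, not by the URI syntax.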

The fact that it is desirable for that sameness of mapping to
be as broad and consistent as possible for most resources does
not imply that it must be so for all resources.  A resource
will be as consistent as it needs to be in order to be
useful as a future source of information for its intended
consumers.  Any assumptions about the nature or form of what
is being identified are hidden by the interface.

Miles apparently wants me to remove the definition of resource
because it arbitrarily constrains other models.  I disagree.
There are no models that I know of for which the definition of
resource is not a superset of what they wish to identify.
Other systems can restrict the domain of resources used within
that system however they like, but as soon as they make reference
to a resource in another system, such as the Semantic Web
making reference to Web resources, then they have no choice
but to recognize the meaning of that other system's identifiers.
The results will be ambiguous otherwise, and it's not the other
system's fault, and it's not because the definition of resource
is vague or tied to any one model.  The definition is in 2396
because of the very long and painful debate about URNs, in which
confusion of the scope of resources (e.g., assuming they were
machines or files) led to a huge waste of energy on pointless
debates far worse than this one.


Cheers,

Roy T. Fielding, Chief Scientist, Day Software
                  2 Corporate Plaza, Suite 150
                  Newport Beach, CA 92660-7929   fax:+1.949.644.5064
                  (roy.fielding@day.com) <http://www.day.com/>

                  Co-founder, The Apache Software Foundation
                  (fielding@apache.org)  <http://www.apache.org/>
Received on Thursday, 23 January 2003 18:52:05 UTC