Re: RDF from Roy T. Fielding on 2003-01-29 (www-archive@w3.org from January 2003)

From: Roy T. Fielding <fielding@apache.org>
Date: Tue, 28 Jan 2003 22:43:07 -0800
To: Tim Berners-Lee <timbl@w3.org>
Cc: www-archive@w3.org, Mark Baker <distobj@acm.org>
Message-Id: <ECC18622-3354-11D7-BDD6-000393753936@apache.org>
> Here is the test case to distinguish between
> Roy's model and mine - and find whether they really are different.
>
> The only measurable property of an HTTP resource is
> the set of representations one gets.

That is certainly true for the WWW interface, but it is also true that
a publisher has some semantics in mind when arranging for 
representations
to map to the URI, and since they have control it doesn't really matter
whether there are any other measurable properties, aside from the fact
that a resource isn't useful if its semantics cannot be observed.

The problem with arguing from the point of view of observable semantics
is that two different people can easily observe different semantics
based on what they are interested in identifying.  In other words,
one person may link to a URI because its representations usually contain
a picture of an object with an appealing shade of blue, while another
person links to the URI because its representations contain a picture
of a coastal bay that would be nice for skin diving, while the publisher
may be providing the URI for the sole reason of it being a satellite
view of his house.  Which of those three people have the right to
define the extent of variance in those representations?  I say that the
first two are indirect identification and only the publisher can define
what is direct identification.

> So let's take the test case of the car.
>
> Roy says its fine for one to get a picture of the car one
> day and from the next day get a video tape or an audio track,
> because they are all "representations" of the car.
> Now I don't know how that "being a represnetation of the car"
> actually constrains them.

Only to the extent that it continues to reflect the current state
of the resource as defined by the naming authority.  It would be
very strange for a situation in which resource=car to result in a
variety of representation types.  However, if resource="most recent
media depiction of a BMW 530i", then it is quite likely that such a
resource is going to be in a variety of formats over time and yet
still be true to the resource.  That is useful even though an
observer may have a hard time understanding the sameness, until
they go to the links as defined on the authority's site and see
what they are calling the link (or ask them for an RDF description).

> (Remember I use "representation" for the process of taking
> a conceptual work and making bits of it, and Roy seems to use it
> for the process of making a conceptual work which describes something,
> and then making a digital tim:representation of that)

Whether it is "made" or "selected" or is always a static set of bits
is not something that my model cares about.  REST only thinks of it
as data to be passed between components of an architecture interaction.
What happens behind the interface is strictly hidden from view.
Resources are identified, but representations, URI, and metadata are
the only data elements that are used within the architecture.
REST describes a system using the WWW interface.

> Well I'm perfectly happy with content negotiation over
> different content types, to the extent that the different
> representations can be considered to some extent
> the same message in a different medium.
>
> What are the bounds?
>
> Suppose one representation is a copy of the invoice for the car, and 
> another
> is a copy of the repair manual?
>
> Are they both roy:representations of the car and therefore is it 
> reasonable
> behavior for a server?

No, neither describes the state of the car.  They do fall into the 
category
of "random things associated with the car", and I suppose someone could
define a different URI for that purpose, but they cannot provide those
as representations *and* claim the URI identifies the car.

> Suppose one representation is a legal document about the car,
> and another representation was that legal document with "not"
> inserted somewhere?

Same answer.

> If you, Roy will accept that is bad practice -- and abuse of the system
> not condoned by the specs, then we have a constraint that
> all representations must be related in some way, besides simply
> the subject.

No, we have a constraint that says: observing the sameness of a 
resource's
representations is not sufficient to completely determine *the* sameness
that is intended for a resource, but at the same time a resource cannot
have meaning that is inconsistent with the sameness described by its
representations.  It therefore follows that a legal document is never
going to be a valid representation of the car, even if it is 
consistently
provided over time.

> The actual extent to which different representations differ is
> of course controlled (As Roy noted earlier) by the publisher, but that
> doesn't mean that there is no social pressure and real need for 
> publishers
> to be reasonable.

True, but that is a social artifact that requires social pressure for
conformance, rather than a constraint in the architecture.  It is,
essentially, a corollary to the network-effect principle rather than
a constraint (i.e., the more a resource's representation varies, the
less useful it will be as a resource, particularly indirectly).

> So, if, Roy, you will accept that there is some such constraint
> on the set of representations, then I can associate the
> conceptual work in my model with that set of representations. Then we
> just have a problem of nomenclature on our hands.  I think
> we can make an architecture in which formally the thing identified
> is the message, and in certain contexts would be used as
> an indirect identifier to indicate the subject of the representations.

Why?  If we define an architecture in which the URI always denotes
one thing, and GET on that URI through the WWW interface always results
in a representation of that one thing (or 404 not found), then the
architecture conforms to the implementations of both the Web and the
Semantic Web regardless of how individual publishers choose to define
their own resources.

I don't understand why you are insisting on this conceptual work view
of http resources.  It doesn't seem to have any value for RDF because
RDF cannot assume anything about what it receives as representations.
If you disagree, please let me know what an RDF processor can assume
from nothing more than observing GET results?

RDF depends on someone making assertions, and those assertions are
associated with some form of a trust model, so there is no ambiguity
as long as RDF doesn't assume that the URI and the result of a GET
on that URI are the same thing (which, of course, it cannot do for
the same reason RDF cannot do so for non-http URI schemes).

To go back to the car example, if I define a URI scheme "vin" that
matches the manufacturer's vehicle identification number standard
and then deploy a resolution mechanism for "vin" using the WWW
interface, the GET vin:289814678... will consistently result in
representations of cars that are always in the form of conceptual
works, because that is what the WWW interface provides in response to
GET on any URI.  Yet it would be completely unreasonable for the
Semantic Web to claim that "vin" URIs identify the conceptual work
and not the car, right?

So, what exactly is the architectural difference between defining
a new URI scheme that is owned by one naming authority and using
the DNS decentralized naming mechanism to state that each authority
defines the mapping of identifier to resource in the http scheme?
I can define a one-to-one mapping from vin:* to http://vin.org/*
and thus directly define any car via an unambiguous http URI.
Saying that this is a bad thing discredits all of the arguments we
have made over the past ten years regarding a specialized URN syntax
being unnecessary, and makes a mockery of our position that
inventing new schemes is harmful.

In short, I don't see anything gained from thinking that an http URI
always identifies a conceptual work, and quite a lot of capability
lost because of it.

I understand that http URIs are, from the point of view of the
WWW interface, identifiers of a name to be mapped by a listener into
a conceptual work, since that is what the WWW interface does for any
URI scheme.  However, given that the WWW interface does not constrain
the meaning of those other URI simply because they can be used through
its interface, it makes far more sense to me that http URIs are equally
unconstrained rather than claim that http URIs identify the interface
itself.

That's why I say that I can write a document that describes the
one Web (the Web of relationships among all resources) as being the
same for both WWW and the Semantic Web, while at the same time
describing the WWW interface as one system that makes use of the Web
through representations and the Semantic Web as another system that
makes use of the Web through logical assertions.  Because then I can
describe all of the constraints on the WWW interface, along with the
principles that motivated them, without constraining the behavior of
the Semantic Web and Web Services.  I think that is what you really
want, but I cannot get there while one system is making assumptions
about the other system that are not supported by its behavior.

....Roy
Received on Wednesday, 29 January 2003 02:26:15 UTC