Re: RDF from Tim Berners-Lee on 2003-01-29 (www-archive@w3.org from January 2003)

From: Tim Berners-Lee <timbl@w3.org>
Date: Wed, 29 Jan 2003 15:42:00 -0500
To: "Roy T. Fielding" <fielding@apache.org>
Cc: www-archive@w3.org, Mark Baker <distobj@acm.org>
Message-Id: <1D6B8DE2-33CA-11D7-B288-000393914268@w3.org>
On Wednesday, Jan 29, 2003, at 01:43 US/Eastern, Roy T. Fielding wrote:

>> Here is the test case to distinguish between
>> Roy's model and mine - and find whether they really are different.
>>
>> The only measurable property of an HTTP resource is
>> the set of representations one gets.
>
> That is certainly true for the WWW interface, but it is also true that
> a publisher has some semantics in mind when arranging for
> representations to map to the URI, and since they have control it
> doesn't really matter whether there are any other measurable
> properties, aside from the fact that a resource isn't useful if its
> semantics cannot be observed.
>
> The problem with arguing from the point of view of observable semantics
> is that two different people can easily observe different semantics
> based on what they are interested in identifying.  In other words,
> one person may link to a URI because its representations usually
> contain a picture of an object with an appealing shade of blue, while
> another person links to the URI because its representations contain a
> picture of a coastal bay that would be nice for skin diving, while the
> publisher may be providing the URI for the sole reason of it being a
> satellite view of his house.  Which of those three people have the
> right to define the extent of variance in those representations?  I
> say that the first two are indirect identification and only the
> publisher can define what is direct identification.
>

I'd agree with you.  The system works to the extent that there
are shared expectations between the publisher and the person
quoting the URI.  In most cases on the web, these are pretty clear.
Like, when someone quotes a page on Amazon selling
a book, they are happy for it to have different ads and be a different
size on a different screen, but if it was suddenly about a different
book the system would have failed.

>> So let's take the test case of the car.
>>
>> Roy says it's fine for one to get a picture of the car one
>> day and the next day get a video tape or an audio track,
>> because they are all "representations" of the car.
>> Now I don't know how that "being a representation of the car"
>> actually constrains them.
>
> Only to the extent that it continues to reflect the current state
> of the resource as defined by the naming authority.  It would be
> very strange for a situation in which resource=car to result in a
> variety of representation types.

Oh.  I thought I got that example from you.

>  However, if resource="most recent
> media depiction of a BMW 530i", then it is quite likely that such a
> resource is going to be in a variety of formats over time and yet
> still be true to the resource.  That is useful even though an
> observer may have a hard time understanding the sameness, until
> they go to the links as defined on the authority's site and see
> what they are calling the link (or ask them for an RDF description).


I have no trouble with resource="most recent media depiction of a BMW
530i", as that identifies the intent of the communication well.


>> (Remember I use "representation" for the process of taking
>> a conceptual work and making bits of it, and Roy seems to use it
>> for the process of making a conceptual work which describes something,
>> and then making a digital tim:representation of that)
>
> Whether it is "made" or "selected" or is always a static set of bits
> is not something that my model cares about.  REST only thinks of it
> as data to be passed between components of an architecture interaction.
> What happens behind the interface is strictly hidden from view.
> Resources are identified, but representations, URI, and metadata are
> the only data elements that are used within the architecture.
> REST describes a system using the WWW interface.

I know.   I understand your model.  Those points are the same in mine.
  I was pointing out differences between yours and mine above.

>
>> Well I'm perfectly happy with content negotiation over
>> different content types, to the extent that the different
>> representations can be considered to some extent
>> the same message in a different medium.
>>
>> What are the bounds?
>>
>> Suppose one representation is a copy of the invoice for the car, and
>> another is a copy of the repair manual?
>>
>> Are they both roy:representations of the car and therefore is it
>> reasonable behavior for a server?
>
> No, neither describes the state of the car.  They do fall into the
> category of "random things associated with the car", and I suppose
> someone could define a different URI for that purpose, but they cannot
> provide those as representations *and* claim the URI identifies the car.

I see. So when you talk about the resource being a car, then
representations must be descriptions of the car.
That makes sense.  I would just say that the tim:Resource
is a description of the roy:Resource.

>
>> Suppose one representation is a legal document about the car,
>> and another representation was that legal document with "not"
>> inserted somewhere?
>
> Same answer.
>

Ok.

>> If you, Roy will accept that is bad practice -- and abuse of the
>> system not condoned by the specs, then we have a constraint that
>> all representations must be related in some way, besides simply
>> the subject.
>
> No, we have a constraint that says: observing the sameness of a
> resource's representations is not sufficient to completely determine
> *the* sameness that is intended for a resource, but at the same time a
> resource cannot have meaning that is inconsistent with the sameness
> described by its representations.

So documents should be self-describing, implicitly or explicitly.
"This is Roy's home page" for example. This is of course a good
way of helping establish common expectations about what it is.

> It therefore follows that a legal document is never
> going to be a valid representation of the car, even if it is
> consistently provided over time.
>
>> The actual extent to which different representations differ is
>> of course controlled (as Roy noted earlier) by the publisher, but that
>> doesn't mean that there is no social pressure and real need for
>> publishers to be reasonable.
>
> True, but that is a social artifact that requires social pressure for
> conformance, rather than a constraint in the architecture.  It is,
> essentially, a corollary to the network-effect principle rather than
> a constraint (i.e., the more a resource's representation varies, the
> less useful it will be as a resource, particularly indirectly).
>

There are quite a lot of things in that space.
- One doesn't *have* to give something a URI
- One doesn't *have* to run a server
- One doesn't *have* to keep it available
- One doesn't *have* to keep some consistency of representation
but they all help.


>> So, if, Roy, you will accept that there is some such constraint
>> on the set of representations, then I can associate the
>> conceptual work in my model with that set of representations. Then we
>> just have a problem of nomenclature on our hands.  I think
>> we can make an architecture in which formally the thing identified
>> is the message, and in certain contexts would be used as
>> an indirect identifier to indicate the subject of the representations.
>
> Why?  If we define an architecture in which the URI always denotes
> one thing, and GET on that URI through the WWW interface always results
> in a representation of that one thing (or 404 not found), then the
> architecture conforms to the implementations of both the Web and the
> Semantic Web regardless of how individual publishers choose to define
> their own resources.

That's exactly what I am describing - with tim:representation.
You see how we could agree on an HTTP spec without resolving this?
By message I meant here "abstract resource", like
  "most recent media depiction of a BMW 530i", or
   "some description of my robot".


> I don't understand why you are insisting on this conceptual work view
> of http resources.  It doesn't seem to have any value for RDF because
> RDF cannot assume anything about what it receives as representations.
> If you disagree, please let me know what an RDF processor can assume
> from nothing more than observing GET results?

When an RDF processor does an HTTP GET on a URI
and parses it, then it gets a bunch of data back. Call it a formula.
It's a set of statements.

Example 1. A simple example: it finds a reference to weather.rdf#boston,
downloads it, and finds that it says 4 statements:
{
	#boston  a :City.
	#boston  has wea:temperature 12.
	#nyc  a :City.
	#nyc  has wea:temperature 14.
}
If it wants to tell anyone where it found the data, it refers to
<http:/.../weather.rdf>. No problem.  It might write the statement:

     #boston   :weatheravailableFrom <http://...weather.rdf>.

This means (say) that weather.rdf is some resource which you can
expect to give you the weather for boston.
Now the data might have actually been transferred in N3 or RDF/XML.
When I am talking about the source of the data, the normal thing
is not to quote the representation, but the resource.
There is no point in content negotiation if one has to actually
refer to a representation.
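
A minimal Python sketch of that point, under stated assumptions: the
address http://example.org/weather.rdf is a hypothetical stand-in for the
elided weather.rdf URI, the media type strings are illustrative, and
negotiate() and cite_source() are invented helpers, not part of any
library. It only shows that the provenance statement comes out the same
whichever representation was transferred.

```python
from dataclasses import dataclass

# Hypothetical URI standing in for the elided weather.rdf address.
WEATHER = "http://example.org/weather.rdf"


@dataclass(frozen=True)
class Representation:
    media_type: str
    body: str


# One resource, two negotiated representations meant to carry the same data.
# The bodies are placeholders, not real serializations.
REPRESENTATIONS = {
    WEATHER: [
        Representation("text/n3", "#boston has wea:temperature 12 ."),
        Representation("application/rdf+xml", "<rdf:RDF>...</rdf:RDF>"),
    ]
}


def negotiate(uri, accept):
    """Return the first stored representation of `uri` with an accepted type."""
    for rep in REPRESENTATIONS[uri]:
        if rep.media_type in accept:
            return rep
    raise LookupError("no acceptable representation")


def cite_source(uri, rep):
    """Build the provenance triple; it names the resource, and ignores `rep`."""
    return ("#boston", ":weatheravailableFrom", uri)


n3 = negotiate(WEATHER, ["text/n3"])
xml = negotiate(WEATHER, ["application/rdf+xml"])

# Different bytes arrived, but the citation is identical.
assert n3 != xml
assert cite_source(WEATHER, n3) == cite_source(WEATHER, xml)
```

If cite_source() had to name the representation instead, the two agents
above would record different sources for the same data.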

Example 2.  Now suppose the resource actually uses its name
as the name of a city. In <http://...weather/boston.rdf>,
there is {

	<boston.rdf> a :City.
	<boston.rdf> has :temperature 12.
}

This is what RDF people will do if we don't tell them not to.
The RDF concepts document tells them not to, but they quote
you Roy as saying no problem, an HTTP URI can identify a city.

Now the web agent does a HEAD request, and finds out

    <http://...weather/boston.rdf>   Last-Modified  "2003-01-01".
(Actually, the HTTP spec has in 7.1 the line
"Entity-header fields define metainformation about the entity-body or,
if no body is present, the resource itself" which doesn't help us a
lot.)

The same agent does a fetch against an annotation server and finds out

  <http://...weather/boston.rdf>  wea:certified  org:WeatherUnderground.

It checks a Dublin Core repository and finds

<http://...weather/boston.com>  dc:creator   "Massachusetts Port Authority".

All these systems are giving information about the
resource as a resource about Boston, not as the city of Boston.
That's the way URIs *are* used.
(Their use to identify cities is only starting now with RDF.
So we have to stop RDF people using it differently
to the rest of the web)
What, in your model, do you say
someone is referring to when they say:
  <http://...weather/boston.com>  dc:creator   "Massachusetts Port Authority"?
It isn't the specific representation in bits.
It isn't the city.

But now, if we believe that the <boston.rdf> is a city, and combine this
with some more public information about that city,
we find formally that boston was created by Massport, founded in 1722,
has a length, was last modified on 1st Jan 2003, has a population of
678987 and is a source of weather information as certified by the
weather underground.

That doesn't work.
The individual systems all work - you probably noticed
the OWL folks don't see the problem with it being a city
just as you haven't come across the contradiction from actually using
the URI for two things at the same time.
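
The clash can be sketched in Python as a merge of statements from
independent sources followed by a consistency check. This is an
illustration under assumptions, not how any RDF or OWL tool works: the
DOMAIN table, the two kinds "place" and "document", and the :population
property name are all hypothetical, and so is the premise that nothing
is both a place and a document.

```python
from collections import defaultdict

DOC = "<boston.rdf>"  # the URI used in Example 2

# Statements from independent sources, as (subject, property, value).
statements = [
    (DOC, "rdf:type", ":City"),                           # the RDF file itself
    (DOC, ":temperature", 12),                            # the RDF file itself
    (DOC, "Last-Modified", "2003-01-01"),                 # HTTP HEAD
    (DOC, "wea:certified", "org:WeatherUnderground"),     # annotation server
    (DOC, "dc:creator", "Massachusetts Port Authority"),  # Dublin Core
    (DOC, ":population", 678987),                         # public data on the city
]

# Hypothetical table: the kind of thing each property is meant to describe.
DOMAIN = {
    ":temperature": "place",
    ":population": "place",
    "Last-Modified": "document",
    "wea:certified": "document",
    "dc:creator": "document",
}


def kinds_implied(triples):
    """Collect, per subject, the kinds of thing its properties presuppose."""
    kinds = defaultdict(set)
    for subject, prop, _value in triples:
        if prop in DOMAIN:
            kinds[subject].add(DOMAIN[prop])
    return kinds


def conflicts(triples):
    """Return subjects whose merged description needs two kinds at once."""
    return {s: k for s, k in kinds_implied(triples).items() if len(k) > 1}


print(conflicts(statements))
```

Each source taken alone passes the check; only the merged list reports
<boston.rdf> as needing to be both a place and a document.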

That is the main point.  I've responded to yours below, but the main
point is above.

________________________________________________


> RDF depends on someone making assertions, and those assertions are
> associated with some form of a trust model, so there is no ambiguity
> as long as RDF doesn't assume that the URI and the result of a GET
> on that URI are the same thing (which, of course, it cannot do for
> the same reason RDF cannot do so for non-http URI schemes).

RDF does not assume that. You keep saying people muddle
representations with resources, but that indicates to me that you
weren't listening.  The thing the dublin core data is about
is a conceptual resource, but not the city.

> To go back to the car example, if I define a URI scheme "vin" that
> matches the manufacturer's vehicle identification number standard
> and then deploy a resolution mechanism for "vin" using the WWW
> interface, the GET vin:289814678... will consistently result in
> representations of cars that are always in the form of conceptual
> works, because that is what the WWW interface provides in response to
> GET on any URI.  Yet it would be completely unreasonable for the
> Semantic Web to claim that "vin" URIs identify the conceptual work
> and not the car, right?

When you define the scheme, you would say what you were defining,
if you do it properly.  If you say it defines the car, then it defines
the car.   You then talk about the "GET" as though every scheme
were HTTP, and I've said lots of times that some schemes match
and some don't.  If the vinp supports only REGISTER and
REPORTSTOLEN as methods on the car, then it won't match.
If it supports GET it isn't GET in the same way as HTTP - maybe
it means GET_DESCRIPTION or GET_REGISTRATION or something.
Anyway, the semantics of that method, and the relationship
of the returned information to the car, would hopefully be defined
in the spec.

> So, what exactly is the architectural difference between defining
> a new URI scheme that is owned by one naming authority and using
> the DNS decentralized naming mechanism to state that each authority
> defines the mapping of identifier to resource in the http scheme?
> I can define a one-to-one mapping from vin:* to http://vin.org/*
> and thus directly define any car via an unambiguous http URI.
> Saying that this is a bad thing discredits all of the arguments we
> have made over the past ten years regarding a specialized URN syntax
> being unnecessary, and makes a mockery of our position that
> inventing new schemes is harmful.

We are saying "the web is a web of information. If you
want to define something, define it in some language.
HTTP will allow you to publish your definition and, thanks to the
hash mark, allow you to create URIs which reference the items
in the conceptual space of the new language."
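
A short Python sketch of the hash-mark split, using the standard
library's urllib.parse. The address http://example.org/weather.rdf is a
hypothetical stand-in for the elided weather.rdf URI; the fragment names
come from Example 1.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical full URI for the term written as weather.rdf#boston.
term = "http://example.org/weather.rdf#boston"

# urldefrag separates the document URI from the fragment identifier.
document, fragment = urldefrag(term)

# The part before the hash names the published document to retrieve.
assert document == "http://example.org/weather.rdf"
# The part after the hash names an item defined within that document.
assert fragment == "boston"

# A relative reference such as #nyc resolves against the document URI.
assert urljoin(document, "#nyc") == "http://example.org/weather.rdf#nyc"
```

Under this reading, statements about the city attach to the URI with the
fragment, and statements such as Last-Modified or dc:creator attach to
the document URI without it.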

> In short, I don't see anything gained from thinking that an http URI
> always identifies a conceptual work, and quite a lot of capability
> lost because of it.

The fact is that in common usage today, it does.

If one person assumes - as most people do - that the resource is the
conceptual information thing, then we can't have others saying it is
something incompatible.


> I understand that http URIs are, from the point of view of the
> WWW interface, identifiers of a name to be mapped by a listener into
> a conceptual work, since that is what the WWW interface does for any
> URI scheme.

Maybe (Mark is right and) you haven't got what I mean by "Conceptual
Work" - I think you are confusing it with HTTP entity, or
representation.

>  However, given that the WWW interface does not constrain
> the meaning of those other URI simply because they can be used through
> its interface, it makes far more sense to me that http URIs are equally
> unconstrained rather than claim that http URIs identify the interface
> itself.

"interface" isn't a term we have been using.

> That's why I say that I can write a document that describes the
> one Web (the Web of relationships among all resources) as being the
> same for both WWW and the Semantic Web, while at the same time
> describing the WWW interface as one system that makes use of the Web
> through representations and the Semantic Web as another system that
> makes use of the Web through logical assertions.

The semantic web uses representations - just representations in a
logical language not pictures or vernacular speech.

>  Because then I can
> describe all of the constraints on the WWW interface, along with the
> principles that motivated them, without constraining the behavior of
> the Semantic Web and Web Services.  I think that is what you really
> want, but I cannot get there while one system is making assumptions
> about the other system that are not supported by its behavior.

These assumptions are totally supported by behavior.
There are all kinds of metadata about web pages out there,
systems which assume that the representation they retrieved is
that of some notional document which has various properties.
Even though "web page" didn't click with Tim Bray obviously,
it's not only metadata systems; it's also the general user, for whom
the generic thing you bookmark, not the specific representation and
not the city, is what they are referring to and thinking about.


> ....Roy
Received on Wednesday, 29 January 2003 15:41:39 UTC