Re: resources and URIs from Tim Berners-Lee on 2003-07-21 (www-tag@w3.org from July 2003)

From: Tim Berners-Lee <timbl@w3.org>
Date: Mon, 21 Jul 2003 10:51:56 -0400
To: pat hayes <phayes@ihmc.us>
Cc: www-tag@w3.org, Roy Fielding <fielding@apache.org>
Message-Id: <DFF7871E-BB8A-11D7-AF3A-000393914268@w3.org>
> Tim:
>> Yes. Let us actually call these things Information Resources.  They  
>> are an important subclass of Resources.  You make a very good point,  
>> and I have asked for the Architecture document to be changed to  
>> reflect this.
>
> Pat:
> Ah, great!  OK, that would definitely be progress. Particularly as the  
> document could then be a little more, er, careful about things it says  
> about resources which actually apply only to information resources.

Yes. (This is not a consensus in the TAG at the moment, I should note.   
I have yet to convince Roy that the distinction is important.)

[...]
> OK, but now is it still correct to say that the representation is OF  
> the resource? That is, I understand that you want to say that a bare  
> http: URI be understood to *denote* the information resource, but the  
> representation retrieved by that URI - which might for example be  
> something about the weather in Oaxaca - need not be a representation  
> OF the information resource, surely?
>

This depends on the way you use the word "representation".
My way of modeling this is that an information resource typically
conveys, in some way, some information, and a representation of it is
an expression of that information.
Such an expression can involve a choice of natural and/or KR language,
and a choice of MIME type and data encoding to get it into bits.
Take a web page of "Kubla Khan". Coleridge chose English, and the
web page publisher chose HTML.
This is, I think most would agree, a representation. But of what?
It is an expression of a sentiment and a story about Kubla Khan's
sacred pleasure-dome and other things. I feel that the notion of what
the poem is "about" is not strong. It isn't one which the system relies on.
People in English refer to a picture of a car as a representation of
the car, and it is - in English. So maybe it is better to use some
other word, say "expression".

For me, the relationship between the picture and the pair
(image/jpeg, <bits>) is "has expression", not "about":

[ a ex:Car ]  <-- about ---  [ a picture ]  -- has expression -->  [ a :Representation ]

The "has expression" step is what HTTP deals with (Roy might argue).
Around this we have a lot of machinery, and so we need clean modeling.
For the "about" bit we don't have, or need, such precision. (What is
"Kubla Khan" about, anyway?)

> Or do you want to say that the thing I get by pinging the website is a  
> representation of the current state of the resource which *itself* is  
> a representation of the weather in Oaxala? So there are two levels of  
> indirection involved: what I see on my screen represents the (state of  
> the) information resource which represented (at the time I pinged it)  
> something about the weather in Oaxaca? That does make a kind of sense,  
> but it seems needlessly complicated and semantically rather messy; it  
> blurs the meaning of 'representation of' in a rather unintuitive way,  
> and makes the wording of the document rather misleading in the way it  
> is currently written.

Exactly - the two levels.
It is good to get your feedback on this.  Maybe a change of vocabulary  
is what we need.

> So far this is clear; and the account of 'representation' given in the  
> document is also then reasonably clear:
>
> "Agents (such as servers, browsers and multimedia players) communicate  
> resource state through a non-exclusive set of data formats, used  
> separately or in combination (e.g., XHTML, CSS, PNG, XLink, RDF/XML,  
> SVG, SMIL animation). In the travel scenario, Dan's user agent uses  
> the URI to request a representation of the identified resource. In  
> this scenario, the representation consists of XHTML with embedded  
> weather maps in SVG. "
>
> On this picture, the information (which Dan, in your introductory  
> example, reads on his screen, and which is in some sense all about the  
> weather in Oaxaca) is a representation of the (current state of) some  
> entity *in the WWW itself*: a resource in the global information  
> network: the state of some computer system, or maybe some abstraction  
> of a computer system.
>
> However, it is also clear that neither the weather in Oaxaca, nor
> Oaxaca itself, are entities of this kind:  weather and cities in
> Mexico are not the kind of entities which can be thought of as
> 'objects on the networked information system'. Other examples abound,
> e.g. http://chandra.harvard.edu/photo/2003/ngc1068/index.html  is
> clearly about a galaxy containing a supermassive black hole, which is
> also not something one would expect to find as part of a networked
> information system, given the likely physical constraints on network
> architecture.

Yes.  Absolutely.

>> Yes. In RDF, the "#" is like an operator which combines the  
>> identifier of an information resource with a local identifier used  
>> within that resource, and together forms an identifier for the  
>> abstract thing, like the weather.
>
> OK, I know this is the way that the SW formalisms have kind of
> converged on, and it certainly makes sense (for HTML also, right? More
> or less, anyway. E.g.
> http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#dfn-predicate
> could be said to denote the concept of predicate in RDF,
> and that would seem to capture the English meaning of the text in the
> HTML.
> It's a little harder to say what it is that
> http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Concepts
> denotes, though: it seems to be a textual kind of
> thing itself. Oh well, let's not quibble...)

With HTML you don't get denotation of anything formally - it isn't a
global KR (semantic web) language. It's for human perception, and all
you can do with a person is highlight a bit of text. But that's OK -
people aren't formal inference engines. So not to worry. The
awkwardness is when RDF refers to a fragment of an HTML document: how
does our consistent global KR system decide what foo.html#bar denotes -
the hypertext anchor, or something it is about? Or maybe one says that
the semantic web "denotes" function just doesn't apply in some cases -
HTML syntax doesn't render RDF. The user interface "highlight this"
function does. Conversely, if you try "highlight this" on an RDF
document, it doesn't work, because no one part of the document is
associated with one RDF node. It seems we don't really need to do
either of these things, though.
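As a rough illustration of the "#" operator described earlier in the thread, Python's standard `urllib.parse.urldefrag` splits a URI into the part that identifies the document and the local identifier after "#". The URI `http://example.org/foo.html#bar` is hypothetical. The split is purely syntactic: it says nothing about whether foo.html#bar denotes the hypertext anchor or something the anchor is about, which is exactly the open question above.

```python
from urllib.parse import urldefrag


def split_fragment(uri: str) -> tuple[str, str]:
    """Split 'doc#local' into (document URI, local identifier); local is '' if absent."""
    result = urldefrag(uri)
    return result.url, result.fragment


doc, local = split_fragment("http://example.org/foo.html#bar")  # hypothetical URI
print(doc)    # http://example.org/foo.html
print(local)  # bar
```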

> It seems that there is a systematic ambiguity between two senses of  
> 'resource' (or maybe two senses of 'representation') here.

Exactly.

>  In your first example, I doubt very much that Dan, when looking at  
> his screen after telling his browser to retrieve  
> http://weather.example.com/oaxaca, thinks of what he is reading as in  
> any sense about the state of something on the WW information network.

What he is reading *is* the state of something in the WW information
network.
What he is reading is *about* the weather.

>  Certainly if I were in his shoes, I would be reading it as being
> about Oaxaca and weather: that is why he is reading it, presumably: to
> find out something about the weather in Oaxaca.  So what this
> representation is *about* is not, apparently, a resource:
>
> It is a resource - resource is like daml:Thing.

Yes.

Actually, I think users are aware of the concepts "CNN weather for
Oaxaca" and "Weather in Oaxaca".
People are aware that they are reading something created by someone
else, and of the differences between different pages on the weather
in Oaxaca.

[...]
>
> Speaking of which, is the thing with "#" in it actually a URI? The  
> architecture document speaks of URIs, not URI references. (I am  
> honestly confused about this distinction, not making a rhetorical  
> point.)

The whole field has been confused and inconsistent; not your fault.
URIs originally excluded "#", and URIrefs added both the "#" and the
relative forms, confusing two quite different things. I would like to
clean this up, and make a URI include the #, and a URI reference be the
string in a document which may be a relative thing. We are working on
that.
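A minimal sketch of the distinction, using Python's standard `urllib.parse.urljoin`: the string `#section-Concepts` is the kind of relative reference that appears inside a document, and resolving it against that document's base gives an absolute URI that includes the "#" part. (RFC 3986, published later, draws the line the same way: a URI may carry a fragment, and a "URI-reference" may be relative.)

```python
from urllib.parse import urldefrag, urljoin

# Base URI of the document in which the reference appears (from the thread above).
base = "http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/"

# A URI reference as written inside that document: relative, fragment only.
ref = "#section-Concepts"

# Resolving it gives an absolute URI that includes the "#" part.
absolute = urljoin(base, ref)
print(absolute)
# http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Concepts

# The part before "#" is the document itself.
print(urldefrag(absolute).url == base)  # True
```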

{ ?x  :uri ?u.   ?u string:match "http://[^#]*" }  =>   { ?x rdf:type :HTTPInformationResource }.

  :HTTPInformationResource  rdfs:subClassOf :InformationResource.
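The rule above can be read as a simple syntactic test. The Python sketch below mirrors it under two assumptions: that the pattern is meant to match the whole URI (an `http:` URI with no "#"), so `re.fullmatch` is used rather than a substring search, and that the subclass statement is applied by walking a small hypothetical `SUBCLASS_OF` table. It illustrates the rule's logic only; it is not how an N3 reasoner evaluates it.

```python
import re

# Pattern copied from the N3 rule. Assumption: it should match the *whole* URI,
# so fullmatch is used (a plain search would also accept "http://x#y").
HTTP_NO_FRAGMENT = re.compile(r"http://[^#]*")

# Hypothetical table mirroring ":HTTPInformationResource rdfs:subClassOf :InformationResource".
SUBCLASS_OF = {"HTTPInformationResource": "InformationResource"}


def is_http_information_resource(uri: str) -> bool:
    return HTTP_NO_FRAGMENT.fullmatch(uri) is not None


def types_of(uri: str) -> set[str]:
    """Classes inferred for the thing a URI denotes, following subclass links upward."""
    types: set[str] = set()
    if is_http_information_resource(uri):
        cls = "HTTPInformationResource"
        while cls is not None:
            types.add(cls)
            cls = SUBCLASS_OF.get(cls)
    return types


print(types_of("http://weather.example.com/oaxaca"))
print(types_of("http://weather.example.com/oaxaca#weather"))
```

A URI with a "#" gets no class from this rule, which matches the reading that it may denote something other than an information resource.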


Some HTTP resources are accessible (for a given agent; your mileage may
vary), and when one has been accessed, there is a representation which
has at least a MIME type and a body (the bits).

{ ?x rdf:type :AccessedHTTPInformationResource.
   ?x  :Expression ?r }  =>
{  ?r  rdf:type  :Representation;
         http:contentType ?m;
         http:entityBody    ?b. }.

{ ?x rdf:type :HTTPInformationResource.
   ?x  :inSomeWayAbout ?s.
   ?x  :Expression ?r }  =>
{  ?r  :inSomeWayARepresentationOf   ?s. }.
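The two rules above can be tried out with a tiny forward-chaining pass over triples. This is a hypothetical Python sketch, not cwm or any N3 reasoner: triples are plain string tuples, the predicate names are copied from the rules, `:WeatherInOaxaca` and `_:rep1` are made-up terms, and the `http:contentType` / `http:entityBody` parts of the first conclusion are only noted in a comment, since the rule asserts that they exist without giving their values. One pass suffices here because neither conclusion matches a premise of either rule.

```python
# A triple is (subject, predicate, object); all terms are plain strings here.
Triple = tuple[str, str, str]


def apply_rules(triples: set[Triple]) -> set[Triple]:
    """One forward-chaining pass applying the two N3 rules (hypothetical sketch)."""
    derived = set(triples)
    for (x, p, r) in triples:
        if p != ":Expression":
            continue
        # Rule 1: the expression of an accessed HTTP information resource is a
        # :Representation. (The rule also says it has some http:contentType and
        # some http:entityBody; those values are not modeled here.)
        if (x, "rdf:type", ":AccessedHTTPInformationResource") in triples:
            derived.add((r, "rdf:type", ":Representation"))
        # Rule 2: if the resource is in some way about ?s, its expression is
        # in some way a representation of ?s.
        if (x, "rdf:type", ":HTTPInformationResource") in triples:
            for (x2, p2, s) in triples:
                if x2 == x and p2 == ":inSomeWayAbout":
                    derived.add((r, ":inSomeWayARepresentationOf", s))
    return derived


page = "<http://weather.example.com/oaxaca>"
kb = {
    (page, "rdf:type", ":HTTPInformationResource"),
    (page, "rdf:type", ":AccessedHTTPInformationResource"),
    (page, ":Expression", "_:rep1"),
    (page, ":inSomeWayAbout", ":WeatherInOaxaca"),
}

for triple in sorted(apply_rules(kb) - kb):
    print(triple)
```

This keeps the two levels apart: the representation is typed by the precise HTTP rule, while the "about" link is only carried over loosely by the second rule.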


>   So whatever you are talking about, and whatever they are talking  
> about, y'all cannot possibly be using the words "resource" and  
> "representation" in the same sense.

That is the goal.  It seems to be hard work, as people get quite  
attached to their different ways of using words.

> As a result, several of the assertions you make in this document are  
> not correct. For example
>
> 2.8.2
>
> "Emerging Semantic Web technologies, including "DAML+OIL" [ DAMLOIL ]  
> and "Web Ontology Language (OWL)" [ OWL10 ], define RDF properties  
> such as equivalentTo and FunctionalProperty to state -- or at least  
> claim -- formally that two URIs identify the same resource. "
>
> is incorrect. These assertions claim that two URI references *denote*  
> the same entity in all interpretations. That is not the same notion as  
> 'identify'.

Can you define "identify", if we are using "denote" for the thing a URI  
does?


> No?  How do they differ?  One seems to be expressed in MTspeak, the  
> other in normal software engineering speak.
>
>
> Well, there seems to be a presumption that "identify" means something  
> like "can be used to gain access to" in the way that the specs are  
> currently written, i.e. to mean something much stronger than merely  
> 'denotes'. It's hard to know exactly what it means, in fact, which is  
> why I have been asking for clarification.

Just "denotes".  Now whether we can change the whole document to use  
the term "denote" instead of "identify" will be an interesting  
question.  But that is how I have been using "identify".  I haven't in  
several emails seen anything which distinguishes that denoted by your  
"denotes" from that denoted by my "identifies", so I will behave as  
though they denote the same thing until further notice.

>>> In fact, there is no such notion as 'identify' in RDF/RDFS/OWL  
>>> semantics; and the first principle in section 2 ("All important  
>>> resources SHOULD be identified by a URI ") is meaningless when taken  
>>> literally in the context of semantic web languages,
>>
>> Which just shows that taking one statement literally in a foreign  
>> context does not preserve its meaning in natural language.  The arch  
>> doc is not written in the languages in which OWL is described.
>
> Well then it shouldn't use OWL examples to make its points, surely.

It certainly can. Even though the arch doc does not use OWL vocabulary,
it can still talk about OWL.

[...]

>> No one said that there was a URI for every resource.  It is quite a  
>> different thing to say that *any* resource can have a URI (which is  
>> true) as to say that *every* resource has a URI.  The system does not  
>> require that every resource has a URI.
>
> OK, but please be clear about this. Several of the email discussions  
> seem to assume that something becomes a resource when it is given a  
> URI: that having a URI is somehow definitive or characteristic for  
> being a resource.

(That was a suggestion by Michael Mealling, I believe, not one to which
I adhere. I couldn't write rules to talk about this distinction - the
machinery would allocate a URI in considering the question. Name one
of the set of things you have never thought about.)

> Here's an intuition-sharpening question: does it make sense to speak  
> of a set of resources which is such that it is *impossible* to assign  
> a URI to everything in the set? (A: yes. :-)

Certainly.
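One standard way to see why the answer is yes, assuming that a URI denotes one thing, so that distinct resources need distinct URIs: URIs are finite strings over a finite alphabet Σ, so there are only countably many of them, while, for example, the real numbers are uncountable (Cantor). Hence any assignment of URIs to the reals must give two distinct reals the same URI:

```latex
|\mathrm{URIs}| \;\le\; |\Sigma^{*}| \;=\; \Bigl|\,\bigcup_{n \ge 0} \Sigma^{n}\Bigr| \;=\; \aleph_0 \;<\; 2^{\aleph_0} \;=\; |\mathbb{R}|
```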

>  If we can make progress on this "information resource" distinction we  
> will be in much better shape than we are at present.

Yes.

> Pat
>
Received on Monday, 21 July 2003 17:55:50 UTC