Re: resources and URIs from pat hayes on 2003-07-18 (www-tag@w3.org from July 2003)

From: pat hayes <phayes@ihmc.us>
Date: Fri, 18 Jul 2003 18:25:11 -0500
To: Tim Berners-Lee <timbl@w3.org>
Cc: www-tag@w3.org
Message-Id: <p06001231bb3e29265c75@[10.0.100.23]>
>On Tuesday, Jul 15, 2003, at 19:20 US/Eastern, pat hayes wrote:
>
>>Gentlemen, I would like to ask you to please clarify the meaning of 
>>the terms 'resource' and 'representation' in 
>>http://www.w3.org/TR/2003/WD-webarch-20030627/.
>>
>>Allow me to elaborate.  Your introductory example asserts the following:
>>
>>"Objects in the networked information system called resources are 
>>identified by Uniform Resource Identifiers ( URIs ). "
>>
>>and later the document says:
>>
>>"URIs identify resources. When a representation of one resource 
>>refers to another resource with a URI, a link is formed between the 
>>two resources. The networked information system is built of linked 
>>resources, and the large-scale effect is a shared information 
>>space. The value of the Web grows exponentially as a function of 
>>the number of linked resources (the "network effect").  "
>>
>>These, and other pieces of text concerning 'resources' published by 
>>other W3C authorities,  seem to clearly indicate that the word 
>>"resource" is intended to refer to the entities *in* the networked 
>>information system: they are the kind of thing we use words like 
>>'website', 'client' and 'server' to describe; they are things with 
>>a computational state, things with which one can communicate, 
>>things which send and receive information which can be transmitted 
>>along optical fibers and twisted pairs, things that can be linked 
>>to one another.
>>
>
>Yes. Let us actually call these things Information Resources.  They 
>are an important subclass of Resources.  You make a very good point, 
>and I have asked for the Architecture document to be changed to 
>reflect this.

Ah, great!  OK, that would definitely be progress. Particularly as 
the document could then be a little more, er, careful about things it 
says about resources which actually apply only to information 
resources.

>
>Specifically, the things addressed directly by http:  are all 
>information resources.

And ftp: and probably some others, right?

>This does *not* apply to other schemes.

Well, OK, but I would like to suggest that any URI scheme that claims 
to apply to anything beyond information resources needs to explain 
*how* the URIs in that scheme *get* their unique denotations. It 
could just be something like: "by our fiat: check with us if you are 
in doubt" , but it shouldn't be just left open.

>But HTTP resources are as you describe and I have been trying to get that
>acknowledged in the arch doc.
>(The issue is HTTP range 14)

OK, but now is it still correct to say that the representation is OF 
the resource? That is, I understand that you want to say that a bare 
http: URI be understood to *denote* the information resource, but the 
representation retrieved by that URI - which might for example be 
something about the weather in Oaxaca - need not be a representation 
OF the information resource, surely?

Or do you want to say that the thing I get by pinging the website is 
a representation of the current state of the resource which *itself* 
is a representation of the weather in Oaxaca? So there are two levels 
of indirection involved: what I see on my screen represents the 
(state of the) information resource which represented (at the time I 
pinged it) something about the weather in Oaxaca? That does make a 
kind of sense, but it seems needlessly complicated and semantically 
rather messy; it blurs the meaning of 'representation of' in a rather 
unintuitive way, and makes the wording of the document rather 
misleading in the way it is currently written.

>>So far this is clear; and the account of 'representation' given in 
>>the document is also then reasonably clear:
>>
>>"Agents (such as servers, browsers and multimedia players) 
>>communicate resource state through a non-exclusive set of data 
>>formats, used separately or in combination (e.g., XHTML, CSS, PNG, 
>>XLink, RDF/XML, SVG, SMIL animation). In the travel scenario, Dan's 
>>user agent uses the URI to request a representation of the 
>>identified resource. In this scenario, the representation consists 
>>of XHTML with embedded weather maps in SVG. "
>>
>>On this picture, the information (which Dan, in your introductory 
>>example, reads on his screen, and which is in some sense all about 
>>the weather in Oaxaca) is a representation of the (current state 
>>of) some entity *in the WWW itself*: a resource in the global 
>>information network: the state of some computer system, or maybe 
>>some abstraction of a computer system.
>>
>>However, it is also clear that neither the weather in Oaxaca, nor 
>>Oaxaca itself, are entities of this kind:  weather and cities in 
>>Mexico are not the kind of entities which can be thought of as 
>>'objects on the networked information system'. Other examples 
>>abound, eg http://chandra.harvard.edu/photo/2003/ngc1068/index.html 
>>is clearly about a galaxy containing a supermassive black hole, 
>>which is also not something one would expect to find as part of a 
>>networked information system, given the likely physical constraints 
>>on network architecture.
>
>Yes. In RDF, the "#" is like an operator which combines the 
>identifier of an information resource with a local identifier used 
>within that resource, and together forms an identifier for the 
>abstract thing, like the weather.

OK, I know this is the way that the SW formalisms have kind of 
converged on, and it certainly makes sense (for HTML also, right? 
More or less, anyway.  Eg 
http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#dfn-predicate 
could be said to denote the concept of predicate in RDF, and that 
would seem to capture the English meaning of the text in the HTML.
It's a little harder to say what it is that 
http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Concepts 
denotes, though: it seems to be a textual kind of thing itself. Oh 
well, let's not quibble...)
.
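
A minimal sketch of the "#" composition described above, in Python. It assumes that the standard library's generic relative-reference resolution (urllib.parse.urljoin) is an acceptable stand-in for what an RDF or HTML processor does with a local identifier; the helper name compose is hypothetical, and the URI is the one cited in this message.

```python
from urllib.parse import urljoin

def compose(document_uri: str, local_id: str) -> str:
    """Combine a document URI with a local identifier via '#'.

    Hypothetical helper for illustration only: it resolves the
    relative reference '#<local_id>' against the document's URI.
    """
    return urljoin(document_uri, "#" + local_id)

# The information resource (a document that can be retrieved).
doc = "http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/"

# A name for whatever the document itself means by 'dfn-predicate'.
term = compose(doc, "dfn-predicate")
print(term)
```

The sketch only shows how the two identifiers are combined syntactically; what the composed name denotes is decided by the document and its language, not by the string operation.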

>>It seems that there is a systematic ambiguity between two senses of 
>>'resource' (or maybe two senses of 'representation') here. In your 
>>first example, I doubt very much that Dan, when looking at his 
>>screen after telling his browser to retrieve 
>>http://weather.example.com/oaxaca, thinks of what he is reading as 
>>in any sense about the state of something on the WW information 
>>network. Certainly if I were in his shoes, I would be reading it as 
>>being about Oaxaca and weather: that is why he is reading it, 
>>presumably: to find out something about the weather in Oaxaca.  So 
>>what this representation is *about* is not, apparently, a resource:
>
>It is a resource - resource is like daml:Thing.

Or rdf:Resource, in fact.  Right, this is Dan C's view as well.  I 
can live with this, although I can't really see the utility in 
inventing a new word for "thing" or "entity" when we have perfectly 
good words already. Sigh.

>The Arch doc confuses people at the moment by not introducing the 
>class of information resources.  I am told that if I can convince 
>Roy that this is a useful distinction, then we will probably be 
>done, and I thought I almost had, but then he says no.
>
>>  so it is not a representation of a resource, in the usual sense of 
>>'representation' and what is apparently your sense of 'resource'. 
>>Similarly, http://chandra.harvard.edu/photo/2003/ngc1068/index.html 
>>sure reads to me like it is about NGC 1068. But this means that 
>>either it is a 'representation' which is not about what it is 'of', 
>>or else that NGC 1068 is an 'object in the networked information 
>>system'; neither of which seem to me to be remotely plausible as 
>>factual claims using the ordinary senses of the words, and kind of 
>>brain-damaged as attempts at a formal definition of some kind of 
>>architectural/semantic theory.
>>
>>Now, this could be just a matter of philosophical opinion, were it 
>>not for the fact that semantic web languages like RDF and OWL have 
>>been given *formal* semantic theories which have direct 
>>architectural consequences for Web agents, and which depend 
>>crucially on notions like the term 'about' I have used rather 
>>loosely above.  RDF uses URI references as *names* to *refer* to 
>>entities. So if a web page such as 
>>http://chandra.harvard.edu/photo/2003/ngc1068/index.html were to 
>>include RDF markup, one might expect to find things like this in it:
>>
>><rdf:Description
>>rdf:about="http://chandra.harvard.edu/NGC/ngc1068"
>>rdf:type="http://chandra.harvard.edu/AOtype/Activegalaxy7">
>></rdf:Description>
>>
>
>No, that would be illegal by my way of thinking.

I agree that does not fit into the #/no-# style. I was responding to 
the wording of the existing architecture document.

>http://chandra.harvard.edu/NGC/ngc1068 is an information resource.
>You would expect
>
><rdf:Description
>rdf:about="http://chandra.harvard.edu/NGC#ngc1068"
>rdf:type="http://chandra.harvard.edu/AOtype/Activegalaxy7">
></rdf:Description>
>
>or, in the http://chandra.harvard.edu/NGC information resource,
>
><rdf:Description
>rdf:about="#ngc1068"
>rdf:type="http://chandra.harvard.edu/AOtype/Activegalaxy7">
></rdf:Description>
>
>where you can see that local identifiers can be used to refer to 
>abstract things, because that is what the RDF language spec says.
>
>>where the URIs refer respectively to a galaxy and an RDFS class of 
>>galaxy types. This is completely incompatible with what your 
>>document says about resources and representations.
>
>It is incompatible with your assumption that ALL resources are 
>information resources, just because there is so much talk of 
>information resources.
>That assumption is not made by the specs though.

Well, it is pretty hard to read the specs any other way as they 
stand. They say for example that resources are part of a network and 
that resources can be retrieved and have operations performed on them.

>
>>  Using the URI in this way does not create any kind of link between 
>>anything on this planet and NGC 1068 (which is, fortunately, about 
>>50 million light-years away).  But RDF/RDFS/DAML/OWL/OIL and all 
>>the other emerging Semantic Web formalisms *require* that URIs be 
>>used in this way, as *referring expressions*, not as informational 
>>links in a global architecture.
>
>The magic of the "#".
>Now you see why http://.../foo.rdf and http://.../foo.rdf#bar are 
>such different things.

Speaking of which, is the thing with "#" in it actually a URI? The 
architecture document speaks of URIs, not URI references. (I am 
honestly confused about this distinction, not making a rhetorical 
point.)
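
For reference, a minimal sketch of the distinction in Python, assuming RFC 2396 (the generic syntax current at the time of this thread), where the production is URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ] and the fragment identifier is not considered part of the URI itself. The example string is the one from the quoted RDF above; urllib.parse.urldefrag is the standard-library call that performs the split.

```python
from urllib.parse import urldefrag

# A URI reference: a URI plus an optional "#fragment".
ref = "http://chandra.harvard.edu/NGC#ngc1068"

# urldefrag returns a (url, fragment) pair.
uri, fragment = urldefrag(ref)

print(uri)       # the part an HTTP client puts in its request
print(fragment)  # the local identifier, never sent to the server
```

On this reading, only the part before the "#" is something an HTTP server is ever asked about; the fragment is interpreted on the client side, relative to whatever was retrieved.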

>
>>The RDF/RDFS/OWL semantics assumes that URI references refer to 
>>"resources" , but it explicitly denies that this word "resource" is 
>>limited to the kinds of resource that you seem to be talking about. 
>>On the SW view, *anything* is a resource: galaxies, regions of 
>>France, kinds of wine, sodium atoms, classes, mathematical 
>>abstractions, even fictional entities: anything that can be 
>>referred to by a name. None of these can possibly be "objects in a 
>>networked information system".
>
>Yes, "objects in the networked information system" are just another 
>subclass of Resource, from RDF's point of view.
>
>>   So whatever you are talking about, and whatever they are talking 
>>about, y'all cannot possibly be using the words "resource" and 
>>"representation" in the same sense.
>>
>>As a result, several of the assertions you make in this document 
>>are not correct. For example
>>
>>2.8.2
>>"Emerging Semantic Web technologies, including "DAML+OIL" [ DAMLOIL 
>>] and "Web Ontology Language (OWL)" [ OWL10 ], define RDF 
>>properties such as equivalentTo and FunctionalProperty to state -- 
>>or at least claim -- formally that two URIs identify the same 
>>resource. "
>>
>>is incorrect. These assertions claim that two URI references 
>>*denote* the same entity in all interpretations. That is not the 
>>same notion as 'identify'.
>
>No?  How do they differ?  One seems to be expressed in MTspeak, the 
>other in normal software engineering speak.

Well, there seems to be a presumption that "identify" means something 
like "can be used to gain access to" in the way that the specs are 
currently written, i.e. to mean something much stronger than merely 
'denotes'. It's hard to know exactly what it means, in fact, which is 
why I have been asking for clarification.

>
>>In fact, there is no such notion as 'identify' in RDF/RDFS/OWL 
>>semantics; and the first principle in section 2 ("All important 
>>resources SHOULD be identified by a URI ") is meaningless when 
>>taken literally in the context of semantic web languages,
>
>Which just shows that taking one statement literally in a foreign 
>context does not preserve its meaning in natural language.  The arch 
>doc is not written in the languages in which OWL is described.

Well then it shouldn't use OWL examples to make its points, surely.

>You can probably translate though.

No, I can't. If I could I wouldn't be making such a fuss.

>
>>as URIs there typically cannot be said to identify anything: they 
>>act as names whose possible referents are constrained by the 
>>assertions made using them, but they are not 'linked' to anything, 
>>not 'bound' to anything, and are not obliged to 'identify' anything;
>
>I say they identify things, you say they can't.  So I suppose you 
>have to show me what *you* mean by identify which breaks.

Well, I already did: they can't provide access to the thing, in 
general. They also can't generally be assumed to uniquely identify a 
particular thing, nor indeed is there any need to insist that they 
do; all of which does not apply, of course, if "identify" means 
'provide access to' in some sense.

>>and the universes of discourse may contain entities which cannot 
>>possibly be all identified or even referred to by URIs, since there 
>>are too many of them, or it is physically impossible to identify 
>>them with enough precision, or simply because it is impractical to 
>>do so.
>>
>
>No one said that there was a URI for every resource.  It is quite a 
>different thing to say that *any* resource can have a URI (which is 
>true) than to say that *every* resource has a URI.  The system does 
>not require that every resource has a URI.

OK, but please be clear about this. Several of the email discussions 
seem to assume that something becomes a resource when it is given a 
URI: that having a URI is somehow definitive or characteristic for 
being a resource.

Here's an intuition-sharpening question: does it make sense to speak 
of a set of resources which is such that it is *impossible* to assign 
a URI to everything in the set? (A: yes. :-)

>
>>------
>>
>>Sorry this comes across so negatively, but there seems to be a 
>>central misunderstanding right at the center of several 
>>architectural accounts of the Web, and I think it is important to 
>>get it sorted out.
>>
>
>Indeed, this distinction between information resources and Resources 
>in general has not had an easy passage, but it is absolutely 
>necessary.  I hope we can get it into the arch doc. soon.
>
>Also, the fact that an identifier can be composed, using "#", of a 
>term used in a document combined with the global identifier for the 
>document in order to construct a global identifier for the thing 
>identified by the term, while so simple, is subject to a lot of 
>criticism.   But it works and resolves these issues - even if 
>sometimes the document is imaginary.

There is a semantic issue arising from the very idea of names always 
being identifiers for a single thing, but let us leave that aside for 
now. If we can make progress on this "information resource" 
distinction we will be in much better shape than we are at present.

Pat



-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 18 July 2003 19:25:25 UTC