Re: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues] from Pat Hayes on 2006-04-25 (public-swbp-wg@w3.org from April 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 25 Apr 2006 14:07:19 -0500
To: Frank Manola <fmanola@acm.org>
Cc: public-swbp-wg@w3.org, Guus Schreiber <guus@few.vu.nl>, Steve Pepper <pepper@ontopia.net>, Mark van Assem <mark@cs.vu.nl>, "Ralph R. Swick" <swick@w3.org>
Message-Id: <p06230904c074181852a9@[10.100.0.24]>
>Booth, David (HP Software - Boston) wrote:
>>>From: Frank Manola
>>>>>From: David Booth
>>>>>>  From:  Pat Hayes
>>>>>>
>>>>>>  It might be best to start with a definition of what you
>>>>>>consider an information resource to be. Since the TAG do not
>>>>>>define this critical term, yet base important engineering
>>>>>>decisions on it, any authoritative exposition would be of
>>>>>>immense value. My current understanding is that an
>>>>>>information resource is something that can be transmitted
>>>>>>over a network by a transfer protocol. On this understanding,
>>>>>>one could argue that a word was an information resource.
>>>>>Definitely not.  That would be a "representation", not an 
>>>>>"information resource".  The information resource is the 
>>>>>*source* of "representations" that can be transmitted over a 
>>>>>network.
>>>Sorry to butt in, but a couple of minor comments:
>>>
>>>"Definitely not" may be technically correct, but I think a bit 
>>>more context is needed here.  The TAG Architecture document says:
>>>
>>>"It is conventional on the hypertext Web to describe Web pages, 
>>>images, product catalogs, etc. as "resources". The distinguishing 
>>>characteristic of these resources is that all of their essential 
>>>characteristics can be conveyed in a message. We identify this set 
>>>as "information resources."
>>>
>>>This document is an example of an information resource. It 
>>>consists of words and punctuation symbols and graphics and other 
>>>artifacts that can be encoded, with varying degrees of fidelity, 
>>>into a sequence of bits. There is nothing about the essential 
>>>information content of this document that cannot in principle be 
>>>transferred in a message. In the case of this document, the message 
>>>payload is the representation of this document."
>>>
>>>So, referring to the next sentence, it would seem that an RDF 
>>>ontology and an HTML web page *are* information resources.  What 
>>>gets transmitted over the wire, however, would be representations 
>>>of those information resources.  Right?
>>
>>You're right.  I should have been clearer that it depends on what you
>>mean by "RDF ontology" or "HTML web page".  If you're referring to the
>>abstract document that may change over time then yes, it is an
>>information resource.  If you're referring to a particular instantiation
>>of that document that may be transmitted over the wire then no, it is a
>>representation.  Pat was referring to something that could be
>>transmitted over the wire.
>>
>>An information resource cannot be transmitted over the wire.  It is an
>>abstraction.  Thus, I believe the WebArch sentence above that says:
>>	"all of their essential characteristics can be conveyed in a message"
>>is slightly incorrect and should have said something like:
>>	"all of their *current* essential characteristics can be 
>>	conveyed in a message"
>>
>>because a representation only gives a snapshot of that information
>>resource at one particular moment, whereas the "information resource" is
>>the abstract source/set of those representations over time.
>>
>
>David--
>
>What you say is correct, but I think that some of the qualifications 
>about *current* characteristics and *information* resources could be 
>misinterpreted.
>
>First off, if I understand this business properly, *no* resources, 
>information or not, can be sent over a network or conveyed in 
>messages. Only *representations* of resources can be sent or 
>conveyed in this way.

OK, but let me interject that 'represent' here has to be understood 
(very) differently in different cases. The relationship of a 
bitstream to an HTML web page (or the HTTP logical endpoint that 
encodes it in its state, or whatever) is a completely different kind 
of relationship from that which might hold between a bitstream and, 
say, me, or the planet Neptune, or the number 47. It seems likely 
that the only way to explicate the latter is to introduce a notion 
like the former, since the only way a bitstream can represent in 
the second sense (represent-2) is to be an encoding of (that is, to 
represent-1) something with a syntax or representational structure, 
which itself represents (in yet a third sense, represent-3: that of 
describing or denoting or picturing) me or the planet or the number. 
So in the non-information-resource case, the representation has to 
be understood as multi-staged, essentially involving the 
representation-1 of an information resource which itself 
represents-3 the non-information resource. So *all* communication 
must involve a representation-1 of an information resource, even if 
that resource itself is, or contains, representations-3 of 
something else. And 'being a representation of', as I believe I 
said earlier, is not in general transitive.

Note also that this whole discussion of time-varying resources is 
only appropriate for representation-1: non-information resources need 
not be time-varying, and even if they are, that variation need have 
no particular relationship to the 'transaction time' of any 
transmission of a representation of them; and representation-2 is not 
usually thought of as having anything particular to do with times. 
'47' represents-2 47: no snapshots are involved, and time has nothing 
to do with it.

>  The distinction between information resources and other resources 
>isn't about whether or not representations of them can be sent or 
>conveyed (*only* representations of resources can be sent or 
>conveyed, and non-information resources can have associated 
>representations that can be sent or conveyed), but rather about 
>whether or not those representations convey the "essential 
>characteristics" of those resources.
>
>This separation of concepts serves a number of purposes.  One of 
>them is to deal with time-varying resources.  However, it's not 
>necessary for the resource to vary over time:  a resource may be 
>static, and the same separation of concepts applies.  In the case of 
>a static resource, what you'd get for a request is a snapshot, but 
>the *same* snapshot.

Only for information resources. For non-information resources, the 
time of the request has absolutely nothing to do with anything. A 
picture of me doesn't change what it is a picture of, by virtue of 
when you look at it, and neither does a movie. Even when the entity 
is time-varying, the time of the viewing or accessing need not have 
anything to do with the temporal nature of the resource itself. This 
whole matter was sorted out by the temporal database community years 
ago, when they introduced the terminology of valid time versus 
transaction time, which might be useful here.
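
To make the valid time versus transaction time distinction concrete, 
here is a minimal sketch in Python. The Fact class, the value_as_of 
function, and the sample data are all hypothetical names invented for 
illustration; they are not taken from any temporal database product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    subject: str
    value: str
    valid_from: date    # valid time: when the fact starts to hold in the world
    valid_to: date      # valid time: exclusive end of that period
    recorded_on: date   # transaction time: when the fact was stored


def value_as_of(facts, subject, valid_day, known_by):
    """Return the value that held on valid_day, using only facts
    that had been recorded on or before known_by."""
    candidates = [
        f for f in facts
        if f.subject == subject
        and f.valid_from <= valid_day < f.valid_to
        and f.recorded_on <= known_by
    ]
    if not candidates:
        return None
    # If several recorded facts cover the same day, prefer the latest record.
    return max(candidates, key=lambda f: f.recorded_on).value


# Hypothetical sample data: the thing was red, then became blue.
facts = [
    Fact("thing", "red", date(1990, 1, 1), date(2000, 1, 1), date(1995, 6, 1)),
    Fact("thing", "blue", date(2000, 1, 1), date.max, date(2006, 4, 25)),
]
```

The two time axes are queried independently: asking what held on a 
given day (valid time) is a separate question from asking what had 
been recorded by the time of the request (transaction time).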

>  The separation is there to allow for the time-varying case, and for 
>you to be able to coin separate URIs for the time-varying resource, 
>and for particular "versions" (e.g., over time) of it.
>
>Another purpose is to distinguish the resource in the abstract from 
>different representations of it that may be returned for different 
>purposes.  Examples of this are illustrated in 
>http://www.w3.org/TR/swbp-vocab-pub/, where an RDF vocabulary is 
>returned in either RDF/XML or HTML, depending on what the user 
>wants. Here's where things can get further confused (or, at least, 
>where *I* may be further confused).

Me too.

>Take the case of an RDF vocabulary referenced by a single URI, say 
>http://example.myvocab.  However, "under the covers" there are 
>really two documents available, http://example.myvocab.rdf and 
>http://example.myvocab.html.  A user may want either the rdf or the 
>html version of the vocabulary, depending on what she/he is trying 
>to do, and the discussion in http://www.w3.org/TR/swbp-vocab-pub/ 
>shows how you can get the version you want if you ask simply for 
>http://example.myvocab. Now, my understanding is that:
>
>a.  There are *three* resources here, http://example.myvocab, 
>http://example.myvocab.rdf, and http://example.myvocab.html.  These 
>are all resources in spite of the fact that http://example.myvocab 
>is in some sense "more abstract" (less of a specific representation) 
>than the other two.

I have trouble here understanding what http://example.myvocab 
actually *is*. If it is a resource, what can possibly be a 
representation-1 of it? Its relationship to the 'real' html and rdf 
resources doesn't seem like that of a resource to representations-1 
of the resource, since the distinction is not one of state at a time, 
but is determined by the request; and representations of them are, 
well, of them, so not of it. So it seems to have no representations-1 
at all. (??)

>b.  Even when the server selects one of the versions, either 
>http://example.myvocab.rdf or http://example.myvocab.html to return, 
>what gets returned is still a representation of one of these 
>resources, not the resource itself.
>
>c.  *All* of these are "information resources", in that their 
>"essential characteristics can be conveyed in a message".  That is, 
>considered independently, the essential characteristics of 
>http://example.myvocab.rdf can be conveyed in a message, the 
>essential characteristics of http://example.myvocab.html can be 
>conveyed in a message, and presumably the essential characteristics 
>of http://example.myvocab can be conveyed in a message (although 
>what actually gets sent is a representation of one of those other 
>files).

Well, but that is the fatal objection. Neither of these would convey 
all the essential characteristics of http://example.myvocab, 
precisely because it can be either RDF or HTML, but neither of those 
can be the other. Unlike them, it is a chimera. Yet it has no other 
representations. Perhaps then it is not in fact an information 
resource at all? That would explain why 
http://www.w3.org/TR/swbp-vocab-pub/ feels it necessary to issue a 
303 redirect to get from http://example.myvocab to 
http://example.myvocab.rdf, rather than having http://example.myvocab 
just look at the request and send the appropriate notation back with 
a 2xx code. I have never understood the rationale for this before, 
but it does hang together with the WebArch doctrine, I have to admit.

>This is a place where I find the definition of "information 
>resource" (put together with the httpRange-14 guidance) somewhat 
>problematic, in that:
>
>d.  whether a given representation conveys the "essential 
>characteristics" of some resource is (necessarily) kind of fuzzy, and
>
>e.  it seems as if the *server* (presumably acting as an 
>intermediary for whoever put the resource out there in the first place) 
>has a lot more to say about whether the "essential characteristics" 
>of the resource are being conveyed (via the return code it sends 
>when you ask for the resource) than the user does, even though 
>"essential for what" seems like an application-dependent decision.
>
>I (think I) understand at least some of the architectural tradeoffs 
>involved, but I can't help but think that an HTTP response code is a 
>pretty low-bandwidth mechanism for conveying this kind of 
>information (in fact, I'm inclined toward Pat's position), and that 
>we ought to be looking more at ways to use the RDF/OWL/... class of 
>languages to provide metadata about:
>
>f.  what kind of thing (or kinds of things) dereferencing a given 
>URI is going to return, and
>
>g.  what kinds of things a given kind of return might be useful for 
>(e.g., users could document what they've been able to do with what's 
>been returned, in an extensible fashion).

Amen to that.
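
As a rough sketch of what points f and g might look like in practice, 
the following Python uses plain (subject, predicate, object) tuples in 
place of real RDF. The namespace EX and the predicates mayReturn and 
usefulFor are hypothetical, invented here for illustration; no 
existing vocabulary is being quoted.

```python
EX = "http://example.org/meta#"  # hypothetical namespace, not a real vocabulary

# (f) what dereferencing a URI may return; (g) what each return is useful for.
triples = [
    ("http://example.myvocab", EX + "mayReturn", "application/rdf+xml"),
    ("http://example.myvocab", EX + "mayReturn", "text/html"),
    ("application/rdf+xml", EX + "usefulFor", "loading into an RDF store"),
    ("text/html", EX + "usefulFor", "reading in a browser"),
]


def objects(triples, subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def uses_of(triples, uri):
    """Map each media type a URI may return to its documented uses."""
    return {
        media_type: objects(triples, media_type, EX + "usefulFor")
        for media_type in objects(triples, uri, EX + "mayReturn")
    }
```

Because the list of triples is open-ended, users could append their 
own usefulFor statements, which is the extensibility point g asks for.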

Pat

>
>--Frank


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Tuesday, 25 April 2006 19:07:38 UTC