RE: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues] from Pat Hayes on 2006-04-24 (public-swbp-wg@w3.org from April 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 24 Apr 2006 13:55:34 -0500
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
Cc: "Frank Manola" <fmanola@acm.org>, <public-swbp-wg@w3.org>, "Guus Schreiber" <guus@few.vu.nl>, "Steve Pepper" <pepper@ontopia.net>, "Mark van Assem" <mark@cs.vu.nl>, "Ralph R. Swick" <swick@w3.org>
Message-Id: <p06230907c072bf4cf925@[10.100.0.24]>
>  > From: Frank Manola
>>  >> From: David Booth
>>  >>>  From:  Pat Hayes
>>  >>>
>>  >>>  It might be best to start with a definition of what you
>>  >>>  consider an information resource to be. Since the TAG do not
>>  >>>  define this critical term, yet base important engineering
>>  >>>  decisions on it, any authoritative exposition would be of
>>  >>>  immense value. My current understanding is that an information
>>  >>>  resource is some thing that can be transmitted over a network
>>  >>>  by a transfer protocol. On this understanding, one could argue
>>  >>>  that a word was an information resource.
>>  >>
>>  >> Definitely not.  That would be a "representation", not an
>>  >> "information resource".  The information resource is the
>>  >> *source* of "representations" that can be transmitted
>>  >> over a network.
>>
>>  Sorry to butt in, but a couple of minor comments:
>>
>>  "Definitely not" may be technically correct, but I think a bit more
>>  context is needed here.  The TAG Architecture document says:
>>
>>  "It is conventional on the hypertext Web to describe Web
>>  pages, images, product catalogs, etc. as "resources". The
>>  distinguishing characteristic of these resources is that
>>  all of their essential characteristics can be
>>  conveyed in a message. We identify this set as "information
>>  resources."
>>
>>  This document is an example of an information resource. It
>>  consists of words and punctuation symbols and graphics and other
>>  artifacts that can be encoded, with varying degrees of
>>  fidelity, into a sequence of bits. There is nothing about
>>  the essential information content of this
>>  document that cannot in principle be transferred in a message.
>>  In the case of this document, the message payload is the
>>  representation of this document."
>>
>>  So, referring to the next sentence, it would seem that an RDF
>>  ontology and an HTML web page *are* information resources. 
>>  What gets transmitted over the wire, however, would be
>>  representations of those information resources.  Right?
>
>You're right.  I should have been clearer that it depends on what you
>mean by "RDF ontology" or "HTML web page".  If you're referring to the
>abstract document that may change over time then yes, it is an
>information resource.  If you're referring to a particular instantiation
>of that document that may be transmitted over the wire then no, it is a
>representation.

By "an RDF ontology" I mean a particular set of 
RDF assertions. These can be thought of 
abstractly as an RDF graph, but they would likely 
be in the form of a document of some kind, 
typically RDF/XML. I do not consider either a 
graph or a document to be something that may 
change over time. Of course I understand that a 
resource may change over time, but then that 
simply raises the question we started with.

We can distinguish an 'abstract' document from a 
particular instantiation or rendering of the 
document, as when distinguishing the book 'Moby 
Dick' from a particular physical printing of Moby 
Dick. (There are a variety of more subtle 
distinctions possible here, between editions, 
printings, etc., which are of interest to book 
collectors and publishers.) This is usually 
indicated by distinguishing a token from a type. 
Only tokens can be transmitted, as you point out. 
Most of the time, for many purposes, it is not 
harmful to conflate types with tokens, however. 
For example, we say things like "I read Moby 
Dick", rather than "I read a printed token of 
Moby Dick". Similarly, it seems to me harmless to 
identify an RDF graph with any legal rendering of 
that graph into a sanctioned exchange notation, 
such as RDF/XML. But strictly, I guess we should 
say that a legal RDF/XML document is a token of 
the RDF graph it indicates.
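
To make the type/token point concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the graph is modelled as a plain set of URI triples, the line-based notation is a simplified stand-in for a real exchange syntax such as RDF/XML, and the names `serialize` and `parse` and the example.org URIs are all hypothetical. Two different strings (tokens) indicate one and the same graph (type).

```python
# Hypothetical sketch: an RDF graph as the "type", serialized strings as "tokens".
# Only URI nodes are handled; this is not a real RDF parser.
Triple = tuple[str, str, str]


def serialize(graph: frozenset[Triple], reverse: bool = False) -> str:
    """Render the graph as one '<s> <p> <o> .' line per triple."""
    ordered = sorted(graph, reverse=reverse)
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in ordered)


def parse(document: str) -> frozenset[Triple]:
    """Recover the graph that a serialized document indicates."""
    triples = set()
    for line in document.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the trailing ' .' and split into the three bracketed terms.
        s, p, o = (term.strip("<>") for term in line.rstrip(" .").split())
        triples.add((s, p, o))
    return frozenset(triples)


graph = frozenset({
    ("http://example.org/MobyDick", "http://example.org/author",
     "http://example.org/Melville"),
    ("http://example.org/MobyDick", "http://example.org/type",
     "http://example.org/Novel"),
})

token_a = serialize(graph)                # one rendering
token_b = serialize(graph, reverse=True)  # a different rendering

print(token_a != token_b)                         # the tokens differ as strings
print(parse(token_a) == parse(token_b) == graph)  # yet both indicate the same graph
```

The graph here is immutable (a `frozenset`), which matches the view above that neither a graph nor a document is something that changes over time.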

It seems that you (and the TAG) use 
'representation' instead of 'token' here, but 
this is a very narrow and idiosyncratic usage of 
the word 'representation'.  And I'm not sure that 
I have this exactly right in any case; see below.

>  Pat was
>referring to something that could be transmitted over the wire.
>
>An information resource cannot be transmitted over the wire.  It is an
>abstraction.  Thus, I believe the WebArch sentence above that says:
>
>	"all of their essential characteristics can be conveyed
>	in a message"
>
>is slightly incorrect and should have said something like:
>
>	"all of their *current* essential characteristics can be
>	conveyed in a message"
>
>because a representation only gives a snapshot of that information
>resource at one particular moment, whereas the "information resource" is
>the abstract source/set of those representations over time.
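
One way to read the quoted description ("the abstract source/set of those representations over time") is as a mapping from times to snapshots. The sketch below is only that reading, written in Python; it is not a definition taken from the TAG or WebArch, and the class names `Representation` and `InformationResource`, the method names, and the example URI are hypothetical.

```python
# Hypothetical sketch of "resource = time-indexed source of representations".
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Representation:
    """A snapshot that could be carried as a message payload."""
    media_type: str
    body: bytes


@dataclass
class InformationResource:
    """Identified by a URI; its state may differ from one moment to the next."""
    uri: str
    history: list[tuple[float, Representation]] = field(default_factory=list)

    def update(self, at: float, rep: Representation) -> None:
        """Record that the resource's state is `rep` from time `at` onward."""
        self.history.append((at, rep))
        self.history.sort(key=lambda entry: entry[0])

    def representation_at(self, t: float) -> Representation:
        """Return the snapshot current at time t (latest update not after t)."""
        current = None
        for timestamp, rep in self.history:
            if timestamp <= t:
                current = rep
        if current is None:
            raise LookupError(f"no representation of {self.uri} at time {t}")
        return current


page = InformationResource("http://example.org/vocab")
page.update(1.0, Representation("text/html", b"v1"))
page.update(2.0, Representation("text/html", b"v2"))

print(page.representation_at(1.5).body)  # snapshot current at t=1.5
print(page.representation_at(2.0).body)  # snapshot current at t=2.0
```

On this reading only the `Representation` values ever travel over the wire; the `InformationResource` object stands for the thing whose state is being sampled.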

OK, now I am completely puzzled. If the resource 
is an abstraction, then it not only cannot be 
transmitted, it cannot take part in any 
physically instantiated activity. In particular, 
it cannot transmit anything or be a source of any 
transmission. So it certainly cannot be an HTTP 
endpoint or be accessed by any kind of activity 
on any physical network, even the Internet. Is 
this really what the TAG wants to say is an 
information resource? On the other hand, if the 
resource is some kind of information-processing 
'point' or node in a network architecture, that 
can do things like send codes and transmit 
document tokens, then the relationship of these 
'representations' to it needs to be explicated, 
since we cannot then understand this as analogous 
to the type/token distinction. In particular, if 
these transmissions are representations, what are 
they representations OF? And where, in this 
picture, does the 'abstract document' play any 
role?

>  > >
>>  > Ah, I see. Thanks for that clarification. So for example an RDF
>>  > ontology and an HTML web page are not information resources,
>>  > either, I take it.
>>
>>  I also note that http://www.w3.org/TR/swbp-vocab-pub/ (under "URI
>>  Namespaces") says:
>>
>>  "For small vocabularies, it may be most convenient to serve
>>  the entire vocabulary in a single Web access. Such a vocabulary would
>>  typically use a hash namespace, and a Web access (i.e., an HTTP GET
>>  request) for any term in the vocabulary would return *a single
>>  information resource* describing all of the terms in the vocabulary."
>>  [my emphasis]
>>
>>  So this should be "would return *a representation of* a single
>>  information resource describing all of the terms in the vocabulary" ?
>
>Correct.  That should be fixed.

I would beg that the TAG give some kind of 
explanation of what they mean by 'representation' 
here. It is a deeply mysterious notion to me.

Pat Hayes

>
>David Booth


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 24 April 2006 18:55:50 UTC