Re: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues]

Booth, David (HP Software - Boston) wrote:
>> From: Frank Manola
>>>> From: David Booth
>>>>>  From:  Pat Hayes
>>>>>
>>>>>  It might be best to start with a definition of what you consider an
>>>>>  information resource to be. Since the TAG do not define this critical
>>>>>  term, yet base important engineering decisions on it, any authoritative
>>>>>  exposition would be of immense value. My current understanding is that
>>>>>  an information resource is some thing that can be transmitted over a
>>>>>  network by a transfer protocol. On this understanding, one could argue
>>>>>  that a word was an information resource.
>>>> Definitely not.  That would be a "representation", not an 
>>>> "information resource".  The information resource is the 
>>>> *source* of "representations" that can be transmitted 
>>>> over a network.
>> Sorry to butt in, but a couple of minor comments:
>>
>> "Definitely not" may be technically correct, but I think a bit more 
>> context is needed here.  The TAG Architecture document says:
>>
>> "It is conventional on the hypertext Web to describe Web 
>> pages, images, product catalogs, etc. as "resources". The 
>> distinguishing characteristic of these resources is that 
>> all of their essential characteristics can be 
>> conveyed in a message. We identify this set as "information 
>> resources."
>>
>> This document is an example of an information resource. It 
>> consists of words and punctuation symbols and graphics and other 
>> artifacts that can be encoded, with varying degrees of 
>> fidelity, into a sequence of bits. There is nothing about 
>> the essential information content of this 
>> document that cannot in principle be transferred in a message. 
>> In the case of this document, the message payload is the 
>> representation of this document."
>>
>> So, referring to the next sentence, it would seem that an RDF 
>> ontology and an HTML web page *are* information resources.  
>> What gets transmitted over the wire, however, would be 
>> representations of those information resources.  Right?
> 
> You're right.  I should have been clearer that it depends on what you
> mean by "RDF ontology" or "HTML web page".  If you're referring to the
> abstract document that may change over time then yes, it is an
> information resource.  If you're referring to a particular instantiation
> of that document that may be transmitted over the wire then no, it is a
> representation.  Pat was referring to something that could be
> transmitted over the wire.
> 
> An information resource cannot be transmitted over the wire.  It is an
> abstraction.  Thus, I believe the WebArch sentence above that says: 
> 
> 	"all of their essential characteristics can be conveyed 
> 	in a message" 
> 
> is slightly incorrect and should have said something like: 
> 
> 	"all of their *current* essential characteristics can be 
> 	conveyed in a message"
> 
> because a representation only gives a snapshot of that information
> resource at one particular moment, whereas the "information resource" is
> the abstract source/set of those representations over time.
> 

David--

What you say is correct, but I think that some of the qualifications 
about *current* characteristics and *information* resources could be 
misinterpreted.

First off, if I understand this business properly, *no* resources, 
information or not, can be sent over a network or conveyed in messages. 
Only *representations* of resources can be sent or conveyed in this way. 
The distinction between information resources and other resources 
isn't about whether or not representations of them can be sent or 
conveyed (*only* representations of resources can be sent or conveyed, 
and non-information resources can have associated representations that 
can be sent or conveyed), but rather about whether or not those 
representations convey the "essential characteristics" of those resources.

This separation of concepts serves a number of purposes.  One of them is 
to deal with time-varying resources.  However, it's not necessary for 
the resource to vary over time:  a resource may be static, and the same 
separation of concepts applies.  In the case of a static resource, what 
you'd get for a request is a snapshot, but the *same* snapshot.  The 
separation is there to allow for the time-varying case, and for you to 
be able to coin separate URIs for the time-varying resource, and for 
particular "versions" (e.g., over time) of it.

Another purpose is to distinguish the resource in the abstract from 
different representations of it that may be returned for different 
purposes.  Examples of this are illustrated in 
http://www.w3.org/TR/swbp-vocab-pub/, where an RDF vocabulary is 
returned in either RDF/XML or HTML, depending on what the user wants. 
Here's where things can get further confused (or, at least, where *I* 
may be further confused).

Take the case of an RDF vocabulary referenced by a single URI, say 
http://example.myvocab.  However, "under the covers" there are really 
two documents available, http://example.myvocab.rdf and 
http://example.myvocab.html.  A user may want either the rdf or the html 
version of the vocabulary, depending on what she/he is trying to do, and 
the discussion in http://www.w3.org/TR/swbp-vocab-pub/ shows how you can 
get the version you want if you ask simply for http://example.myvocab. 
Now, my understanding is that:

a.  There are *three* resources here, http://example.myvocab, 
http://example.myvocab.rdf, and http://example.myvocab.html.  These are 
all resources in spite of the fact that http://example.myvocab is in 
some sense "more abstract" (less of a specific representation) than the 
other two.

b.  Even when the server selects one of the versions, either 
http://example.myvocab.rdf or http://example.myvocab.html to return, 
what gets returned is still a representation of one of these resources, 
not the resource itself.

c.  *All* of these are "information resources", in that their "essential 
characteristics can be conveyed in a message".  That is, considered 
independently, the essential characteristics of 
http://example.myvocab.rdf can be conveyed in a message, the essential 
characteristics of http://example.myvocab.html can be conveyed in a 
message, and presumably the essential characteristics of 
http://example.myvocab can be conveyed in a message (although what 
actually gets sent is a representation of one of those other files).
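
Points a-c can be illustrated with a minimal Python sketch of the selection step. This is an illustration under stated assumptions, not the recipe from http://www.w3.org/TR/swbp-vocab-pub/: the media types, the function names, and the choice of HTML as the fallback are all hypothetical, and only the three URIs come from the example above.

```python
# The three URIs are from the example above; everything else is assumed.
VOCAB = "http://example.myvocab"
VARIANTS = {
    "application/rdf+xml": VOCAB + ".rdf",   # assumed media type for the RDF document
    "text/html": VOCAB + ".html",            # assumed media type for the HTML document
}


def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((q, media_type))
    # Stable sort: equal q-values keep the order the client listed them in.
    prefs.sort(key=lambda pref: -pref[0])
    return [media_type for q, media_type in prefs if q > 0]


def select_variant(accept_header, default="text/html"):
    """Pick which concrete document stands in for a request on VOCAB.

    Returns (media_type, variant_uri). Whatever is then sent is a
    representation of that variant, not the variant itself.
    """
    for media_type in parse_accept(accept_header):
        if media_type in VARIANTS:
            return media_type, VARIANTS[media_type]
    return default, VARIANTS[default]
```

A request for http://example.myvocab with `Accept: application/rdf+xml` would select http://example.myvocab.rdf in this sketch, and one that only lists types the server does not have falls back to the HTML document. In both cases the client asked for one resource and received a representation of another.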

This is a place where I find the definition of "information resource" 
(put together with the httpRange-14 guidance) somewhat problematic, in that:

d.  whether a given representation conveys the "essential 
characteristics" of some resource is (necessarily) kind of fuzzy, and

e.  it seems as if the *server* (presumably acting as an intermediary 
for whoever put the resource out there in the first place) has a lot more to 
say about whether the "essential characteristics" of the resource are 
being conveyed (via the return code it sends when you ask for the 
resource) than the user does, even though "essential for what" seems 
like an application-dependent decision.
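
For reference, the httpRange-14 guidance ties what a client may conclude about a resource to the response code of a GET. The Python sketch below paraphrases that guidance; the function name and the returned strings are made up for illustration.

```python
def nature_from_status(status):
    """What a GET response code lets a client conclude about the resource.

    Paraphrase of the TAG's httpRange-14 guidance; the wording of the
    returned strings is illustrative only.
    """
    if 200 <= status <= 299:
        return "information resource"
    if status == 303:
        return "could be any resource"
    if 400 <= status <= 499:
        return "nature unknown"
    return "not covered by the guidance"
```

The whole signal is a single integer chosen by the server, which is why it says nothing about what the representation is essential *for*.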

I (think I) understand at least some of the architectural tradeoffs 
involved, but I can't help but think that an HTTP response code is a 
pretty low-bandwidth mechanism for conveying this kind of information 
(in fact, I'm inclined toward Pat's position), and that we ought to be 
looking more at ways to use the RDF/OWL/... class of languages to 
provide metadata about:

f.  what kind of thing (or kinds of things) dereferencing a given URI is 
going to return, and

g.  what kinds of things a given kind of return might be useful for 
(e.g., users could document what they've been able to do with what's 
been returned, in an extensible fashion).
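
As a rough illustration of f and g, such metadata could be written as subject/predicate/object statements. The Python sketch below uses plain tuples rather than an RDF library. The namespace, the property names (`returnsOnDereference`, `mediaType`, `usefulFor`), and the recorded uses are all hypothetical; no existing vocabulary is implied.

```python
EX = "http://example.org/meta#"   # hypothetical namespace for the properties
VOCAB = "http://example.myvocab"  # URI from the example above

# Each statement is a (subject, predicate, object) tuple.
triples = [
    # f: what dereferencing the URI may return
    (VOCAB, EX + "returnsOnDereference", VOCAB + ".rdf"),
    (VOCAB, EX + "returnsOnDereference", VOCAB + ".html"),
    (VOCAB + ".rdf", EX + "mediaType", "application/rdf+xml"),
    (VOCAB + ".html", EX + "mediaType", "text/html"),
    # g: what each kind of return has proved useful for (user-contributed)
    (VOCAB + ".rdf", EX + "usefulFor", "loading the vocabulary into a reasoner"),
    (VOCAB + ".html", EX + "usefulFor", "reading the term definitions"),
]


def objects(subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe(uri):
    """Map each thing a URI may return to the uses recorded for it."""
    returned = objects(uri, EX + "returnsOnDereference")
    return {r: objects(r, EX + "usefulFor") for r in returned}
```

Because the statements are just data, a user could append another `usefulFor` tuple without any change on the server, which is the extensibility point in g.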

--Frank

Received on Monday, 24 April 2006 16:45:17 UTC