RE: on documents and terms [was: RE: [WNET] new proposal WN URIs and related issues] from Booth, David (HP Software - Boston) on 2006-04-24 (public-swbp-wg@w3.org from April 2006)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Mon, 24 Apr 2006 14:49:14 -0400
To: "Frank Manola" <fmanola@acm.org>
Cc: "Pat Hayes" <phayes@ihmc.us>, <public-swbp-wg@w3.org>, "Guus Schreiber" <guus@few.vu.nl>, "Steve Pepper" <pepper@ontopia.net>, "Mark van Assem" <mark@cs.vu.nl>, "Ralph R. Swick" <swick@w3.org>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C20B92F6@tayexc19.americas.cpqcorp.net>
Frank,

Excellent explanation!  Thanks for adding this clarification.

David Booth


> -----Original Message-----
> From: Frank Manola [mailto:fmanola@acm.org] 
> Sent: Monday, April 24, 2006 12:50 PM
> To: Booth, David (HP Software - Boston)
> Cc: Pat Hayes; public-swbp-wg@w3.org; Guus Schreiber; Steve Pepper;
> Mark van Assem; Ralph R. Swick
> Subject: Re: on documents and terms [was: RE: [WNET] new proposal
> WN URIs and related issues]
> 
> 
> Booth, David (HP Software - Boston) wrote:
> >> From: Frank Manola
> >>>> From: David Booth
> >>>>>  From:  Pat Hayes
> >>>>>
> >>>>>  It might be best to start with a definition of what you
> >>>>>  consider an information resource to be. Since the TAG do not
> >>>>>  define this critical term, yet base important engineering
> >>>>>  decisions on it, any authoritative exposition would be of
> >>>>>  immense value. My current understanding is that an
> >>>>>  information resource is some thing that can be transmitted
> >>>>>  over a network by a transfer protocol. On this understanding,
> >>>>>  one could argue that a word was an information resource.
> >>>> Definitely not.  That would be a "representation", not an
> >>>> "information resource".  The information resource is the 
> >>>> *source* of "representations" that can be transmitted 
> >>>> over a network.
> >> Sorry to butt in, but a couple of minor comments:
> >>
> >> "Definitely not" may be technically correct, but I think a bit more
> >> context is needed here.  The TAG Architecture document says:
> >>
> >> "It is conventional on the hypertext Web to describe Web
> >> pages, images, product catalogs, etc. as "resources". The 
> >> distinguishing characteristic of these resources is that 
> >> all of their essential characteristics can be 
> >> conveyed in a message. We identify this set as "information 
> >> resources."
> >>
> >> This document is an example of an information resource. It
> >> consists of words and punctuation symbols and graphics and other 
> >> artifacts that can be encoded, with varying degrees of 
> >> fidelity, into a sequence of bits. There is nothing about 
> >> the essential information content of this 
> >> document that cannot in principle be transferred in a message. 
> >> In the case of this document, the message payload is the 
> >> representation of this document."
> >>
> >> So, referring to the next sentence, it would seem that an RDF
> >> ontology and an HTML web page *are* information resources.  
> >> What gets transmitted over the wire, however, would be 
> >> representations of those information resources.  Right?
> > 
> > You're right.  I should have been clearer that it depends on what you
> > mean by "RDF ontology" or "HTML web page".  If you're referring to the
> > abstract document that may change over time then yes, it is an
> > information resource.  If you're referring to a particular
> > instantiation of that document that may be transmitted over the wire
> > then no, it is a representation.  Pat was referring to something that
> > could be transmitted over the wire.
> > 
> > An information resource cannot be transmitted over the wire.  It is an
> > abstraction.  Thus, I believe the WebArch sentence above that says:
> > 
> > 	"all of their essential characteristics can be conveyed 
> > 	in a message"
> > 
> > is slightly incorrect and should have said something like:
> > 
> > 	"all of their *current* essential characteristics can be 
> > 	conveyed in a message"
> > 
> > because a representation only gives a snapshot of that information
> > resource at one particular moment, whereas the "information resource"
> > is the abstract source/set of those representations over time.
> > 
> 
> David--
> 
> What you say is correct, but I think that some of the qualifications 
> about *current* characteristics and *information* resources could be 
> misinterpreted.
> 
> First off, if I understand this business properly, *no* resources,
> information or not, can be sent over a network or conveyed in messages.
> Only *representations* of resources can be sent or conveyed in this way.
> The distinction between information resources and other resources
> isn't about whether or not representations of them can be sent or
> conveyed (*only* representations of resources can be sent or conveyed,
> and non-information resources can have associated representations that
> can be sent or conveyed), but rather about whether or not those
> representations convey the "essential characteristics" of those
> resources.
> 
> This separation of concepts serves a number of purposes.  One of them is
> to deal with time-varying resources.  However, it's not necessary for
> the resource to vary over time:  a resource may be static, and the same
> separation of concepts applies.  In the case of a static resource, what
> you'd get for a request is a snapshot, but the *same* snapshot.  The
> separation is there to allow for the time-varying case, and for you to
> be able to coin separate URIs for the time-varying resource, and for
> particular "versions" (e.g., over time) of it.
> 
> Another purpose is to distinguish the resource in the abstract from 
> different representations of it that may be returned for different 
> purposes.  Examples of this are illustrated in 
> http://www.w3.org/TR/swbp-vocab-pub/, where an RDF vocabulary is 
> returned in either RDF/XML or HTML, depending on what the user wants. 
> Here's where things can get further confused (or, at least, where *I* 
> may be further confused).
> 
> Take the case of an RDF vocabulary referenced by a single URI, say
> http://example.myvocab.  However, "under the covers" there are really
> two documents available, http://example.myvocab.rdf and
> http://example.myvocab.html.  A user may want either the rdf or the html
> version of the vocabulary, depending on what she/he is trying to do, and
> the discussion in http://www.w3.org/TR/swbp-vocab-pub/ shows how you can
> get the version you want if you ask simply for http://example.myvocab.
> Now, my understanding is that:
> 
> a.  There are *three* resources here, http://example.myvocab,
> http://example.myvocab.rdf, and http://example.myvocab.html.  These are
> all resources in spite of the fact that http://example.myvocab is in
> some sense "more abstract" (less of a specific representation) than the
> other two.
> 
> b.  Even when the server selects one of the versions, either
> http://example.myvocab.rdf or http://example.myvocab.html to return,
> what gets returned is still a representation of one of these resources,
> not the resource itself.
> 
> c.  *All* of these are "information resources", in that their "essential
> characteristics can be conveyed in a message".  That is, considered
> independently, the essential characteristics of
> http://example.myvocab.rdf can be conveyed in a message, the essential
> characteristics of http://example.myvocab.html can be conveyed in a
> message, and presumably the essential characteristics of
> http://example.myvocab can be conveyed in a message (although what
> actually gets sent is a representation of one of those other files).
> 
> This is a place where I find the definition of "information resource" 
> (put together with the httpRange-14 guidance) somewhat 
> problematic, in that:
> 
> d.  whether a given representation conveys the "essential 
> characteristics" of some resource is (necessarily) kind of fuzzy, and
> 
> e.  it seems as if the *server* (presumably acting as an intermediary
> for whoever put the resource out there in the first place) has a lot
> more to say about whether the "essential characteristics" of the
> resource are being conveyed (via the return code it sends when you ask
> for the resource) than the user does, even though "essential for what"
> seems like an application-dependent decision.
> 
> I (think I) understand at least some of the architectural tradeoffs
> involved, but I can't help but think that an HTTP response code is a
> pretty low-bandwidth mechanism for conveying this kind of information
> (in fact, I'm inclined toward Pat's position), and that we ought to be
> looking more at ways to use the RDF/OWL/... class of languages to
> provide metadata about:
> 
> f.  what kind of thing (or kinds of things) dereferencing a given URI is
> going to return, and
> 
> g.  what kinds of things a given kind of return might be useful for 
> (e.g., users could document what they've been able to do with what's 
> been returned, in an extensible fashion).
> 
> --Frank
>
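
For readers following the http://example.myvocab case in points a-c above, here is a minimal sketch of the negotiation in Python. It is only an illustration under stated assumptions: the three URIs come from Frank's example, but the media types, the placeholder payloads, the HTML fallback, and the function name negotiate are all hypothetical, and q-values in the Accept header are ignored. It is not the mechanism described in http://www.w3.org/TR/swbp-vocab-pub/, just a model of the resource/representation split.

```python
# Hypothetical in-memory model of the example; no network access involved.
GENERIC = "http://example.myvocab"

# Assumed mapping from media type to the URI of the specific resource.
VARIANTS = {
    "application/rdf+xml": "http://example.myvocab.rdf",
    "text/html": "http://example.myvocab.html",
}

# Placeholder payloads: what travels over the wire is a representation
# (a sequence of bytes), never the resource itself.
PAYLOADS = {
    "http://example.myvocab.rdf": b"<rdf:RDF>...</rdf:RDF>",
    "http://example.myvocab.html": b"<html>...</html>",
}


def negotiate(uri, accept):
    """Return (content_location, media_type, body) for a request."""
    if uri == GENERIC:
        # Walk the Accept header in order, ignoring any ";q=..." parameters.
        for part in accept.split(","):
            media_type = part.split(";")[0].strip()
            if media_type in VARIANTS:
                target = VARIANTS[media_type]
                return target, media_type, PAYLOADS[target]
        # Assumed fallback when nothing matches: serve the HTML variant.
        target = VARIANTS["text/html"]
        return target, "text/html", PAYLOADS[target]
    # A specific URI always yields a representation of that one resource,
    # whatever the Accept header says.
    for media_type, target in VARIANTS.items():
        if uri == target:
            return target, media_type, PAYLOADS[target]
    raise KeyError(uri)


if __name__ == "__main__":
    print(negotiate(GENERIC, "application/rdf+xml"))
    print(negotiate(GENERIC, "text/html"))
```

In this model all three URIs name resources (point a), every call returns bytes rather than a resource (point b), and a request for the generic URI is answered with a representation of one of the two specific documents (the parenthetical in point c).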
Received on Monday, 24 April 2006 18:49:41 UTC