- From: Frank Manola <fmanola@acm.org>
- Date: Mon, 24 Apr 2006 12:49:35 -0400
- To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
- CC: Pat Hayes <phayes@ihmc.us>, public-swbp-wg@w3.org, Guus Schreiber <guus@few.vu.nl>, Steve Pepper <pepper@ontopia.net>, Mark van Assem <mark@cs.vu.nl>, "Ralph R. Swick" <swick@w3.org>
Booth, David (HP Software - Boston) wrote:
>> From: Frank Manola
>>>> From: David Booth
>>>>> From: Pat Hayes
>>>>>
>>>>> It might be best to start with a definition of what you consider
>>>>> an information resource to be. Since the TAG do not define this
>>>>> critical term, yet base important engineering decisions on it, any
>>>>> authoritative exposition would be of immense value. My current
>>>>> understanding is that an information resource is some thing that
>>>>> can be transmitted over a network by a transfer protocol. On this
>>>>> understanding, one could argue that a word was an information
>>>>> resource.
>>>> Definitely not. That would be a "representation", not an
>>>> "information resource". The information resource is the *source*
>>>> of "representations" that can be transmitted over a network.
>> Sorry to butt in, but a couple of minor comments:
>>
>> "Definitely not" may be technically correct, but I think a bit more
>> context is needed here. The TAG Architecture document says:
>>
>> "It is conventional on the hypertext Web to describe Web pages,
>> images, product catalogs, etc. as "resources". The distinguishing
>> characteristic of these resources is that all of their essential
>> characteristics can be conveyed in a message. We identify this set
>> as "information resources."
>>
>> This document is an example of an information resource. It consists
>> of words and punctuation symbols and graphics and other artifacts
>> that can be encoded, with varying degrees of fidelity, into a
>> sequence of bits. There is nothing about the essential information
>> content of this document that cannot in principle be transfered in
>> a message. In the case of this document, the message payload is the
>> representation of this document."
>>
>> So, referring to the next sentence, it would seem that an RDF
>> ontology and an HTML web page *are* information resources.
>> What gets transmitted over the wire, however, would be
>> representations of those information resources. Right?
>
> You're right. I should have been clearer that it depends on what you
> mean by "RDF ontology" or "HTML web page". If you're referring to the
> abstract document that may change over time then yes, it is an
> information resource. If you're referring to a particular
> instantiation of that document that may be transmitted over the wire
> then no, it is a representation. Pat was referring to something that
> could be transmitted over the wire.
>
> An information resource cannot be transmitted over the wire. It is an
> abstraction. Thus, I believe the WebArch sentence above that says:
>
> "all of their essential characteristics can be conveyed
> in a message"
>
> is slightly incorrect and should have said something like:
>
> "all of their *current* essential characteristics can be
> conveyed in a message"
>
> because a representation only gives a snapshot of that information
> resource at one particular moment, whereas the "information resource"
> is the abstract source/set of those representations over time.
>

David--

What you say is correct, but I think that some of the qualifications
about *current* characteristics and *information* resources could be
misinterpreted.

First off, if I understand this business properly, *no* resources,
information or not, can be sent over a network or conveyed in messages.
Only *representations* of resources can be sent or conveyed in this way.
The distinction between information resources and other resources isn't
about whether or not representations of them can be sent or conveyed
(*only* representations of resources can be sent or conveyed, and
non-information resources can have associated representations that can
be sent or conveyed), but rather about whether or not those
representations convey the "essential characteristics" of those
resources.

This separation of concepts serves a number of purposes.
One of them is to deal with time-varying resources. However, it's not
necessary for the resource to vary over time: a resource may be static,
and the same separation of concepts applies. In the case of a static
resource, what you'd get for a request is a snapshot, but the *same*
snapshot. The separation is there to allow for the time-varying case,
and for you to be able to coin separate URIs for the time-varying
resource, and for particular "versions" (e.g., over time) of it.

Another purpose is to distinguish the resource in the abstract from
different representations of it that may be returned for different
purposes. Examples of this are illustrated in
http://www.w3.org/TR/swbp-vocab-pub/, where an RDF vocabulary is
returned in either RDF/XML or HTML, depending on what the user wants.

Here's where things can get further confused (or, at least, where *I*
may be further confused). Take the case of an RDF vocabulary referenced
by a single URI, say http://example.myvocab. However, "under the covers"
there are really two documents available, http://example.myvocab.rdf and
http://example.myvocab.html. A user may want either the rdf or the html
version of the vocabulary, depending on what she/he is trying to do, and
the discussion in http://www.w3.org/TR/swbp-vocab-pub/ shows how you can
get the version you want if you ask simply for http://example.myvocab.
Now, my understanding is that:

a. There are *three* resources here, http://example.myvocab,
http://example.myvocab.rdf, and http://example.myvocab.html. These are
all resources in spite of the fact that http://example.myvocab is in
some sense "more abstract" (less of a specific representation) than the
other two.

b. Even when the server selects one of the versions, either
http://example.myvocab.rdf or http://example.myvocab.html to return,
what gets returned is still a representation of one of these resources,
not the resource itself.

c. *All* of these are "information resources", in that their "essential
characteristics can be conveyed in a message". That is, considered
independently, the essential characteristics of
http://example.myvocab.rdf can be conveyed in a message, the essential
characteristics of http://example.myvocab.html can be conveyed in a
message, and presumably the essential characteristics of
http://example.myvocab can be conveyed in a message (although what
actually gets sent is a representation of one of those other files).

This is a place where I find the definition of "information resource"
(put together with the httpRange-14 guidance) somewhat problematic, in
that:

d. whether a given representation conveys the "essential
characteristics" of some resource is (necessarily) kind of fuzzy, and

e. it seems as if the *server* (presumably acting as an intermediary for
whoever put the resource out there in the first place) has a lot more to
say about whether the "essential characteristics" of the resource are
being conveyed (via the return code it sends when you ask for the
resource) than the user does, even though "essential for what" seems
like an application-dependent decision.

I (think I) understand at least some of the architectural tradeoffs
involved, but I can't help but think that an HTTP response code is a
pretty low-bandwidth mechanism for conveying this kind of information
(in fact, I'm inclined toward Pat's position), and that we ought to be
looking more at ways to use the RDF/OWL/... class of languages to
provide metadata about:

f. what kind of thing (or kinds of things) dereferencing a given URI is
going to return, and

g. what kinds of things a given kind of return might be useful for
(e.g., users could document what they've been able to do with what's
been returned, in an extensible fashion).

--Frank
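
For readers who want the vocabulary example in concrete terms, the following is a minimal sketch of how a server might choose between the two documents when asked for the generic URI. It is an illustration only, not the recipe from http://www.w3.org/TR/swbp-vocab-pub/. The three URIs are taken from the message; the media-type mapping, the HTML fallback, and the function names `parse_accept` and `select_variant` are assumptions made for the sketch. The Accept parsing is deliberately simplified (no partial wildcards such as `text/*`, and no 406 response).

```python
# Sketch only: maps an HTTP Accept header to one of the two concrete
# documents behind the generic vocabulary URI from the message.
GENERIC_URI = "http://example.myvocab"

# Assumed mapping from media type to the variant that serves it.
VARIANTS = {
    "application/rdf+xml": "http://example.myvocab.rdf",
    "text/html": "http://example.myvocab.html",
}
DEFAULT_TYPE = "text/html"  # assumed fallback when nothing else matches


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def select_variant(accept_header):
    """Pick the variant whose representation is sent for GENERIC_URI."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in VARIANTS:
            return VARIANTS[media_type]
        if media_type == "*/*":
            return VARIANTS[DEFAULT_TYPE]
    return VARIANTS[DEFAULT_TYPE]


if __name__ == "__main__":
    for accept in ("application/rdf+xml", "text/html,*/*;q=0.8"):
        print(GENERIC_URI, "with Accept:", accept, "->", select_variant(accept))
```

Whichever variant is chosen, what travels back to the client is a representation of that variant, consistent with point b above; the sketch only decides which of the three resources supplies it.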
Received on Monday, 24 April 2006 16:45:17 UTC