RE: [httpRange-14] What is an Information Resource? from Booth, David (HP Software - Boston) on 2008-02-01 (www-tag@w3.org from February 2008)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Fri, 1 Feb 2008 19:10:22 +0000
To: Ian Davis <lists@iandavis.com>
CC: Ed Davies <edavies@nildram.co.uk>, "Sean B. Palmer" <sean@miscoranda.com>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <184112FE564ADF4F8F9C3FA01AE50009DED0EBAB7D@G1W0486.americas.hpqcorp.net>
> From: Ian Davis
> [ . . . ] it's less clear that
>
> ex:RdfGraph owl:disjointWith awww:InformationResource .
>
> or
>
> ex:XmlNamespace owl:disjointWith awww:InformationResource .
>
> And the everyday website operator needs to know those kinds
> of things to configure their web server. How do we help them?

By changing the (flawed) WebArch definition of awww:InformationResource to get away from talking about "essence" and "information", and instead talk about what an awww:InformationResource *does*.

If we restrict our attention to awww:Resources denoted by http URIs with no path component, such as http://ian.example/ , then it would be simple to define an awww:InformationResource as being merely a logical HTTP endpoint, and most everyday website operators should be able to understand what that is.  Note that there is no requirement that this logical HTTP endpoint actually *does* return a 200 OK response to any particular GET request.  After all, the server may be offline at the moment, or it may simply choose not to return one.  In fact, the code that implements that endpoint may not even have been written yet!  From a requester's perspective, that would be indistinguishable from the endpoint simply being offline.  So in that sense, an awww:InformationResource is really more like a *hypothetical* HTTP endpoint -- a logical HTTP endpoint whose existence we're hypothesizing.

Similarly, an awww:Representation could be defined as the content returned with a 200 OK response.  If the endpoint that produced the awww:Representation corresponds to a document on a file system, then the awww:Representation is a snapshot of that document at a particular time.  But as we all know, the endpoint could generate its output dynamically, using arbitrary algorithms (CGI scripts, for example).  So in general we cannot say exactly how that awww:Representation corresponds to the awww:InformationResource that produced it -- the returned awww:Representation may or may not indicate the state of the endpoint.  All we really know is that the awww:Representation is whatever content the awww:InformationResource chose to return.
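
As a rough illustration of the two definitions above, the following Python sketch models a logical HTTP endpoint that may or may not have any code behind it, and a representation as whatever content it returns.  The names LogicalEndpoint, Representation and handler are hypothetical, chosen only for this example; nothing here is taken from WebArch or from any real HTTP library, and returning None is just one assumed way to model "no 200 OK response".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Representation:
    """Content returned with a 200 OK response (hypothetical model)."""
    content_type: str
    body: bytes


@dataclass
class LogicalEndpoint:
    """A hypothesized HTTP endpoint; its handler may not exist yet."""
    uri: str
    handler: Optional[Callable[[], Representation]] = None

    def get(self) -> Optional[Representation]:
        # None stands for "no 200 OK": the endpoint is offline, declines
        # to answer, or has not been written yet.  A requester cannot
        # tell these cases apart.
        if self.handler is None:
            return None
        # The handler may read a file or compute anything it likes; the
        # caller only ever sees the content it chose to return.
        return self.handler()


# An endpoint backed by some code, and one that is only hypothesized.
static = LogicalEndpoint(
    "http://ian.example/",
    lambda: Representation("text/html", b"<p>hello</p>"),
)
unwritten = LogicalEndpoint("http://david.example/")
```

Both objects count as endpoints in this model, even though only the first can currently produce a representation.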

Those definitions work very well for the restricted case of HTTP URIs with no path component, but for the WebArch they still need to be generalized: (1) to cover awww:Resources that are denoted by URIs that *do* have a path component; and (2) to cover protocols other than HTTP.  This is where it gets fuzzier.  But if we start with the above model in mind, we can still keep a reasonably clear sense of what is intended.

To handle http URIs with path components, we could extend the definition by hypothesizing that *each* full URI (including the path component) could conceptually correspond to a distinct, finer-grained logical HTTP endpoint, and the GET request is (conceptually) sent to that finer-grained endpoint.  Thus, even though in reality we know that the HTTP request is sent to the server ian.example, conceptually we would say it is sent to the endpoint denoted by http://ian.example/dog , which may be a different logical HTTP endpoint from http://ian.example/ and http://ian.example/cat .  The reason I say it *may* be different is that there is no guarantee that they are different: the software could in fact be configured such that http://ian.example/dog and http://ian.example/cat denote the exact same logical endpoint.  This is similar to the fact that there is no guarantee that ian.example is a different server from david.example, even though their names are different.
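
A small sketch of that point, under the assumption that a server's configuration can be pictured as a table from full URIs to endpoint objects.  The ROUTES table, the Endpoint class and the helper denotes_same_endpoint are all hypothetical names invented for illustration; the particular mapping shown is just one possible configuration.

```python
class Endpoint:
    """A logical HTTP endpoint, identified here only by a label."""

    def __init__(self, label: str) -> None:
        self.label = label


home = Endpoint("home")
pets = Endpoint("pets")

# One possible configuration: /dog and /cat happen to be served by the
# same logical endpoint, while / is served by a different one.
ROUTES = {
    "http://ian.example/": home,
    "http://ian.example/dog": pets,
    "http://ian.example/cat": pets,
}


def denotes_same_endpoint(uri_a: str, uri_b: str) -> bool:
    """True if both URIs are configured to denote the same endpoint."""
    return ROUTES[uri_a] is ROUTES[uri_b]
```

Different URIs therefore tell a requester nothing by themselves about whether the endpoints differ; only the configuration decides that.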

That extended definition still works well, and still should be pretty understandable to an everyday website operator.  But it still isn't complete.

The second complication comes when we try to extend the definition of awww:InformationResource to cover protocols other than HTTP.  What is the analog of HTTP GET in other protocols?  Without knowing what protocol we're talking about, it's hard to say.  This is where the definition inherently gets fuzzier.  But if we keep the HTTP case in mind for guidance, all is not lost.  Some key points:

 - We're talking about conceptual endpoints in a protocol.  In some sense, what matters is not so much the absolute definitions of these terms, but the relationship between them: the fact that an awww:InformationResource can produce awww:Representations.

 - We're talking about things that are (potentially) on the network, i.e., things that could (conceptually) respond with content when a URI that denotes one is pasted into a browser.

Thus, the complete definition that I have proposed at
http://dbooth.org/2006/identity/#propdefir
for awww:InformationResource is:

        "a network source/sink of representations" . . . "conceptually,
        a function from time and requests to representations"
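
Read literally, "a function from time and requests to representations" can be written down as a type.  The Python sketch below is only one assumed reading of that phrase: Request, Representation, the InformationResource alias and clock_resource are hypothetical names for this example, and the Accept-based branching is an illustrative stand-in for whatever a real endpoint might do.

```python
from datetime import datetime, timezone
from typing import Callable, NamedTuple


class Request(NamedTuple):
    method: str
    accept: str


class Representation(NamedTuple):
    content_type: str
    body: str


# The proposed definition, read as a type: (time, request) -> representation.
InformationResource = Callable[[datetime, Request], Representation]


def clock_resource(when: datetime, request: Request) -> Representation:
    """A resource whose representations vary with time and with the request."""
    stamp = when.isoformat()
    if request.accept == "text/html":
        return Representation("text/html", f"<p>{stamp}</p>")
    return Representation("text/plain", stamp)


resource: InformationResource = clock_resource
t = datetime(2008, 2, 1, 19, 10, 22, tzinfo=timezone.utc)
plain = resource(t, Request("GET", "text/plain"))
html = resource(t, Request("GET", "text/html"))
```

The same resource yields different representations for different requests or times, which is why the definition speaks of the relationship rather than of any fixed content.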

Of course, I do not speak for the TAG, and the TAG has not (yet) adopted this definition.  But I hope the TAG will adopt something along these lines, as I really think it would help clarify the intent.



David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Opinions expressed herein are those of the author and do not represent the official views of HP unless explicitly stated otherwise.
Received on Friday, 1 February 2008 19:11:31 UTC