Resources and representations (was: Re: Subgroup to handle semantics of HTTP etc?) from Richard Cyganiak on 2007-10-22 (www-tag@w3.org from October 2007)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Tue, 23 Oct 2007 01:10:33 +0200
To: Alan Ruttenberg <alanruttenberg@gmail.com>
Cc: Xiaoshu Wang <wangxiao@musc.edu>, W3C-TAG Group WG <www-tag@w3.org>
Message-Id: <77D3FA04-1DB5-4D8F-A75A-92D3968C3145@cyganiak.de>
Alan,

On 22 Oct 2007, at 22:22, Alan Ruttenberg wrote:
> On Oct 22, 2007, at 12:53 PM, Richard Cyganiak wrote:
>> The point is that it partitions resources into two kinds:  
>> message-conveyable resources, and other resources. And it  
>> establishes an axiom, that a 200 response means we have a  
>> message-conveyable resource.
>
> Yes, but we don't convey the resource. We only convey a  
> representation. To rephrase what you have said, my interpretation  
> is that a 200 response means that we have denoted a resource,  
> representations of which can be conveyed. However, there is nothing  
> that I see that would indicate that the message receiver is   
> supposed to be able to reconstitute the "resource" from the message  
> that is received.

A representation is thought to encapsulate the current state of the  
resource, as presented by the URI owner, and thus communicates the  
state to the receiver. Subject to some possible loss of fidelity, if  
the URI owner chooses to publish in lower-quality formats.

(That's why REST is called REST -- REpresentational State Transfer.)

Note: If we can convey the current state of a resource, then by  
definition the resource is conveyable. (Though we'd need the ability  
of time travel to reconstitute it completely.)

> In fact, I would say we'd be pretty lucky in many cases if we are  
> able to infer what resource is being denoted,

Finding out what is being identified just by GETting representations  
is impossible, mostly because we can't look into the future. The only  
way to find out what is being denoted is to interpret some statement  
-- in logical or natural language -- made by the URI owner. (A very  
good place for such a statement would be in a representation  
retrieved from the URI.)

> mostly because of the uncertainty around what a representation  
> actually is (other than a tuple of a series of bits and a mime type)
>
>> therefore the URI identifies something message-conveyable,  
>> therefore it cannot be a person, therefore it must identify just  
>> the document.
>
> Here is where the logic fails. There are many things that this  
> can be if not the person. I'll leave this to your imagination, but  
> do challenge me if you can't see more than the one choice once  
> prompted.

I have seen people in the wild assume it's OK to use homepage URIs to  
identify homepages.

I have seen people in the wild assume it's OK to use homepage URIs to  
identify people.

I have *not* seen people in the wild assume it's OK to use homepage  
URIs to identify HTTP endpoints or bit sequences or 25-character  
strings starting with "http://" or bits of pocket fluff. That kind of  
thing is usually postulated by people with a keen interest in Web  
architecture, a vivid imagination, and too much time on their hands.

httpRange-14 removes a conflict between two reasonable yet  
incompatible interpretations that do occur in the real world. I don't  
care too much if it has funny side effects on possible  
interpretations that my or your bored imagination can potentially  
dream up.
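
A minimal sketch of the rule under discussion, following the common summary of the TAG's httpRange-14 resolution. The function name and the returned labels are hypothetical and chosen only for illustration; nothing here is normative.

```python
def interpret_get_response(status: int) -> str:
    """Rough reading of the httpRange-14 resolution for a GET on an http URI.

    The function name and the returned labels are hypothetical.
    """
    if 200 <= status < 300:
        # 2xx: the URI identifies an information resource (e.g. a web page).
        return "information resource"
    if status == 303:
        # 303 See Other: the URI could identify any resource, e.g. a person.
        return "any resource"
    if 400 <= status < 500:
        # 4xx: the nature of the resource is unknown.
        return "unknown"
    # Other status codes are outside what this sketch covers.
    return "not covered"
```

Under this reading, a homepage URI that answers 200 identifies the homepage, and a separate URI that answers 303 (or a hash URI) would be needed to identify the person.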

>> The value of httpRange-14, in my eyes, is simply this: It affirms  
>> that web page URIs still identify web pages, even in the Semantic  
>> Web.
>
> Web pages "the generic". This says that the URI identifies  
> something that could have a representation which is html, or jpeg,  
> or svg. But there is still a desire to be able to identify each of  
> these individually. It matters, for instance, who you hire to do an  
> update to each of these - they require different tools and skills  
> to change.

Just one thing on the side: In the real world, very few web page URIs  
are subject to content negotiation.

> It seems to me that if you want to be able to denote these with a  
> URI, you are forced to accept that the appropriate response for a  
> web server is to respond 303.

I fail to see why that would be.

> Remember the test I proposed, that you seem to agree to? If it's  
> an information resource, you can't get a checksum of it. If you can  
> checksum it, it can't be an information resource.

No, I don't agree. Remember, I said an RDF/XML file published on the  
Web is an information resource, and a file can certainly be  
checksummed. Anything that exists (even if only in your imagination!)  
can be a resource.

> Each of the three items I have suggested I want to denote can  
> be checksummed.

You can denote a resource that is defined as "the resource having  
only one particular representation, which has a checksum of 1234."

>> I like the "Halpin Test" [1]:
>>
>> "I would say that if there is a URI that is used to identify a  
>> resource one would want to make logical statements about, and  
>> these statements do not apply to possible representations of that  
>> resource, then one should use the "hash" or 303 redirection to  
>> separate these URIs."
>>
>> To me, that's good enough as an every-day sniff test.
>
> A good test, indeed. But suppose I am looking from the outside and  
> want to make a statement about such resources. In other discussions  
> we've concluded that pretty much anything goes as far as what the  
> possible representations can be. How am I, not the owner, able to  
> figure out what the possible resources are?

If the URI owner wants you to know, he will tell you, perhaps  
somewhere in the representation. If he doesn't want you to know, then  
you'll have to use your imagination.

> And what happens when I want to say more creative things than the  
> owner thought of, things that do not apply equally to all the  
> representations that she serves?

If the owner didn't provide a URI for it, and you still want to talk  
about it, then you'll have to mint your own URI, and describe it to a  
level of accuracy that allows your audience to figure out what you  
are talking about.

> And how can I, as resource owner, decide that I want to mint a URI  
> to denote things that some might call representations? How am I to  
> do that?

You mint a URI and declare it to identify "the resource that has only  
one single fixed representation XYZ".

> Take, as an example, the zip files on  
> http://mirror.nyi.net/apache/lucene/java/ which has the following  
> instructions.
>
> It seems to me that it's pretty hard to argue that  
> http://mirror.nyi.net/apache/lucene/java/lucene-2.2.0-src.zip isn't  
> intended to denote something that can be checksummed, hence not a  
> resource.

It identifies a zip file. That's a resource. With one particular  
single fixed representation. Zip files can have checksums. Luckily,  
the checksum of the file also applies to the single fixed  
representation. Hence the resource passes the Halpin Test and is an  
information resource.
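
A small sketch of the point that a checksum of the file also applies to its single fixed representation. It assumes SHA-256 as the digest (an arbitrary choice) and uses in-memory byte strings as stand-ins for retrieved representations; both helper names are hypothetical.

```python
import hashlib
from typing import Iterable


def checksum(representation: bytes) -> str:
    """Return the SHA-256 hex digest of one representation's bytes."""
    return hashlib.sha256(representation).hexdigest()


def has_single_fixed_representation(representations: Iterable[bytes]) -> bool:
    """True if every observed representation has the same checksum.

    This only describes what has been observed so far; it cannot show
    that the resource will keep serving the same bytes in the future.
    """
    return len({checksum(r) for r in representations}) == 1
```

For a resource such as the zip file, every retrieval yields the same bytes, so the one checksum can be stated about the resource itself. For a content-negotiated page served as HTML or SVG, the checksums differ, and a statement about "the" checksum would not apply to all of its representations.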

> -Alan
>
> ps. Looking forward to having a drink together Wednesday :)

I'll be jetlagged, boozed up, and bickering with you about web  
architecture. Gonna be fun :-)

Richard


>> Signatures
>>
>> Many of the files have been digitally signed using GnuPG. If so,  
>> there will be an accompanying file.asc signature file in the same  
>> directory as the file (binaries/ or source/). The signing keys can  
>> be found in the distribution directory at  
>> <http://www.apache.org/dist/lucene/java/KEYS>.
>>
>> Always download the KEYS file directly from the Apache site, never  
>> from a mirror site.
>>
>> Always test available signatures, e.g.,
>> $ pgpk -a KEYS
>> $ pgpk lucene-1.4.tar.gz.asc
>> or,
>> $ pgp -ka KEYS
>> $ pgp lucene-1.4.tar.gz.asc
>> or,
>> $ gpg --import KEYS
>> $ gpg --verify lucene-1.4.tar.gz.asc
Received on Monday, 22 October 2007 23:10:49 UTC