Re: Question about the On Linking Alternative Representations TAG Finding

Xiaoshu,

On 8 Aug 2008, at 13:20, Xiaoshu Wang wrote:
>> A resource is anything named by a URI.
>>
>> A representation is a <bitstream, MIME type> tuple.
> It sounds simple but not really.  For instance, is
> "http://dfdf.inesc-id.pt/tr/doc/web-arch/img/fig2" a resource or a
> representation?

It's a URI, a string starting with "http://". You probably meant to  
ask what it *identifies*. Well, it evidently identifies a resource,  
because that's what URIs do. I don't know what that resource is  
supposed to be, so I cannot tell if it is a representation or not.  
Only the person who minted the URI knows, and has not chosen to tell  
us explicitly.

> A *representation*, in my opinion, is what is delivered to the client,

Too simple. A representation is a <bitstream, MIME type> tuple.  
Representations are sometimes delivered to clients, sometimes sent  
from the client to the server, sometimes stored in a cache and so on.  
It's the real bits going over the wire.

> a *resource* is whatever the provider intends it to be.

Yes.

> They are always different - not in the sense if they are *bitstream*  
> or not.

They usually are, but not by definition. A URI can name anything. So I  
can name a particular <bitstream, MIME type> tuple with a URI, and  
hence the URI identifies a resource that is a representation. (I'm not  
saying that this is a particularly useful thing to do. In fact, don't do  
it.)

> A *representation* of a resource denoted by a URI doesn't have to be  
> delivered electronically.  This is the often wrongly conceived idea  
> that http-URI must be bound to HTTP protocol.  If I bind a URI to a  
> postal service as its transportation protocol, you (in principle)  
> can go to a postal office to request
> "http://dfdf.inesc-id.pt/tr/doc/web-arch/img/fig2" and a hardcopy
> print, which is NOT a bitstream, can be delivered
> to you.  Do you consider that picture a resource or a representation?

The hardcopy picture is a rendering of the <bitstream, MIME type>  
tuple on paper. The hardcopy picture is neither a resource (unless  
someone assigns a URI to it, but again, let's not go there), nor is it  
a representation (because it's not a <bitstream, MIME type> tuple, but  
a piece of paper with ink on it).

This is irrelevant to Web architecture, by the way. The Web is an  
abstract machine that, among other things, emits representations  
(<bitstream, MIME type> tuples) in response to GET requests. What we  
do with the bits afterwards -- render them on a screen, print them on  
paper, store them on a disk -- is outside the scope of Web  
architecture and I don't see why we should talk about this.

>> That's a simple and objective distinction. The interesting and  
>> subjective question is how to best model an application using those  
>> two modelling primitives.
>>
>> There are two schools of thought on this. One school maintains a  
>> distinction between “documents” and “things described in the  
>> documents” in their modelling; the other school says that this  
>> distinction is unnecessary.
>>
>> The former modelling has been elevated to an axiom of Web  
>> architecture by the httpRange-14 decision. It has many advantages  
>> over the latter (cleaner handling of metadata, enables grouping of  
>> many descriptions in a single document, ...), and it has some  
>> disadvantages (more complex, ...). These have been discussed  
>> endlessly and I have no interest in resurrecting that debate.
>>
>>> There are only two choices.  (1) As T.V Raman said, don't make any  
>>> distinction between them.
>>
>> The distinction is made in the specs; and it's made by a large and  
>> significant part of the Web community (see REST). Raman might not  
>> consider it important, and that worries me a bit, but it doesn't  
>> diminish the importance of the distinction.
>>
>>> (2) As I have always proposed, to make an absolute distinction.   
>>> That is: to think every URI denotes a *resource*
>>
>> No one disagrees with this.
>>
>>> and what is dereferenced from the URI is the *representation* of  
>>> that resource.
>>
>> Not quite. A representation is what you get back when you do a GET  
>> on the resource, or what you send when you do a POST/PUT.
> I am not sure how many people will agree on the latter part.  I  
> cannot.

People talk much more about GET than about POST and PUT, but I'm  
pretty sure that I have correctly captured the spirit of the HTTP  
spec, of Roy's Chapter 5, and of AWWW when I say that we can change a  
resource's state by submitting a representation using POST or PUT. If  
you disagree, I'm pretty sure that we are simply using language  
differently, and you should probably use another term instead of  
“representation” for what you have in mind. (“state of the resource”?)

Repeat after me: Representations are <bitstream, MIME type> tuples.
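
To make the vocabulary concrete, here is a minimal sketch in Python. The class and method names (Representation, Resource, get, put) are my own illustrative assumptions, not anything defined by the HTTP spec or AWWW; the only point is that the same kind of tuple travels in both directions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Representation:
    """A <bitstream, MIME type> tuple, nothing more."""
    bitstream: bytes
    mime_type: str


@dataclass
class Resource:
    """Hypothetical stand-in for whatever a URI names; its state lives server-side."""
    variants: Dict[str, bytes] = field(default_factory=dict)

    def get(self, mime_type: str) -> Optional[Representation]:
        # GET: the server emits a representation of the current state.
        body = self.variants.get(mime_type)
        return None if body is None else Representation(body, mime_type)

    def put(self, rep: Representation) -> None:
        # PUT: the client submits a representation to change the state.
        self.variants[rep.mime_type] = rep.bitstream


doc = Resource()
doc.put(Representation(b"<p>hello</p>", "text/html"))
fetched = doc.get("text/html")
```

In this toy model the resource itself never crosses the wire; only Representation values do, whether they come back from get or go in through put.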

>> Dereferencing is the process of “reaching through the network” in  
>> order to perform one of the supported operations on a resource.
>>
>>> To think whether something is *in* a document or not is just a  
>>> form of self-contradiction because the goal is to make the web  
>>> *self-descriptive*.  Hence, a document (or resource) is both in  
>>> and not in itself.
>>
>> What do you mean when you say “something is in a document”? I can  
>> understand the phrase “something is described in a document”.  
>> Obviously a document can describe itself. I don't see the  
>> contradiction.
> I intended it the same way you described that "something described  
> inside a document".  I am trying to understand what you mean that  
> "303 redirects are about creating URIs for “things described inside  
> documents”.  Do you mean if something talks about itself, it should  
> 303 redirect?  Or something else?

I mean something else.

I have this notion in my head that the Web is a collection of  
documents, and a web document is not the same as the things the web  
document talks about. Hence it's better not to use the same URI for a  
web document and the things the web document talks about (except where  
it talks about itself). I can't state it any simpler than this. I  
consider it self-evident. If you don't agree, I give up trying to  
communicate this idea and we just have to accept that we live in  
different realities ;-)
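
If it helps, the modelling can be sketched as a toy routing table in Python. The two example.com URIs and the dereference helper are hypothetical names I made up for illustration; a real server would of course do this with actual HTTP responses.

```python
from typing import Dict, Optional, Tuple

# Hypothetical URIs: one names a thing, the other names the document about it.
THING = "http://example.com/id/alice"
DOCUMENT = "http://example.com/doc/alice"

# Toy routing table: URI -> (status code, Location header or None).
ROUTES: Dict[str, Tuple[int, Optional[str]]] = {
    THING: (303, DOCUMENT),  # not a web document: point at a description
    DOCUMENT: (200, None),   # a web document: serve a representation directly
}


def dereference(uri: str, max_hops: int = 5) -> Tuple[str, int]:
    """Follow 303 redirects; return (final URI, number of requests made)."""
    requests_made = 0
    for _ in range(max_hops):
        status, location = ROUTES[uri]
        requests_made += 1
        if status == 303 and location is not None:
            uri = location
            continue
        return uri, requests_made
    raise RuntimeError("too many redirects")
```

Under these assumptions, dereferencing the thing's URI costs two requests and ends at the document's URI, while the document's own URI answers in one. The two URIs stay distinct, which is the whole point.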

>> [snip]
>>>>   some_resource
>>>>      |
>>>>      +--303--> description_of_some_resource
>>>>                   |
>>>>                   +--Content-Location--> description_of_some_resource.{html|rdf}
>>>>
>>>> That's the clean and proper way of combining the 303 approach  
>>>> with content negotiation!
>>> It is *clean* only when the distinction of *resource* vs.  
>>> *representation/description* is unambiguous, which hardly is.
>>
>> Those who use the approach described here simply make a modelling  
>> distinction between documents and the things described in the  
>> documents. That distinction *is* unambiguous. (But it is  
>> subjective.) The described approach is “clean” in the sense of HTTP  
>> interactions. And it is “clean” in that it enables the modelling  
>> style described above.
> But the web is about facilitating ad hoc communication.  If you have  
> an unambiguous but subjective distinction and I have mine?  Is it  
> going to be unambiguous or not when we intend to communicate with  
> each other?

I can unambiguously communicate my subjective choice, and you can  
recognize the difference in our choices and work around it. No  
technical solution will protect you from subjectivity on the Web.

>>> In either case, i.e., the (1) and (2) solution mentioned above,  
>>> 303 is unnecessary.  Sure, it does no harm.  But it does slow down  
>>> the web and our goal should be to make the web more efficient, not
>>> less.
>>
>> That's why I keep insisting that you should use hash URIs, which do  
>> not exhibit this downside, instead of the 303 approach. The  
>> solutions mentioned above are for those who have decided to use  
>> 303s anyway, despite the well-known downsides.
> First, there is reality issue - such as dublin core etc., already  
> uses slash.

If the additional redirect becomes a serious performance problem, then  
the 303 users will be slowly losing linkshare to hash-based  
alternatives. Let the market fairy sort it out.

> Second, there are clear use cases where #hash URI is not  
> appropriate.  Consider the document size, if a domain vocabulary,  
> such as that of SNOMD, will be using #URI.

This is not an issue. You can do <document1#it>, <document2#it> and so  
on. You can chunk your documents any way you like. Granted, it's less  
flexible than 303 redirects.
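
A small Python sketch of why chunking works, assuming a hypothetical example.com base for the two documents: the fragment is stripped on the client side, so each hash URI costs a single request for its own document and no redirect.

```python
from urllib.parse import urldefrag


def request_target(uri: str) -> str:
    # The fragment never goes on the wire; only the document URI is requested.
    return urldefrag(uri).url


# Hypothetical hash URIs for two things, one per document chunk.
things = [
    "http://example.com/document1#it",
    "http://example.com/document2#it",
]
targets = [request_target(u) for u in things]
```

Here targets holds the two document URIs, so a client looking up the first thing never has to download the second document.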

> Third, there is still the issue of the nature of resource because
> what a hash URI denotes when there are multiple
> representation/variants, isn't clearly defined. If a (generic)
> resource say http://example.com/gr has two representations, an RDF
> and an HTML one, that can be get via conneg.  Let's give each of
> them a URI http://example.com/gr.rdf and http://example.com/gr.html.
> What will be http://example.com/gr#a denoting?

It will denote whatever the variants say it denotes. It's possible to  
coordinate the variants to make them communicate the same idea.

But you are right, this is a somewhat messy area and the different  
specs involved don't answer all questions. I think however that we  
have a quite clear understanding of what the desirable answers would  
be, that is, what the specs *should* say to make things work out.

> And what is the relationship between http://example.com/gr.rdf#a and
> http://example.com/gr.html#a?

There is none, unless explicitly stated.

> This is a muddled area, which I hope TAG can find time to give some  
> recommendations.

+1.

Best,
Richard


>
>
> Regards,
>
> Xiaoshu

Received on Saturday, 9 August 2008 09:13:15 UTC