Re: Subgroup to handle semantics of HTTP etc? from Xiaoshu Wang on 2007-10-22 (www-tag@w3.org from October 2007)

From: Xiaoshu Wang <wangxiao@musc.edu>
Date: Mon, 22 Oct 2007 14:44:26 +0100
To: "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>
CC: "Booth, David (HP Software - Boston)" <dbooth@hp.com>, W3C-TAG Group WG <www-tag@w3.org>, Alan Ruttenberg <alanruttenberg@gmail.com>, Jonathan A Rees <jar@mumble.net>, Dan Connolly <connolly@w3.org>, Tim Berners-Lee <timbl@w3.org>
Message-ID: <471CA93A.1050106@musc.edu>
Williams, Stuart (HP Labs, Bristol) wrote:
>> I think, the root cause for all these is the httpRange-14. 
>> The way its resolution is written just sounds like an 
>> inference.  After some thought, I have started to think that 
>> "httpRange-14" gets it wrong.  The issue is raised to solve 
>> the URI ambiguity.  But what it does is to open more issues 
>> than it has solved. 
>>     
>
> Would you care to enumerate some of those please? 
> I'm particularly interested in those problems that you attribute to the
> 'resolution' rather than those that exist independent of it.
>   
> After much email and debate, the TAG resolved the question such that Web
> Architecture places no constraints on what can be referred to using http
> scheme URIs, with or without a '#' in a given URI.
>   
Yes, I think this is the correct conclusion because it separates the URI 
from its *dereference* protocol.
> A consequence of following the TAG's advice at [1] ...
>   
<snip>.  I was aware of it, and I thought it was right too.  But a 
project that I am working on and the questions posted on various mailing 
list archives led me to start rethinking it.
>> The whole issue, I think relies on how we understand the 
>> relationship between the following two things.
>>
>> 1) The thing that a URI denotes, let's call it T.
>>     
>
> That would be what AWWW calls a resource, right?
>   
Yes
>> 2) The thing that you get back from dereferencing the URI, 
>> let's call it R.
>>     
>
> That would be what AWWW calls a representation, right?
>   
Yes.
>> The important question is whether T should be R.  Most people 
>> think so, but I think it should not. 
>>     
>
> In which case I think you and AWWW are in agreement.
>   
Hmm.. not really.  I think AWWW's opinion is that for some resources, 
i.e., information resources, T=R.  At least, most people reading the 
httpRange-14 resolution would get that impression.
>> First, a protocol, such 
>> as HTTP, is just one of the many protocols that can be used to 
>> "dereference" a URI.  Second, HTTP content negotiation 
>> makes it impossible that R is T.  For instance, suppose we 
>> normalize every HTTP GET by moving the Accept header 
>> into a query string.  Then, given a URI like "http://example.com/foo"
>>
>> T =  http://example.com/foo
>>
>> But R can be one of the following
>>
>> R1 = http://example.com/foo?Accept=text/html
>> R2 = http://example.com/foo?Accept=application/rdf+xml
>> R3 = http://example.com/foo?Accept=anything
>>
>> And they have completely different URIs.
>>     
>
> Hmmm.... this seems to confuse resources with representations. T can be
> taken as a reference to a generic resource while R1,R2 and R3 can be
> taken as references to more specific resources which give access to a
> narrower set of representations than T (at some given instant).
>   
That is exactly the point: is there a URI for R?  (I think not.)  If 
someone thinks so, what is the URI for the returned representation?
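
The normalization described in the quoted example can be sketched as follows. This is only an illustration: `normalize` is a hypothetical helper, and folding the Accept header into a query string is a thought experiment from the example, not something HTTP defines.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(uri: str, accept: str) -> str:
    """Fold an Accept media type into the query string of `uri`.

    Hypothetical scheme: the media type is appended literally, as in the
    quoted example, without percent-encoding.
    """
    parts = urlsplit(uri)
    extra = f"Accept={accept}"
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


T = "http://example.com/foo"
R1 = normalize(T, "text/html")
R2 = normalize(T, "application/rdf+xml")

print(R1)  # http://example.com/foo?Accept=text/html
print(R2)  # http://example.com/foo?Accept=application/rdf+xml
```

Under this scheme T, R1, and R2 are three distinct strings, so whatever comes back for R1 or R2 has no URI of its own other than one invented by the normalization.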
>> In other words, 
>> what a URI identifies will *never* be the same as what the 
>> URI dereferences to unless we explicitly assert that they are.
>>     
>
> ? don't understand the claim. 
>   
What I mean is: what a URI identifies is always a resource in the sense 
of TBL's generic resource, regardless of whether it is a network resource 
or not. 

For example, let's use "http://example.com/abook" to denote a particular 
book.  This URI can be grounded in various systems, each of which may 
have different mechanisms (protocols) to dereference the URI.  Which 
system and which protocol to use is up to the client.

- In a traditional marketplace, such as a bookstore, a client may get 
back a printed copy of the book.
- In a book-reading club, a client will get back a stream of sound waves. 
- On the web, a client will get back a bit-stream, which can be 
further subdivided by MIME type into an HTML, RDF, or PDF stream... 
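
The cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SYSTEMS` table, the `Representation` class, and the `dereference` helper are hypothetical names, not part of any Web specification.

```python
from dataclasses import dataclass

BOOK = "http://example.com/abook"  # the URI; it denotes the book itself


@dataclass(frozen=True)
class Representation:
    form: str  # what the client physically receives
    of: str    # URI of the resource it was obtained from


# Hypothetical systems, each with its own mechanism for dereferencing a URI.
SYSTEMS = {
    "bookstore": lambda uri: Representation("printed copy", uri),
    "reading club": lambda uri: Representation("stream of sound waves", uri),
    "web, text/html": lambda uri: Representation("HTML bit-stream", uri),
    "web, application/pdf": lambda uri: Representation("PDF bit-stream", uri),
}


def dereference(uri: str, system: str) -> Representation:
    """Ask one system for whatever it hands back for this URI."""
    return SYSTEMS[system](uri)


results = {name: dereference(BOOK, name) for name in SYSTEMS}
for name, rep in results.items():
    print(f"{name}: {rep.form}")
```

Every result carries the same URI but a different form; which one a client gets depends only on the system it chose to ask.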

But those things - printed copy, sound wave, bit-stream - are NOT the 
book identified by "http://example.com/abook".  Each is one 
particular representation of the book.  They may be described as

_:aPrintCopy awww:hardCopyOf <http://example.com/abook>.
_:anAudio awww:soundOf <http://example.com/abook>.
_:anHTMLRep awww:informationResourceOf <http://example.com/abook>.
_:anPDFFile awww:informationResourceOf <http://example.com/abook>.
.....

Please note my last two assertions: I think it is more 
appropriate to define *information resource* as the set of all 
representations of all generic URIs.  Such a view has a few advantages.

1) It is much easier to understand and more consistent, because it doesn't 
matter whether a URI identifies a network resource, a person, a namespace, 
or an ontology.  We understand that what we get back is just a particular 
representation of that resource, which we try to understand within a given 
information system.

2) It is more efficient.  We don't need 303 redirect anymore.

3) It can avoid unnecessary proliferation of URIs and allows various 
*information resources* to be logically grouped under the same URI without 
being physically bound to each other.  This is particularly important for me 
because I am developing an RDF-based Data Format Description Framework 
(http://dfdf.inesc-id.pt), where the data format description (in RDF) is 
separated from the data encoding (in binary of any form). 

4) It also solves the conceptual problem of a URI with a fragment 
identifier.  With content negotiation, the nature of a URI with a 
fragment identifier becomes a problem under the traditional view of 
IR and non-IR.  For instance, what will "http://example.com/#chapter1" 
identify?  If we intend the URI to identify the first chapter of the 
book, what should it denote in an HTML representation?  
Would it be wrong if the URI identified a <div> or an <h1> element?  Under 
the traditional distinction between IR and non-IR, this is very likely to be 
considered wrong, and the best we can do is to carefully avoid name 
conflicts among the fragment identifiers in each representation.  But this, 
in turn, hurts usability.  For instance, if I request the HTML 
representation of a particular ontological term, I would like the 
browser to scroll automatically to the relevant section instead of 
leaving me to find it.  With this newly proposed view of the relationships 
between URI, HTTP, Resource, and Representation/InformationResource, it will 
be O.K. to use "http://example.com/#chapter1" to identify a <div> or 
<h1>, because what we get back for a fragment URI is the same as for a 
primary URI: it is just one representation of the resource, not *the* 
representation.
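
The fragment case can be illustrated with a small sketch. It assumes two made-up HTML representations of the same resource, and the `IdFinder` class and `element_for` helper are hypothetical names; only `html.parser.HTMLParser` and `urllib.parse.urldefrag` are real standard-library calls.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class IdFinder(HTMLParser):
    """Record the tag name of the first element whose id matches `wanted`."""

    def __init__(self, wanted: str):
        super().__init__()
        self.wanted = wanted
        self.tag = None

    def handle_starttag(self, tag, attrs):
        if self.tag is None and dict(attrs).get("id") == self.wanted:
            self.tag = tag


def element_for(uri: str, html: str):
    """Return the tag that the URI's fragment selects in one HTML representation."""
    _, fragment = urldefrag(uri)
    finder = IdFinder(fragment)
    finder.feed(html)
    return finder.tag


URI = "http://example.com/#chapter1"

# Two hypothetical HTML representations of the same resource.
rep_a = '<html><body><div id="chapter1"><p>...</p></div></body></html>'
rep_b = '<html><body><h1 id="chapter1">Chapter 1</h1></body></html>'

print(element_for(URI, rep_a))  # div
print(element_for(URI, rep_b))  # h1
```

The same fragment URI lands on a different element in each representation, which is harmless if the URI is taken to identify the chapter and each element is only one representation of it.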

Cheers,

Xiaoshu
Received on Monday, 22 October 2007 13:46:31 UTC