RE: Subgroup to handle semantics of HTTP etc? from Williams, Stuart (HP Labs, Bristol) on 2007-10-22 (www-tag@w3.org from October 2007)

From: Williams, Stuart (HP Labs, Bristol) <skw@hp.com>
Date: Mon, 22 Oct 2007 16:29:58 +0100
To: <wangxiao@musc.edu>
Cc: "Booth, David (HP Software - Boston)" <dbooth@hp.com>, "W3C-TAG Group WG" <www-tag@w3.org>, "Alan Ruttenberg" <alanruttenberg@gmail.com>, "Jonathan A Rees" <jar@mumble.net>, "Dan Connolly" <connolly@w3.org>, "Tim Berners-Lee" <timbl@w3.org>
Message-ID: <C4B3FB61F7970A4391A5C10BAA1C3F0DEE0272@sdcexc04.emea.cpqcorp.net>
Hello Xiaoshu,

> -----Original Message-----
> From: Xiaoshu Wang [mailto:wangxiao@musc.edu] 
> Sent: 22 October 2007 14:44
> To: Williams, Stuart (HP Labs, Bristol)
> Cc: Booth, David (HP Software - Boston); W3C-TAG Group WG; 
> Alan Ruttenberg; Jonathan A Rees; Dan Connolly; Tim Berners-Lee
> Subject: Re: Subgroup to handle semantics of HTTP etc?
> 
> 
> 
> Williams, Stuart (HP Labs, Bristol) wrote:
> >> I think, the root cause for all these is the httpRange-14. 
> >> The way its resolution is written just sounds like an inference.  
> >> After some thought, I start to think that "httpRange-14" gets it 
> >> wrong.  The issue was raised to solve the URI ambiguity.  But what
> >> it does is to open more issues than it has solved.
> >>     
> >
> > Would you care to enumerate some of those please? 
> > I'm particularly interested in those problems that you attribute to
> > the 'resolution' rather than those that exist independent of it.
> >   
> > After much email and debate, the TAG resolved the question such that
> > Web Architecture places no constraints on what can be referred to 
> > using http scheme URIs, with or without a '#' in a given URI.
> >   
> Yes, I think this is the correct conclusion because it 
> separates the URI from its *dereference* protocol.
>
> > A consequence of following the TAG's advice at [1] ...
> >   
> 
> <snip>.  I was aware of it and I thought it was right too.  
> But a project that I am working on and the questions posted 
> on various mailing list archives led me to start rethinking it.
> 
> >> The whole issue, I think, relies on how we understand the
> >> relationship between the following two things.
> >>
> >> 1) The thing that a URI denotes, let's call it T.
> >>     
> >
> > That would be what AWWW calls a resource, right?
> >   
> Yes
> 
> >> 2) The thing that you get back from dereferencing the URI, 
> >> let's call it R.
> >>     
> >
> > That would be what AWWW calls a representation, right?
> >   
> Yes.
> 
> >> The important question is whether T should be R? Most people think 
> >> so, but I think we should not.
> >>     
> >
> > In which case I think you and AWWW are in agreement.
> >   
> Hmm.. not really.  I think AWWW's opinion is that for some 
> resources, i.e., the information resources, T=R.  At least, 
> most people reading the httpRange-14 resolution would get that impression.

But... you just agreed that T's correspond to webarch:Resource-like
things and R's correspond to webarch:Representation-like things.

I think that AWWW has tried to be very clear about the difference
between webarch:Resources and webarch:Representations. Certainly I do
not believe that it was the intention of webarch that anyone should
conclude it possible for "T=R". Indeed, where some specific variant
of a resource has been assigned a distinct URI (say the English HTML
version of a document generally available in English and French, and in
PostScript, PDF and HTML), that variant is cast as a different resource
which responds with a more restricted set of representations than the
corresponding generic resource. That a relationship exists between the
specific resource and a generic resource is evident in a
"Content-Location" header (which of course, given the general skepticism
around, may be incorrect or inconsistent - but it's not very useful in
that case).

I think that maybe you don't intend that your 'R's are
webarch:Representations but instead are specific webarch:Resources in
relation to some generic resource - maybe.

> >> First, a protocol, such
> >> as HTTP, is just one of the many protocols that can be used to 
> >> "dereference" a URI.  Second, the HTTP content negotiation makes it
> >> impossible that R is T.  For instance, suppose we normalize all the
> >> HTTP GETs by moving all the Accept headers into a query string.  Then,
> >> given a URI like "http://example.com/foo"
> >>
> >> T =  http://example.com/foo
> >>
> >> But R can be one of the following
> >>
> >> R1 = http://example.com/foo?Accept=text/html
> >> R2 = http://example.com/foo?Accept=application/rdf+xml
> >> R3 = http://example.com/foo?Accept=anything
> >>
> >> And they have completely different URIs.
> >>     
> >
> > Hmmm.... this seems to confuse resources with representations. T can 
> > be taken as a reference to a generic resource while R1, R2 and R3 can 
> > be taken as references to more specific resources which give access to
> > a narrower set of representations than T (at some given instant).
> 
> That is exactly the point, is there a URI for R? (I think 
> not)

You are correct... there is no URI for 'R'. What there may be is a URI
for a different resource T' that is a more specific variant of T. In
HTTP that URI may be made available via the "Content-Location" header of
an HTTP response. Such a variant may be invariant to some combination of
time, natural language, content format... whatever axes of variation
there may be.
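
As a rough illustration of the point above, here is a minimal Python
sketch of a server choosing among variants of the generic resource
"http://example.com/foo". The variant paths (/foo.html, /foo.rdf), the
payloads and the function name are assumptions made up for this sketch,
and the Accept handling is deliberately simplified (no q-values). What it
tries to show is that Content-Location names a more specific resource T',
not the returned representation itself.

```python
# Hypothetical variant table for the generic resource /foo.
# Paths and payloads are illustrative assumptions only.
VARIANTS = {
    "text/html": ("/foo.html", "<html>...</html>"),
    "application/rdf+xml": ("/foo.rdf", "<rdf:RDF>...</rdf:RDF>"),
}
DEFAULT_TYPE = "text/html"


def negotiate(accept: str) -> dict:
    """Pick a representation of /foo for a simplified Accept value.

    Only exact media types and */* are handled; q-values are ignored.
    """
    wanted = [part.split(";")[0].strip() for part in accept.split(",")]
    for media_type in wanted:
        if media_type == "*/*":
            media_type = DEFAULT_TYPE
        if media_type in VARIANTS:
            location, body = VARIANTS[media_type]
            return {
                "status": 200,
                "Content-Type": media_type,
                # URI of the more specific resource T', not of the bytes
                "Content-Location": location,
                "body": body,
            }
    # No acceptable variant is available
    return {"status": 406}
```

The same request URI yields different representations depending on the
Accept value, while each response can still point at its own variant
resource.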

> If someone thinks so, what is the URI for the returned 
> representation?

see above

> >> In other words, what a URI identifies will *never* be the same as
> >> what the URI is dereferenced to unless we explicitly assert them.
> >>     
> >
> > ? don't understand the claim. 
> >   
> What I mean is: what a URI identifies is always a resource in 
> the sense of TBL's generic resource, regardless of whether it is a 
> network resource or not. 

Ok...

> For example, let's use "http://example.com/abook" to denote a 
> particular book.  This URI can be grounded on various 
> systems, each of which may have different 
> mechanisms(protocols) to dereference the URI.  Which system 
> to use and which protocol to use is up to a client.
> 
> - In a traditional market place, such as bookstores, a client may get
>   back a printed copy of the book.
> - In a book-reading club, a client will get back a stream of sound
>   waves. 
> - In the web, a client will get back a bit-stream, which can be
>   further subdivided by the MIME type into html, rdf or pdf stream... 
> 
> But those things - printed copy, sound wave, bit-stream - are 
> NOT the book identified by the "http://example.com/abook".  
> They are one particular representation of the book. 

Yes... I think it has been well understood that what is obtained by
performing an HTTP GET is a webarch:Representation rather than the
resource itself.

> They may be 
> referred to as
> 
> _:aPrintCopy awww:hardCopyOf <http://example.com/abook>.
> _:anAudio awww:soundOf <http://example.com/abook>.
> _:anHTMLRep awww:informationResourceOf <http://example.com/abook>.
> _:anPDFFile awww:informationResourceOf <http://example.com/abook>.
> .....
>
> Please note that my last two assertions because I think it is 
> more appropriate to define *information resource* as the set 
> of all representations of all generic URIs.  Such a view has 
> a few advantages.

Sorry, but I am not getting this - there may be some words missing.

Alt1: if "information resource" is the set of all representations
obtainable from all generic URIs (over all time? or at an instant?) how
are we to discriminate one information resource from another?

Alt2: if "information resource" is the set of all generic resources
(meaning information resources that may have more specific variant
resources available) I don't see how any of 1-3 below follow.


> 1) It is much easier to understand and consistent because it 
> doesn't matter if a URI identifies a network resource, a 
> person, or a namespace, or an ontology.

I don't understand what comparison is being made here. Something is
'easier' than something else... but I'm struggling to ground the
'somethings'.

> We understand what 
> we get back is just a particular representation of that 
> resource that we try to understand within a given information system.

I believe that has been the understanding in the web community for quite
some time, that what you get back are representations, not the resources
themselves.

> 2) It is more efficient.  We don't need 303 redirect anymore.

Again, the subjects of the comparison are a little obscured.

In any case, there are some for whom a distinction between information
resources and everything else is important. This in itself has been the
subject of continued debate. AWWW was published at a time prior to the
TAG resolution of httpRange-14 and was (at that time) intended to be
neutral wrt the outcome of that issue.  There were two questions tangled up
in the older thread:

a) Can any kind of thing be named with an http URI (sans-fragment)? -
the httpRange-14 resolution answered yes.
b) How can you tell if a resource is an information resource aka
document? - httpRange-14 resolution answered that a 200 response is a
sufficient condition.
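
The second answer can be sketched as a small Python function. This is
only an illustration: the function name and the returned strings are
invented for the sketch. The branches follow the wording of the TAG
resolution, which speaks of 2xx responses, 303 (See Other) and 4xx
errors; other status codes are not addressed by it.

```python
def what_a_get_tells_us(status: int) -> str:
    """Summarize what an HTTP GET status says about the identified resource.

    Hypothetical helper: the return strings are labels for this sketch only.
    """
    if 200 <= status < 300:
        # A success response is a sufficient (not necessary) condition
        return "information resource"
    if status == 303:
        # See Other: the URI may identify any kind of resource
        return "any resource"
    if 400 <= status < 500:
        return "unknown"
    return "not addressed"
```

Note that the inference only runs one way: the absence of a 200 response
says nothing about whether the resource is an information resource.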

There is, was and continues to be debate as to whether the category
distinction was an important one to be made.
Some wanted to make the distinction syntactically: if a URI matches
^http:[^#]* then the URI refers to an information resource.
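
For concreteness, that syntactic rule might look as follows in Python.
This sketches the position some argued for, not what the TAG resolved.
It assumes the pattern is meant to match the whole URI (hence fullmatch);
with a prefix match it would also accept URIs that carry a fragment. The
function name is made up for the example.

```python
import re

# The pattern from the text; fullmatch() supplies the anchoring at both ends.
NO_FRAGMENT_HTTP = re.compile(r"http:[^#]*")


def names_information_resource_by_syntax(uri: str) -> bool:
    """True if the URI is an http URI with no '#' anywhere in it."""
    return NO_FRAGMENT_HTTP.fullmatch(uri) is not None
```

Under this rule "http://example.com/abook" would be taken to name an
information resource, whereas "http://example.com/#chapter1" would not.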

I think your proposal (though it is not clear what it is) sets aside
the ability to make a distinction - which will matter to those who
consider it important.

> 3) It can avoid unnecessary proliferation of URIs and allows 
> various *information resources* be logically grouped under 
> the same URI without physically bound to each other.

Well... you've kind of manufactured a proliferation of URIs to serve your
example, but even so, I don't see how what you propose accomplishes what
you claim here.

>  This is 
> particularly important for me because I am developing an 
> RDF-based Data Format Description Framework 
> (http://dfdf.inesc-id.pt), where the data format description 
> (in RDF) is separated from data encoding (in binary of any form). 
> 
> 4) It also solves the conceptual problem of the URI with a 
> fragment identifier.  Because with content negotiation, the 
> nature of a URI with fragment identifier becomes a problem 
> with the traditional view of the IR and non-IR.  For 
> instance, what will  "http://example.com/#chapter1" 
> identify?  Say if we intend the URI to identify the first 
> chapter of the book, what should it be used to denote in an 
> HTML representation?  
> Would it be wrong, if the URI identifies a <div> or a <h1> 
> element?  In the traditional distinction of IR or non-IR, 
> this is very likely to be considered wrong.  And the best we 
> can do is to carefully avoid name conflicts of the fragment 
> identifier in each representation. But this, in turn, hurts 
> the usability.  For instance, if I request the HTML 
> representation of a particular ontological term, say, I would 
> like the browser to automatically scroll to the relevant 
> section instead of finding it.  But with this newly proposed 
> view of the relationships between URI, HTTP, Resource, 
> Representation/InformationResource, it will be O.K. to use 
> "http://example.com/#chapter1" to identify a <div> or <h1>.  
> Because what one gets for a fragment URI is the same as for a 
> primary URI, it is just one of, but not *the*, representation 
> of the resource.

Well here is the nub of an issue. But it is *not* an issue with
information vs non-information resources or with different URI schemes.
It's an issue of media types and maybe with intention.
*If* you are deploying a resource in a variety of formats then you have
an obligation to ensure that those fragment identifiers which resolve
in multiple representations are consistent in what they denote. The
"application/rdf+xml" media type has an implicit "thing described by"
indirection from a node on a graph to the thing that it stands for. For
text/html, what is referred to is a piece of text in a representation
(and not what that text might be about or describe). IMO that makes it
close to impossible to deploy triples and hypertext as variant
representations of the same resource - not totally impossible, but hard
to do so conveniently without accepting some level of 'pun' over whether
a piece of text is being referenced or the thing it describes (if indeed
it describes anything).
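
To make the media-type dependence concrete, here is a small Python
sketch. The first function splits off the fragment, which is not part of
what is requested from the server; the second looks up how the fragment
would be read for a given media type. The table of readings simply
paraphrases the paragraph above, and the function and table names are
invented for this illustration.

```python
from urllib.parse import urldefrag

# Paraphrase of the two readings discussed above (illustrative only).
FRAGMENT_READING = {
    "text/html": "a piece of text within the retrieved representation",
    "application/rdf+xml": "the thing described by the named graph node",
}


def split_for_retrieval(uri: str) -> tuple:
    """Return (URI to dereference, fragment interpreted by the client)."""
    result = urldefrag(uri)
    return result.url, result.fragment


def fragment_reading(uri: str, media_type: str) -> str:
    """Describe what the fragment denotes under the given media type."""
    _, fragment = split_for_retrieval(uri)
    if not fragment:
        return "no fragment"
    return FRAGMENT_READING.get(media_type, "defined by the media type")
```

The same URI with the same fragment thus gets two different readings
depending on which variant is retrieved, which is where the 'pun' arises.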

So... in your example above: what is your intention? That
"http://example.com/#chapter1" denotes a chapter in some conceptual work
(available maybe in any of the types of representation you mentioned
above) or that it denotes either a text position or a text region within
a retrieved representation? If you're going to be making utterances in
RDF, it may be inconsistent for it to be taken as either or both!

> Cheers,
> 
> Xiaoshu
> 

regards,


Stuart
--
Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks
RG12 1HN
Registered No: 690597 England
Received on Monday, 22 October 2007 15:30:37 UTC