Re: Back to HTTP semantics from David Booth on 2009-06-11 (public-awwsw@w3.org from June 2009)

From: David Booth <david@dbooth.org>
Date: Wed, 10 Jun 2009 20:17:33 -0400
To: Pat Hayes <phayes@ihmc.us>
Cc: Jonathan Rees <jar@creativecommons.org>, AWWSW TF <public-awwsw@w3.org>
Message-Id: <1244679453.3705.327.camel@dbooth-laptop>

On Wed, 2009-06-10 at 17:46 -0500, Pat Hayes wrote:
> On Jun 10, 2009, at 8:26 AM, Jonathan Rees wrote:
> 
> > OK, I think it would be useful to review where we are, before getting
> > too carried away again with comparison of conflicting models.
> >
> > I know each of us had a different question coming into this. I will
> > mention here only the two I think we (or at least I) have been working
> > on:
> >
> > 1. When an agent responds to an HTTP request, what is it saying (what
> > can it be "held to"), and how might we capture that in RDF?
> > 2. When someone (in RDF) *talks about* HTTP behavior, and wants to say
> > more about it than it says on its own, what sort of vocabulary would
> > be generally useful?
> >
> > Re #1:
> >
> > The answer to #1 clearly depends on prior agreement between requester
> > and responder. Usually this is just RFC 2616 and nothing else. It
> > might in particular cases be a more consequential contract (e.g.
> > webarch, httpRange-14, IRW, genont, Boothianism, Hayesianism), but we
> > cannot assume these in general and there are no generally agreed
> > markers that a responder is following any of these other conventions.
> >
> > My reading of RFC 2616 is that the response says so little about the
> > resource that there is nothing we can usefully capture about it in
> > RDF. The entity "corresponds to" the resource (or the resource in its
> > current state), but that is so weak as to be useless.
> 
> Well, but don't give up entirely. We can say very weak things in RDF,  
> and they might turn out to be useful, if only in a mild way. It might  
> be worth just writing this down and waiting until later to sort out  
> what it might mean. We have two things, I gather: an entity and a  
> resource, and a corresponds_to property relating the former to the  
> latter. OK, that's a start. (Pity about "entity", but those are the
> lexical breaks...) And we know that corresponds_to is functional, right?

Actually we have *three* things: the URI, the entity and the resource.
Furthermore, we also know that the resource is the kind of resource that
may respond to GET requests, i.e., an "information resource" (assuming
some appropriate definition of such).

> 
> > The response
> > does say quite a bit about the *entity* in the response (the
> > wa-representation), and we could attempt to capture that. And it is
> > certainly useful to record the uninterpreted fact of a particular HTTP
> > interaction, such as when it happened and what the request and
> > response were, in case that can be used in testing or in evidence or
> > hypothesis generation, but this is more the job of HTTP-in-RDF than of
> > AWWSW. But lacking further assumptions or information, the
> > wa-representations say almost nothing about the resource. If the URI
> > owner has even thought about what the URI names (unlikely), they might
> > adhere to just about any ontological or pragmatic stance, and still be
> > within the broad confines of RFC 2616.
> 
> True.
> >
> > So to the question, what is the responder saying about a resource? my
> > answer is, lacking other information or assumptions beyond RFC 2616,
> > nothing.
> 
> Well, that it is corresponded_to by an entity. That is *something*. We  
> know for example that there must be many things for which this is not  
> true, because there aren't enough entities to go round.

So far we know:

1. There is a relationship between the URI and the resource.  RFC 2616
uses the term "identifies" for this relationship.

2. The returned entity corresponds_to the resource.

3. The resource is the kind of resource that can have representations.

4. Other facts about the response, such as the time it occurred, the
number of bytes, etc.
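
As a rough sketch, the four facts above could be written down as triples
along these lines.  The "ex:" property and class names are placeholders
of my own, not terms from RFC 2616 or any published ontology, and the
blank-node labels are arbitrary:

```python
from datetime import datetime, timezone

# Hypothetical namespace: these names are illustrative assumptions only.
EX = "http://example.org/http-sem#"

def facts_from_get(uri, status, body, when):
    """Return the weak facts licensed by a 200 response to GET, as (s, p, o) tuples."""
    if status != 200:
        return []  # this sketch only covers the plain 200 case
    uri_literal = f'"{uri}"'   # the URI itself (the name), kept distinct from the resource
    resource = "_:resource"    # whatever the URI identifies
    entity = "_:entity"        # the entity returned in the response
    return [
        (uri_literal, EX + "identifies", resource),               # fact 1
        (entity, EX + "corresponds_to", resource),                # fact 2
        (resource, "rdf:type", EX + "ThingWithRepresentations"),  # fact 3
        (entity, EX + "receivedAt", when.isoformat()),            # fact 4
        (entity, EX + "byteLength", len(body)),                   # fact 4
    ]

triples = facts_from_get(
    "http://example.org/doc", 200, b"hello",
    datetime(2009, 6, 10, 20, 17, 33, tzinfo=timezone.utc),
)
```

Nothing here says what kind of thing the resource is beyond fact 3,
which is the point: the statements are weak, but they are not nothing.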


> 
> > We are therefore finished with this particular part of the
> > exercise.
> >
> > Re #2:
> >
> > As for ontologies that would help us talk about web resources, and
> > communicate assumptions, observations, and promises - from outside
> > HTTP, as it were - I think we can make some progress. The world
> > already has Dublin Core and FOAF widely deployed, so we ought to
> > analyze how they are being used. I think we should continue to review
> > IRW and HTTP-in-RDF to make sure they will help steer metadata
> > generation and dialog around architecture in a good direction. And I
> > still think we should aim to publish an ontology of some kind,
> > although it's not clear what should be in it.
> >
> > It is important to distinguish between two cases: One where the URI
> > owner is providing the metadata, in which case it can be considered
> > constraining or "authoritative", and another where someone else is
> > providing the metadata based on what is observed in HTTP responses, in
> > which case it might be merely speculative.

But if the metadata is based on what is observed in HTTP responses, then
it is also "authoritative", since those responses were issued by the
URI owner or his/her delegate.  So I don't think you're dividing the two
classes properly.  I think the division should be between what the URI
owner says (whether implicitly through HTTP responses or explicitly
through other mechanisms) and what others say (which is not necessarily
based on any HTTP observations).
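
To make the proposed division concrete, here is a minimal sketch in which
a statement counts as authoritative according to *who* said it, not the
channel it arrived through.  The Statement record, its field names and
the speaker labels are all hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    triple: tuple   # (subject, predicate, object)
    speaker: str    # who asserted it (hypothetical label)
    channel: str    # e.g. "http-response" or "rdf-document"

def is_authoritative(stmt, uri_owner, delegates=frozenset()):
    """True if the URI owner or a delegate made the statement, by any channel."""
    return stmt.speaker == uri_owner or stmt.speaker in delegates

from_owner = Statement(("ex:doc", "ex:p", "ex:o"), "owner", "http-response")
from_other = Statement(("ex:doc", "ex:p", "ex:o"), "someone-else", "rdf-document")
```

Note that the channel field plays no part in the test.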

> 
> I know I'm in the minority here, but I really think this isn't a
> significant distinction, nor indeed should it be. Ownership of the URI
> has almost nothing to do with what it refers to. That is determined by
> how it is *used*, and it's inherent in the Web that the publisher has
> absolutely no control over that once the URI is published.

But: 

 - Even if this distinction isn't significant to everyone it should
still be retained, since it can always be ignored by those who don't
care.  

 - How a URI is used is strongly influenced by what the URI owner says
about it, even if it isn't 100% controlling.

> 
> > As an example of the
> > second, it is reasonable to say
> >    <http://dx.doi.org/10.1155/1995/10717> dc:creator _:1.
> >    <http://www.hindawi.com/86874642.html> foaf:primaryTopic _:1.
> > by simply looking at HTTP responses - even though the URI owners (who
> > are probably only dimly aware of RDF and have probably never heard of
> > web architecture) haven't said anything licensing these URIs as names
> > for resources that have these properties. This is spontaneous "folk
> > RDF" of the kind that I think was envisioned when RDF was first
> > developed, and although logically unsound, 

Why do you say it is logically unsound?

> > it works (is useful) for
> > any of a variety of reasons:
> >  - the "URI ownership" idea doesn't apply - the RDF author is
> > defining the URIs in a way that it finds useful (this is "squatting")

If the URI owner had said nothing about what these URIs denote, either
implicitly through HTTP responses or explicitly in other statements,
then this would be URI squatting.  But the URI owner *has* said something
about what those URIs "identify", at least in the RFC 2616 sense: when I
pasted them into my browser I got 302->200 and 200 responses back.  HTTP
responses are a form of speech.  So in this case the author of the
statements above *was* justified in making them.

> >  - because the authors are relying on common sense instead of
> > specificational rigor
> >  - because the formulation has low overhead and high utility, and the
> > cost of being wrong is low.
> >
> > (I'm not sure I want to endorse the above practice; just to admit that
> > it happens and has some merit.)
> >
> > I think one way to explain the httpRange-14 restriction is as an
> > attempt to forestall conflicts between this kind of squatting and what
> > the URI owner might have to say.  For example, if I wrote the above
> > RDF, and was happy, and then the owner of
> > http://www.hindawi.com/86874642.html came along later and said
> >  <http://www.hindawi.com/86874642.html> rdf:type foaf:Person.
> > I'd be in a bit of a pickle.

Only if foaf:Person is disjoint with the class of things that can have
awww:representations.  Your application may indeed make this
disjointness requirement, in which case you would be in a pickle, but
it's a pickle caused by the fact that the URI owner told you both: (a)
that http://www.hindawi.com/86874642.html identifies a foaf:Person;  and
(b) that it identifies a thing that can have awww:representations.
Still, you can get around that pickle by "splitting" the identity of the
resource denoted by http://www.hindawi.com/86874642.html , as described
here:
http://dbooth.org/2007/splitting/

On the other hand, if your application does not make this disjointness
requirement then there is no harm caused.
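
A small sketch of that dependence: the clash only shows up if the
application itself declares the two classes disjoint.  The class name
"ex:ThingWithRepresentations" is a placeholder of mine, and the
disjointness pairs are whatever a given application chooses to assert:

```python
def find_clashes(types_by_resource, disjoint_pairs):
    """Return (resource, class_a, class_b) for each asserted disjointness that is violated."""
    clashes = []
    for resource, types in types_by_resource.items():
        for a, b in disjoint_pairs:
            if a in types and b in types:
                clashes.append((resource, a, b))
    return clashes

# What the URI owner has told us, in the example from this thread.
types = {
    "http://www.hindawi.com/86874642.html": {
        "foaf:Person",                  # (a) stated explicitly in RDF
        "ex:ThingWithRepresentations",  # (b) implied by the 200 response
    }
}

strict = [("foaf:Person", "ex:ThingWithRepresentations")]  # application requires disjointness
lenient = []                                               # application does not
```

With the strict list the resource is flagged, and splitting its identity
is one way out.  With the lenient list nothing is flagged.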

> 
> Well, but you would be in the same pickle even if they had used the  
> approved HTTP redirection, right? (Or is the point that in that case  
> you wouldn't have drawn the first conclusion? But why not, if as you  
> say the owners don't know squat about RDF or indeed care?)
> 
> In other words, I don't see how this problem, if it is one, relates to  
> http-range-14.
> 
> Pat
> 
> > The URI ownership principle would say I
> > was wrong, and I'd be forced to either fix my content (which might be
> > hard) or thumb my nose at webarch. If Hindawi follows the httpRange-14
> > rule, this situation won't arise.
> >
> > By publishing an ontology we (AWWSW) would in effect be making a new
> > recommendation - new vocabulary that we suggest the community take up
> > for certain purposes. Some examples off the top of my head:
> >  - we could try to come up with a way to express useful contracts
> > such as a promise of unchanging (time-invariant) content that would
> > apply to any web-resource ontology, not just to genont.

I'm not so hot on that idea.

> >  - we could take one of the FRBR or IAO classes, and come up with new
> > types or relations that would relate the nature of the named resource
> > to HTTP behavior (e.g. by saying that responses need to carry
> > information related in a definite way to the resource) - thus
> > explaining how/why these ontologies might apply to web resources.

Also not so hot on that idea.

> >  - we could explain how to take the above dc:creator example to a
> > higher degree of rigor - how to respect the specs and avoid making
> > statements that aren't well justified.

That sounds more useful to me, as it sounds more fundamental.

> >
> > Re other work:
> >
> > I haven't forgotten about the rest of the agenda - which I might
> > characterize as modeling aspects of AWWW and semweb architecture - but
> > unless someone convinces me that this is a prerequisite to #2, or that
> > #2 is not our best next task, I think we should continue to put it off
> > for a while.
> >
> > Jonathan
> >
> >
> >
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
-- 
David Booth, Ph.D.
Cleveland Clinic (contractor)

Opinions expressed herein are those of the author and do not necessarily
reflect those of Cleveland Clinic.
Received on Thursday, 11 June 2009 00:18:11 UTC