Summary of TimbL's comments on HTTP Protocol draft from Chimezie Ogbuji on 2010-08-16 (public-rdf-dawg@w3.org from July to September 2010)

From: Chimezie Ogbuji <ogbujic@ccf.org>
Date: Mon, 16 Aug 2010 10:48:39 -0400
To: "SPARQL Working Group WG" <public-rdf-dawg@w3.org>
Message-ID: <C88EC807.12EBD%ogbujic@ccf.org>
Below is a summary of Tim's comments regarding the HTTP Protocol along with
a few comments of my own for the purpose of group discussion:

"1) One point is that, mostly, this is a book about how to implement a
linked data when you have an existing SPARQL server.  This is clearly a good
idea.. That could be expressed in the abstract."

I think this one speaks for itself.

He had some comments on the motivation behind 4.2 (indirect graph
identification), in particular on the statement:

".. it is often the case that the naming authority associated with the URI
of an RDF graph in a Network-manipulable Graph Store is not the same as the
server managing the identified RDF content, the naming authority is not
available, or the URI is not dereferencable"

He felt that such an authority was broken.  What the text was attempting to
motivate was the various scenarios where you can't rely on using the graph
IRI directly to manipulate the RDF knowledge.  The primary one is the situation
where the graph IRI is not resolvable, but I was attempting to generalize
beyond this.

Looking at some of the relevant language from RFC 3986,

[[[
.. domain name ownership may change over time for reasons not anticipated by
the URI producer. In other cases, the data within the host component
identifies a registered name that has nothing to do with an Internet host. 
]]]   

Perhaps the text should be talking about the host, not the authority:

".. it is often the case (with HTTP graph URIs in particular) that the
server associated with the hostname is not the same as the server managing
the identified RDF knowledge (as a result of a change in domain name
ownership, for instance), the host server is not available, or the URI is
not dereferencable"

He preferred that 4.2 be conceived of as "Graph Mirroring".

"Often, one organization publishes or re-publishes another's data. In this
case the a graph with one URI is actually published at another URI."

This could be another motivating reason for indirectly identifying graphs,
but I'm not sure if (by itself) it accommodates the other reasons (such as
having a non-resolvable graph IRI)

He suggested not using "?" as a way to 'embed' graph IRIs, but instead using
a path such as:

     http://example.com/rdf-graphs/www.example.org/other/graph

I think this came up in discussions earlier (see:
http://lists.w3.org/Archives/Public/public-rdf-dawg/2009OctDec/0030.html),
but we didn't seem to have strong opinions one way or another.

In that thread, Steve H had the following example, which has the advantage
of encoding the entire graph IRI (including the scheme):

http://localhost:8080/data/http%3A%2F%2Fexample.com%2Fdata.rdf
 
TimBL mentions that a rewrite rule on the server can of course turn this
into the "?" form internally, but I wonder if that is a deployment /
implementation detail once we have an accepted form for the protocol.
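
To make the two forms concrete, here is a minimal Python sketch of Steve's
path-style encoding and of the kind of server-side rewrite Tim describes.
The "graph" query parameter name and the helper function names are my own
assumptions for illustration; only the example URIs come from the thread.

```python
from urllib.parse import quote, unquote, urlsplit

# Store root taken from Steve's example; treat it as a placeholder.
STORE_PREFIX = "/data/"
STORE_ROOT = "http://localhost:8080" + STORE_PREFIX


def path_form(graph_iri: str) -> str:
    """Embed the whole graph IRI (scheme included) as one path segment."""
    # safe="" makes quote() percent-encode ":" and "/" as well.
    return STORE_ROOT + quote(graph_iri, safe="")


def embedded_graph_iri(request_url: str) -> str:
    """Recover the graph IRI from a path-style request URL."""
    path = urlsplit(request_url).path  # urlsplit() leaves %-escapes intact
    return unquote(path[len(STORE_PREFIX):])


def rewrite_to_query_form(request_url: str) -> str:
    """Internal rewrite to a "?" form (the parameter name is assumed)."""
    parts = urlsplit(request_url)
    graph_iri = embedded_graph_iri(request_url)
    return (f"{parts.scheme}://{parts.netloc}/data"
            f"?graph={quote(graph_iri, safe='')}")
```

With this, path_form("http://example.com/data.rdf") reproduces the URI in
Steve's example, and the rewrite stays invisible to clients.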

He gives some reasons not to use "?":
- some proxies do not cache things with a "?" in as they assume that they
will be transient queries never asked again.
- some people might want to directly mirror sets of virtual or static RDF
documents which have relative URIs between them and the ? messes that up.
- when a graph is serialized and has references to nearby URIs, then the
serializer will generate (sometimes much) smaller output when relative URIs
can be used.  
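
The relative-URI points can be illustrated with ordinary RFC 3986
reference resolution. This is only a sketch: the "?graph=" form below is my
assumed spelling of the query-style embedding, and "neighbour" is a made-up
relative reference.

```python
from urllib.parse import urljoin

# Path-style URI from Tim's suggestion.
PATH_STYLE = "http://example.com/rdf-graphs/www.example.org/other/graph"

# Query-style equivalent; the "graph" parameter name is an assumption.
QUERY_STYLE = ("http://example.com/rdf-graphs"
               "?graph=http%3A%2F%2Fwww.example.org%2Fother%2Fgraph")


def resolve(base: str, relative_ref: str) -> str:
    """Resolve a relative reference against a base URI (RFC 3986, 5.2)."""
    return urljoin(base, relative_ref)


# A sibling document referenced relatively from the mirrored graph:
path_result = resolve(PATH_STYLE, "neighbour")
query_result = resolve(QUERY_STYLE, "neighbour")
```

Here path_result stays inside the mirrored tree
(http://example.com/rdf-graphs/www.example.org/other/neighbour), whereas
query_result drops the query component entirely and ends up at
http://example.com/neighbour, which is the breakage Tim is pointing at.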

Perhaps SteveH has something to say about the 1st and 3rd points (I know he
had some reverse proxy scenarios in mind with this particular interface).
As for the 2nd point, I'm not sure if this relates to our recent
conversation about resolving relative URIs for embedded IRIs and if the
solution we discussed below (which was not in the text when he reviewed it)
addresses this:

[[[
In situations where there is no Base URI in the payload and a graph IRI is
embedded, the RDF document that represents [AWWW] the networked RDF
knowledge identified by the embedded graph IRI SHOULD be considered the
retrieval context (5.1.2) [RFC3986]. Thus, the default base URI is the base
URI of that RDF document.
]]]
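
As a rough sketch of how I read that rule (an interpretation, not text from
the draft): an explicit base in the payload wins, otherwise relative
references resolve against the embedded graph IRI. The function names and
example IRIs are hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin


def effective_base(payload_base: Optional[str], embedded_graph_iri: str) -> str:
    """Pick the base URI for resolving relative references in a payload.

    Assumption: the base URI of the RDF document identified by the embedded
    graph IRI is that IRI itself.
    """
    return payload_base if payload_base else embedded_graph_iri


def resolve_in_payload(relative_ref: str,
                       payload_base: Optional[str],
                       embedded_graph_iri: str) -> str:
    """Resolve one relative reference found in the request payload."""
    return urljoin(effective_base(payload_base, embedded_graph_iri),
                   relative_ref)
```

Under this reading, a reference such as "#me" in a payload with no base,
sent for the embedded graph IRI http://www.example.org/other/graph, resolves
to http://www.example.org/other/graph#me rather than to something under the
Graph Store's own URI.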

We probably need some clarification for his 4th point, because I wasn't sure
if this was just a comment leading up to the following point or if there was
something specific he was looking for that makes it clear that, for the GET,
PUT, (and presumably DELETE?) verbs, HTTP Update is basically HTTP:

".. Where GET and PUT are concerned this is not a new protocol, and the
document should take the position as to it is explaining how for a SPARQL
service owner to support HTTP on those graphs (or rather, virtual RDF
documents)."

In the next point (5) he says that when a POST is done, this *is* a new
protocol supporting the append functionality, but again it wasn't clear to me
if there were specific changes or additions to the text he was looking for.

He did specifically ask that this protocol support the scenario where a POST
with content-type "application/sparql" is understood to be an invocation of
the SPARQL Query protocol (essentially) where the default graph of the query
is the graph IRI (embedded or otherwise).  This raises the larger question of
the interplay between the various SPARQL 1.1 protocols, in particular as it
relates to overlapping HTTP bindings.
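
To show where the overlap bites, here is a hypothetical dispatch sketch for
a single graph URI. The returned labels are invented for illustration, and
"application/sparql" is simply the content type named above, not a
registered media type as far as I know.

```python
def classify_request(method: str, content_type: str = "") -> str:
    """Map an HTTP request on a graph URI to the operation it would trigger.

    The labels are illustrative only; nothing here is normative.
    """
    method = method.upper()
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()

    if method == "GET":
        return "retrieve-graph"   # plain HTTP
    if method == "PUT":
        return "replace-graph"    # plain HTTP
    if method == "DELETE":
        return "delete-graph"     # presumably plain HTTP as well
    if method == "POST":
        if media_type == "application/sparql":
            # Tim's request: treat the body as a query whose default
            # graph is the graph identified by the request.
            return "query-with-default-graph"
        return "append-to-graph"  # the new, protocol-specific behaviour
    return "unsupported"
```

The POST branch is the only place where the media type changes the meaning
of the request, which is exactly where the bindings overlap.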

Finally, he specifically asked us to support the use of 'MS-Author-Via'
headers, which (reading from "Read-Write Linked Data") are meant to
indicate a preference for modifying RDF knowledge either via PUT (if the value
is 'DAV') or via SPARQL/Update if the value is 'SPARQL' (or whatever IMT we
assign to the SPARQL Update language).

Digging further (from
http://msdn.microsoft.com/en-us/library/cc250217(PROT.10).aspx):

[[[
This header field indicates to the issuer of an HTTP OPTIONS command what
protocol mechanism is preferred for authoring documents in this particular
namespace. The preference MUST be ordered so the first mechanism listed is
the one most preferred by the server.
]]]

So this is in some way related to the newly added suggestion to use HTTP
OPTIONS to determine the capabilities of the server, but here the response
is a specific indication of the preferred means of authoring RDF documents
for a particular server; so it is also related to the larger question of
making sense of the various bindings to HTTP 1.1 that we have.
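
For discussion, a minimal sketch of how a client might act on such a header
from an OPTIONS response. I am assuming the usual comma-separated HTTP list
syntax and case-insensitive tokens; the function names are hypothetical.

```python
from typing import List, Optional


def authoring_preferences(header_value: str) -> List[str]:
    """Split an MS-Author-Via value into tokens, most preferred first.

    Assumes a comma-separated list, which is my reading of the header.
    """
    return [tok.strip() for tok in header_value.split(",") if tok.strip()]


def preferred_update_mechanism(header_value: str) -> Optional[str]:
    """Return the first mechanism this client understands, if any."""
    for token in authoring_preferences(header_value):
        name = token.upper()
        if name == "DAV":
            return "PUT"
        if name == "SPARQL":
            return "SPARQL/Update"
    return None  # nothing recognised; the client falls back to its default
```

A value of "SPARQL, DAV" would then steer the client to SPARQL/Update, and
"DAV" alone to PUT.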

-- Chime


Received on Monday, 16 August 2010 14:49:31 UTC