Re: Review of SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs review, rev1.56 from Steve Harris on 2010-10-07 (public-rdf-dawg@w3.org from October to December 2010)

From: Steve Harris <steve.harris@garlik.com>
Date: Thu, 7 Oct 2010 22:08:57 +0100
To: Chimezie Ogbuji <ogbujic@ccf.org>
Cc: "public-rdf-dawg@w3.org Group" <public-rdf-dawg@w3.org>
Message-Id: <D006CAF9-2EAB-4F62-90D9-F2356BA0B958@garlik.com>
On 2010-10-07, at 20:37, Chimezie Ogbuji wrote:

> Hello, Steve.  Thanks for the review.  My response is inline below
> 
> On 10/5/10 4:27 PM, "Steve Harris" <steve.harris@garlik.com> wrote:
>> Major
>> 
>> §4.3
>> 
>> I'm concerned that "Implementations of this protocol MUST obey the rules
>> specified there regarding the resolution of relative URI references" rules out
>> reverse proxies as implementations of this protocol. In our experience reverse
>> proxies are commonly used in front of SPARQL endpoints to provide load
>> balancing, additional security, and/or hardening. From the client's p.o.v. the
>> proxy is the SPARQL endpoint.
> 
> I'm not sure I understand.  Can you give an example of how the URI
> resolution rules in [RFC3986] break proxy behavior and some idea of how
> other protocols that build on URIs in the same way avoid this problem? Do
> you have some suggested text to address this or are you saying the normative
> dependence on [RFC3986] is the issue here?

Other protocols don't avoid this problem, but they don't mandate this behaviour.

My proposal would be to make it an error to give a relative URI as an argument to &graph=.
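A minimal sketch of how a server could enforce that proposal, in Python. The function name `check_graph_param` and the use of 400 Bad Request are illustrative assumptions, not text from the draft. "Absolute" is approximated as "carries an RFC 3986 scheme".

```python
from urllib.parse import parse_qs, urlsplit


def check_graph_param(request_uri: str) -> tuple[int, str]:
    """Hypothetical check: reject a relative URI passed as ?graph=.

    Returns (status, detail). A status of 200 means the graph URI is
    absolute and is returned unchanged in `detail`.
    """
    # parse_qs also percent-decodes the parameter value
    values = parse_qs(urlsplit(request_uri).query).get("graph", [])
    if len(values) != 1:
        return 400, "exactly one graph parameter is required"
    graph = values[0]
    # An RFC 3986 absolute-URI starts with a scheme followed by ':'
    if not urlsplit(graph).scheme:
        return 400, f"relative graph URI not allowed: {graph!r}"
    return 200, graph


print(check_graph_param("https://server1:4444/data/?graph=1"))
print(check_graph_param(
    "https://server1:4444/data/?graph=https%3A%2F%2Fserver1%3A4444%2Fdata%2F1"))
```

With this rule the client, not the server, decides the full graph URI, so nothing depends on what the server believes its own request URI to be.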

>> Also HTTPS is an issue here: it's not generally possible for the server to
>> tell from HTTP headers whether the HTTP or HTTPS protocol was used by the
>> client, unless it has sight of the outer layers of the networking code. It may
>> be possible to guess from the port number, but I don't think that's a good
>> basis for picking URIs.
> 
> I'm not sure I understand the issue here as well.  Are you saying that using
> graph URIs of the form https://example.com is problematic because the server
> cannot verify if the underlying protocol being used is indeed HTTPS? If so,
> I'm not sure what this has to do with this specification in particular
> (rather than being a general issue with using URIs).

Suppose the client sends a request like:

PUT https://server1:4444/data/?graph=1

and server1:4444/data/ is a reverse proxy to server2:8080/store/. This is only marginally hypothetical: we do more or less exactly that to provide HTTPS access to our endpoints.

The client will be expecting the created graph to be https://server1:4444/data/1, but the server will end up creating it as http://server2:8080/store/1.
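To make the mismatch concrete, here is a small Python sketch that resolves the relative `graph=1` against the request URI as each hop sees it. It assumes, hypothetically, that the proxy forwards the request as http://server2:8080/store/?graph=1, and that each hop uses its own request URI (minus the query) as the RFC 3986 base. The helper name `graph_uri_seen_by` is illustrative.

```python
from urllib.parse import parse_qs, urljoin, urlsplit


def graph_uri_seen_by(request_uri: str) -> str:
    """Resolve a relative ?graph= value against the request URI one hop sees.

    Hypothetical helper: the base is the request URI with query and
    fragment removed; the reference is resolved per RFC 3986 via urljoin.
    """
    parts = urlsplit(request_uri)
    graph = parse_qs(parts.query)["graph"][0]
    base = parts._replace(query="", fragment="").geturl()
    return urljoin(base, graph)


# What the client addressed, and what the backend behind the proxy receives
client_view = graph_uri_seen_by("https://server1:4444/data/?graph=1")
backend_view = graph_uri_seen_by("http://server2:8080/store/?graph=1")
print(client_view)   # https://server1:4444/data/1
print(backend_view)  # http://server2:8080/store/1
```

Nothing in the request the backend receives records that the client used HTTPS on port 4444, so the backend cannot recover the client's URI without out-of-band configuration.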

Again, I would prefer to make it an error to pass relative URIs to graph=, unless there's some strong use case for it.

>> I find the logic in the last example in 4.3
>> (http://example2.com/rdf-graphs/employee/1) a bit convoluted. It seems like
>> it's necessary to parse the RDF document before you can determine which graph
>> is to be updated. If I sent that request, and got that result I would
>> certainly be surprised.
> 
> Parsing the document is necessary if the indirectly specified graph URI is
> not absolute because that is the highest precedence (and must be ruled out).
> See the earlier thread on this:
> 
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2010JanMar/0542.html
> 
> It is the same as in any other specification that needs to resolve relative
> references involved in an HTTP message with a payload that can embed the
> base URI and where the specification has a normative dependence on RFC 3986
> for defining the appropriate behavior.
> 
> For example, if you were to fetch and parse the following document from
> http://example.com/rdf-graphs/service/:
> 
>    <?xml version='1.0' encoding='UTF-8'?>
>      <rdf:RDF
>        xml:base='http://example2.com/rdf-graphs/employees/'
>        xmlns:rdf='...'>
>        <rdf:Description rdf:about="1">
>            <rdfs:label>An RDF document</rdfs:label>
>        </rdf:Description>
>    </rdf:RDF>   
> 
> The resulting RDF triple should be:
> 
> <http://example2.com/rdf-graphs/employees/1> rdfs:label "An RDF document"
> 
> This holds even though the document was fetched from example.com.

Sure, but I don't see how those things are related.

One is about URIs of resources within the document, and one is the URI of the document. It strikes me as strange that the contents of the document would change its externally visible URI.
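A small Python sketch of the distinction, using the document from the quoted example. It resolves rdf:about against xml:base, falling back to the retrieval URI, which roughly follows the RFC 3986 section 5.1 precedence for references inside the content. The graph's own URI is simply the retrieval URI and is untouched by xml:base. Assumptions: the standard RDF and RDFS namespace URIs stand in where the example elides them, the XML declaration is dropped, and nested xml:base attributes are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

DOC = """<rdf:RDF xml:base='http://example2.com/rdf-graphs/employees/'
         xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'>
  <rdf:Description rdf:about="1">
    <rdfs:label>An RDF document</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def subject_uris(doc: str, retrieval_uri: str) -> list[str]:
    """Resolve each rdf:about against the document's base URI.

    Base: xml:base on the root element if present, otherwise the URI the
    document was retrieved from. Nested xml:base is ignored in this sketch.
    """
    root = ET.fromstring(doc)
    # urljoin(base, "") returns base unchanged when there is no xml:base
    base = urljoin(retrieval_uri, root.get(XML_BASE, ""))
    return [
        urljoin(base, d.get(RDF + "about"))
        for d in root.iter(RDF + "Description")
        if d.get(RDF + "about") is not None
    ]


retrieval_uri = "http://example.com/rdf-graphs/service/"
print(subject_uris(DOC, retrieval_uri))  # ['http://example2.com/rdf-graphs/employees/1']
graph_uri = retrieval_uri  # the document's own URI: xml:base plays no part
print(graph_uri)
```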

>> Minor
>> 
>> Abstract
>> 
>> "SPARQL update language", should it be SPARQL Update...?
>> Why is “statements” italicised?
>> http://www.w3.org/TR/ is not linked
> 
> Changed.
> 
>> §1
>> 
>> "It emphasizes a clear separation between a RDF graph management action from
>> the networked body of RDF knowledge identified as the target of the action,
>> the lexical form of a Request URI, the URI of a graph in an
>> Network-manipulable Graph Store, and the (optional) RDF delivered with the
>> message" — I can't parse this sentence.
> 
> Changed to:
> 
> [[[
> It emphasizes the distinction between an RDF graph management action, the
> networked body of RDF knowledge identified as the target of the action, the
> lexical form of a Request URI, the URI of a graph in a Network-manipulable
> Graph Store, and the (optional) RDF delivered with the message
> ]]]

OK, great. I'm not sure what "networked body of RDF knowledge" means, but the sentence is much clearer now.

>> §2
> 
>> I think REST should be in []s; there's an informative reference at the end of
>> the doc.
>> "Network-manipulable Graph Store - The subset of a Graph Store comprised of
>> named RDF graphs that can be directly managed by interactions through this
>> protocol" — does this imply that you can't manage the unnamed graph?
> 
> Yes, as it is currently written it doesn't support this.  We now have an
> issue for this, and it will need to be addressed in the next round of edits.

Great.

>> "RDF knowledge" seems to overlap heavily with "RDF Graph", and "RDF graphs"
>> used earlier, and "RDF payload" defined later.
> 
> Their definitions distinguish them from each other.  RDF knowledge is an
> information resource (the others are not).  An RDF graph is a data structure
> (defined elsewhere).  RDF payload is the representation carried in an HTTP
> message within the protocol (the others are not), etc.

OK.

>> §4 
>> in 4.1 it might be worth a quick note about what to do in the presence of
>> reverse proxies, or even just noting that they're a problem in this case.
> 
> This is related to an earlier question, but do you have an example of a
> problematic situation or a summary of the general problem that can be used
> as a basis for such a note?

I give an example in this email.

>> I didn't feel that Figure 1 made the situation clearer. I think if the network
>> operations were separated out from the conceptual relationships it might help.
>> Is Figure 2 missing some arrows? There doesn't seem to be a connection between
>> the encoded URI, the graph store, and the operation.
> 
> Figure 1 was updated to reflect the change to the term "RDF knowledge"
> instead.  I have added an 'identifies' relationship between the 'parent' URI
> and the target of the operation in Figure 2.


I don't have time to go over the diagram again, but it was only a minor point anyway.

The other changes below all look great.

Cheers,
   Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Thursday, 7 October 2010 21:09:31 UTC