Re: Sandro's review of Graph Store HTTP Protocol from Chime Ogbuji on 2011-12-06 (public-rdf-dawg@w3.org from October to December 2011)

From: Chime Ogbuji <chimezie@gmail.com>
Date: Tue, 6 Dec 2011 00:47:18 -0500
To: Sandro Hawke <sandro@w3.org>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <9EBA66D3158A45ECAB9FDCD6E6027385@gmail.com>
Thanks for the thorough review Sandro. Please see my response inline below.
On Monday, November 28, 2011 at 11:54 PM, Sandro Hawke wrote: 
> SUMMARY: lots and lots of little suggestions, plus some major confusion
> about URIs, a few issues with terminology, a problem with using RFC 2616
> for response codes, and one problem with how SD is used. 
> 
> All in all it's a very thorough document, and generally says what needs
> to be said. My suggestions are generally about ways to make it clearer
> to people (like me) who are not already expert in what it's saying. As
> such, most of my suggestions probably wont matter to the hardcore
> implementors who need it most -- they'll generally sort through it and
> figure out what it means. I'm reading it more as someone who might be
> trying to figure out what this is all about, and thinking about how to
> make it clearer for them.
> ..snip..


>  I know we went through a long WG decision process (twice) to
>  arrive at the current title, but in actually talking about this
>  document to a few people, I find the only way to have it make
>  any sense is to use the word "RESTful". So, I propose we
>  amend the title to:
> 
>  SPARQL 1.1 Graph Store (RESTful) HTTP Protocol
> ..snip..
>  I'm not in love with that; I just think RESTful is by far the
>  most important word in that title. I could even do without
>  SPARQL -- this has almost nothing to do with SPARQL. If I were
>  starting from scratch, I'd probably go with "RDF RESTful API".
> 
>  (In the REST world, people call these things "RESTful APIs", not
>  "protocols", in my experience.)
See my response to Andy, who concurred with this change. Basically, I recall there was some hesitation about using the term RESTful.

> abstract
> 
>  I think the title should explain the relationship between this
>  document and SPARQL. Maybe add a second sentence:
> 
>  This interface is essentially an alternative to the
>  SPARQL 1.1 Query and Update protocols; (nearly)
>  everything that can be done through this interface can
>  be done using that interface, but for some clients
>  and/or for some servers, this interface may be easier to
>  implement or work with.
> 
>  The "nearly" is because I don't think UPDATE gives a way to
>  generate a new graph URI as POST-to-the-dataset-URI SHOULD.
>  Trivial.
I added the following second sentence to the abstract: "This interface is an alternative to the SPARQL 1.1 Update protocol. Most of the operations defined here can be performed using that interface, but for some clients or servers, this interface may be easier to implement or work with." It doesn't seem to be an alternative to SPARQL Query.
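
To make the "alternative to SPARQL 1.1 Update" relationship concrete, here is a minimal Python sketch that maps the write operations of this protocol to roughly equivalent SPARQL Update text, in the spirit of the SPARQL comparisons shown next to the examples in sections 5.2 to 5.5. The function name and the example graph IRI are hypothetical, and the payload is assumed to already be in a triple syntax that SPARQL Update accepts.

```python
def update_equivalent(method: str, graph_iri: str, payload: str = "") -> str:
    """Return SPARQL Update text roughly equivalent to an HTTP write operation.

    Illustrative only: `payload` is assumed to be triples in a syntax that is
    also valid inside INSERT DATA (for example N-Triples style statements).
    """
    g = f"<{graph_iri}>"
    templates = {
        # PUT replaces the graph content: drop whatever is there, then insert.
        "PUT": f"DROP SILENT GRAPH {g} ;\nINSERT DATA {{ GRAPH {g} {{ {payload} }} }}",
        # POST merges the payload into the graph.
        "POST": f"INSERT DATA {{ GRAPH {g} {{ {payload} }} }}",
        # DELETE removes the graph.
        "DELETE": f"DROP GRAPH {g}",
    }
    return templates[method.upper()]


if __name__ == "__main__":
    print(update_equivalent("PUT", "http://example.org/g", "<urn:a> <urn:b> <urn:c> ."))
```

Only the write operations appear here, which matches the point that the interface is an alternative to the Update protocol rather than to Query.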

> 1 introduction
> 
>  As with the abstract, I think a little more needs to be said at
>  the start about how this relates to the rest of SPARQL,
>  including perhaps explaining why this is even considered part of
>  SPARQL.
So, I think the connection between this protocol and SPARQL is the same as that between the update protocol and SPARQL. Once we began supporting write as well as read operations, the suite of specifications became more about the RDF dataset (or graph store) than the query language (SPARQL). So, the prefix SPARQL is a bit of a misnomer, but one that applies to all of the (new) specifications that provide capabilities beyond just reading. I feel that the additional sentence in the abstract describing the relationship with the update protocol emphasizes this without exacerbating the general naming issue we have.

>  maybe s/self-descriptive/self-describing/

This has been changed. 

>  I found the paragraph beginning, "It emphasizes..." pretty hard
>  to make sense of. If it's important to have this kind of
>  argument about how this is RESTful it would probably be clearer
>  to put it into the numbered list it follows. In item 1, we can
>  talk about constraint 1 and how it's met, etc.
I have made this change. The content of the paragraph (with the exception of the sentence beginning with "This specification relies..") has been moved into the numbered list above. 

>  s/an SPARQL Update equivalent/a SPARQL Update equivalent/

Changed. 

>  After the link to the XML Results format, we should have a link
>  to the JSON results format,
> http://www.w3.org/TR/sparql11-results-json/
This has been added. 

> 2 terminology
> 
>  "Resource - A network-accessible data object or service
>  identified by an IRI, as defined in [RFC2616]." But this
>  isn't the definition RDF uses. RDF-MT says "no assumptions are
>  made here about the nature of resources; 'resource' is treated
>  here as synonymous with 'entity', i.e. as a generic term for
>  anything in the universe of discourse." Perhaps the best we
>  can do is acknowledge this difference and then say it doesn't
>  matter for this spec, or that we're using the RFC2616 def'n in
>  this spec, unless we say "RDF Resource".
This has been incorporated into the definition for Resource. I went over the document to check every place the term 'resource' is used, to ensure the RFC2616 sense of the word was the one intended, and that seems to be the case. In addition, there is no use of the term "RDF Resource". 

>  "RDF document - A serialization of an RDF Graph into a concrete
>  syntax." Maybe add "typically an RDF/XML or Turtle document."
Changed.
>  "Graph IRI" - I wish the definition used the word "dataset";
>  without it, it's not stated what the relationship is between the
>  IRI and the graph in the underlying stuff. We're left to
>  assume it the iri-graph pairing in the dataset.
My assumption all along has been that this document delegates the specifics of what a graph store is (and by extension, the relationship between a graph IRI and the graph in the graph store) to SPARQL Update. 
>  "RDF Graph content". I can't figure out how this is different
>  from "RDF Graph", or "Named Graph" (as the document uses the
>  term elsewhere, meaning the second element in a graph-naming
>  pair). So I'm confused by both the term (why not use "RDF
>  Graph"?), and the definition. Sorry. :-(
I'm not sure how to improve this definition. The first clause in that sentence ("An information resource identified by the graph IRI of a named graph..") distinguishes the 'thing' identified by the graph IRI from the named graph associated with it. 
>  "Implementations of this protocol are HTTP/1.1 servers [RFC2616]
>  MUST interpret request messages..." I think there's word
>  missing here. I can't parse the sentence. 
> Also, "Implementations of this protocol" doesn't seem quite
>  right; clients also implement this protocol, too. I think we
>  mean "Servers implementing this protocol", or "conforming
>  servers", or "SPARQL 1.1 Graph Store HTTP Protocol Servers".
>  Maybe we can introduce a term for these servers? "RESTful
>  Graph Stores" comes to mind.
I have rephrased and broken up the sentence, so it now reads: "Servers implementing this protocol are HTTP/1.1 servers [RFC2616] and MUST interpret request messages as graph management operations on an underlying Graph Store. The subject of the operation is indicated by the request IRI." 
>  (Which makes me think we're missing a conformance clause, as per
> http://www.w3.org/TR/qaframe-spec/#specifying-conformance
>  ... I'm not sure it matters.)
> 
> 3 protocol model
> 
>  s/DOS/Denial-of-Service/ (best to avoid acronyms)
Changed. 
> 4.1 graph identification
> 
>  Before we get into that, let's talk about URIs. I felt like I
>  was dumped into the middle of a conversation, missing all the
>  context. By the end of the document, I think I had it mapped
>  out. Did I get it right? 
Let's see. 
>  - There is a Service URI. This is used for:
>  - constructing indirect reference URIs, which is
>  necessary if the server doesn't serve all the Graph
>  IRIs, or if we want to access the default graph
>  - obtaining the Service Description, which we need in
>  order to find out the Dataset URI (see below). 
>  Does this have anything to do with a SPARQL service
>  endpoint address?
Unfortunately (and I do think this is unfortunate), the WG's decision to not discuss the relationship between service descriptions and the graph store protocol leaves no mechanism for discovering Graph Store service URIs. This is underspecified. It was my assumption that making that decision essentially also meant we had decided to leave this underspecified.


> The fact that I can get an SD by
>  doing a GET on it was my only clue that it probably is,
>  in fact, the same thing. Can we be quite explicit
>  about this, even if it's just to say RESTful Graph
>  Stores and SPARQL service endpoints MAY use the same
>  address? I know it gets complicated with ER, since one
>  dataset may have multiples EPs.
This was essentially one of the intentions of section 5.8 Graph Store Service Discovery (Informative). However, I have added a sentence to that section to that effect: "In addition, a service implementing this protocol MAY share the same address of a SPARQL endpoint."
>  - There is a Dataset URI. This is used for:
> 
>  - to ask the service to invent a new Graph IRI
Yes. 
>  When we have a multigraph syntax (eg TriG) standardized,
>  it seems clear to me that a GET of the Dataset URI would
>  return a complete dump of the dataset, and a PUT would
>  replace the dataset. Can we say something
>  forward-looking like this? I think so. Without this,
>  the Dataset URI seems pretty out-of-place here, used
>  only for this invent-a-new-Graph-IRI function. 
> Maybe,
>  in any case that function could be done, instead, via
>  POST to the Service URI? (That would be distinguished
>  from a QUERY or UPDATE operation by the mime type of the
>  POST.)
We had some conversation about this (http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0200.html), but I seem to recall the issue was how to make the forward reference, and the decision to not (properly) synch the service description spec and this one essentially ended that thread. Also, as alluded to in that thread, it would almost require a substantive change to extend the RESTful sense of the protocol to apply to the graph store as a whole in addition to just RDF graph content. Given these things, I don't feel comfortable spelling this out any further. This is something we should probably discuss, but (from a procedural perspective) it seems too late to consider such a substantive change. 
>  Then we get into the Direct and Indirect identification URIs.
>  I suggest we start with some explanation of the two above URIs,
>  then, before we get into 4.1, we give a little overview of these
>  two, like:
> 
>  For a client to use this protocol to access individual
>  graphs in the graph store, it needs a URL for each
>  graph. Inside the store, each graph (except the
>  default graph) is labeled with a Graph IRI. In some
>  cases ("Direct Graph Identification"), those Graph IRIs
>  can be used (possibly after IRI-to-URI conversion) as
>  the URLs for HTTP access. In other cases ("Indirect
>  Graph Identification"), the Service URIs is used to
>  construct URLs for each graph. 
> 
I have added the following paragraph right before 4.1: "A client using this protocol to manipulate a graph store needs a IRI for each graph. Within the graph store, each graph (except the default graph) is associated with a graph IRI. In some cases ("Direct Graph Identification"), the graph IRIs can be directly used as the request URI of a graph management operation. In other cases ("Indirect Graph Identification"), the Service IRI is used to route the operations towards RDF graph content."
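
As an illustration of the two identification styles, the following Python sketch builds the request URL a client might use in each case. The helper name and example IRIs are hypothetical. It assumes the indirect form embeds the percent-encoded graph IRI in a `graph` query parameter of the service IRI, and that the default graph is addressed with a `default` query parameter.

```python
from typing import Optional
from urllib.parse import quote


def request_url(service_iri: str, graph_iri: Optional[str] = None, direct: bool = False) -> str:
    """Return the URL to send a graph management request to (illustrative)."""
    if graph_iri is None:
        # The default graph has no IRI, so it can only be reached indirectly.
        return f"{service_iri}?default"
    if direct:
        # Direct Graph Identification: the graph IRI itself is the request URL.
        return graph_iri
    # Indirect Graph Identification: embed the encoded graph IRI in the query.
    return f"{service_iri}?graph={quote(graph_iri, safe='')}"


if __name__ == "__main__":
    service = "http://example.org/rdf-graph-store"
    print(request_url(service, "http://example.org/g1", direct=True))
    print(request_url(service, "http://example.org/g1"))
    print(request_url(service))
```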

>  Which reminds me -- what happens if someone uses those Indirect
>  graph URLs as Graph IRIs in the same store? :-( Maybe
>  there's nothing helpful we can say about that. 
I don't think there is much we can say to clarify such a situation as the specifics have more to do with HTTP and URL processing than this protocol.
>  I'm not sure if you realize it, but it's quite possible these
>  indirect graph URLs will see a great deal of life outside of
>  this protocol -- for provenance and other metadata. They
>  provide a way to refer to a Graph Container inside a SPARQL
>  server. (This was discussed at the last RDF F2F.) To help
>  support this usage, it's probably worthwhile to strongly push
>  for Service URIs and SPARQL endpoint addresses to be the same.
I agree and I'm hoping the additional MAY statement in 5.8 is sufficient for this purpose. 
> 4.1 direct graph identification
> 
>  I think the first sentence needs to be qualified with a
>  "sometimes". 
I'm assuming you are referring to the fact that the resource identified by a graph IRI does not have a representation (even though one could straightforwardly be composed)? I have added a can: "resource, and the resource can have a representation that serializes"
> I'd start this section with a list of the
>  situations in which one can use Direct.
> 
>  "Intuitively, the set of interpretations that satisfy [RDF-MT]
>  the RDF graph that the RDF document is a serialization of can be
>  thought of as this RDF graph content." uuuuuuummmm what?
>  Give me an example?
Note the word "intuitively" (this is mostly meant to be informative). Let us say http://metacognition.info implements this protocol and in the graph store it manages, there is a graph describing me and its IRI is http://metacognition.info/chimezie . The graph uses an RDF vocabulary with a semantics defined in an OWL file. Per RDF-MT, the interpretations that satisfy the graph can be thought of as the various configurations that assign meaning to the statements in the graph and in this case, since the graph describes me, they are the configurations whereby Chimezie is my name, Ohio is my state of residence, etc. An HTTP GET operation directed at that URI can be intuitively thought of as a request to fetch a representation of me as an RDF graph. This intuition is just suggesting that RDF model theory (which is about the meaning of a graph) provides one way of characterizing the referent of the graph IRI: as the meaning of the graph paired with the IRI. 

Also, I have been following the RDF WG discussions that have attempted to provide a semantic framework for named graphs and some of the suggestions I've seen so far have been along these lines (not directly).
> Or something? I can't make sense of
>  this. The "Graph Content" is a set of interpretations?
> 
>  The layout of the diagram seems off. There's a computer
>  labeling the arrow. I would expect one computer at each end of
>  the arrow. Plus it's got the MT stuff in it, which I don't see
>  the reason for. If you want I can try to draw the diagram as
>  I'm picturing it...
> 
>  Oh, maybe this MT stuff is because of ER...! If so, can we
>  call that out explicitly, and try to hide the complexity
>  from people who don't care about it?
Given the MT stuff is just meant to be intuitive, I will remove it from the diagram.

>  "Any server that implements this protocol and receives a request
>  URI in this form SHOULD invoke the indicated operation..."
>  Instead of "invoke" can we say "perform"? I first thought
>  "invoke" meant it should pass it on to the server for that graph
>  (there might be one).
Changed 
>  "The embedded URI MUST be an absolute URI and the server MUST
>  respond with a 400 Bad Request if it is not." I think that's
>  too strict. I think there are some bits of the URI grammar that
>  folks sometime violate in their SPARQL graph IRIs. I know when
>  I've written RDF parsers that checked the syntax of the IRIs, I
>  had to turn off that checking when I hit other people's data.
>  Maybe things are better now.
So, the restriction here is mostly that the URIs cannot be relative since we decided not to support a Base URI resolution mechanism. 
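
A minimal Python sketch of that restriction follows, assuming the embedded IRI arrives in a `graph` query parameter and treating "absolute" as simply "has a scheme". The function name is hypothetical, and a real server would likely apply more checks than this.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def rejection_status(request_url: str) -> Optional[int]:
    """Return 400 if the embedded graph IRI is relative, else None (illustrative)."""
    query = parse_qs(urlsplit(request_url).query)
    if "graph" not in query:
        # No embedded IRI (for example a direct request), so nothing to check.
        return None
    embedded = query["graph"][0]  # parse_qs has already percent-decoded the value
    # Without a base URI resolution mechanism, a relative reference cannot be
    # resolved, so the request is answered with 400 Bad Request.
    return None if urlsplit(embedded).scheme else 400


if __name__ == "__main__":
    base = "http://example.org/rdf-graph-store"
    print(rejection_status(base + "?graph=http%3A%2F%2Fexample.org%2Fg1"))
    print(rejection_status(base + "?graph=other%2Fgraph"))
```

This only tests for relativity; it does not police the finer points of the URI grammar.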
>  (We should have some test cases about IRI/URI conversion for
>  this embedding.)
Yes, probably should.
>  "As will be discussed later in this document, both HTTP OPTIONS
>  and GET requests can be sent to the service and the response to
>  such a request is a service description document." But later
>  it's only a SHOULD. Do we mean that the Service MAY provide RDF
>  content, but if it does, that content MUST be an SD?
Are you referring to 5.8? It uses RECOMMENDED for both provisions.
> 5 graph management
> 
>  I'm a little hesitant about privileging RDF/XML like this. The
>  sense I get from the RDF WG is that in a year, 90% of the pure
>  RDF content on the Web (ie excluding RDFa and microdata), will
>  be Turtle, not RDF/XML. But, I don't really have a better
>  idea.
Andy also commented on this and I have switched to using Turtle almost exclusively throughout the document. 
> 5.1 status codes
> 
>  "then the server should respond with a 400 Bad Request." Is
>  that supposed to be all-caps SHOULD?
Changed 
>  "should receive a response with a 405 Method Not Allowed".
>  Again, is that meant to be all-caps?
Changed. 
>  Most of the status codes are SHOULD, but two are MUST: 201
>  Created, and 404 Not Found -- but only on a DELETE. I'm
>  guessing these are editing errors, and should be SHOULD. 
The former was inherited/lifted from the HTTP 1.1 (RFC2616) text regarding PUT. The latter has been changed to SHOULD as it did not come from that specification.

> If
>  not, an explanation seems warranted. Personally, I'd lean
>  toward all these response codes being MUSTS. 
I basically tried to follow the usage from RFC2616.
> I wonder about
>  pulling them out of the text into a separate decision table. I
>  guess there are a lot of meaningful response codes never
>  mentioned in this text, though.... I notice that RFC2616
>  mostly uses RFC2119 language in talking about what the client is
>  to do about these codes; it doesn't say the server
>  MUST/SHOULD/MAY send a 404, for instance.
But it does, actually (in general), talk about response codes the server should send back. DELETE, for example: "A successful response SHOULD be 200 (OK) if the response includes an entity describing the status, 202 (Accepted)"

> We could do the
>  same and just talk about the meaning of response codes.
> 5.2 http put
> 
>  "A request that uses the HTTP PUT method SHOULD store the
>  enclosed RDF payload as RDF graph content." How about: "A
>  request that uses the HTTP PUT method indicates the enclosed RDF
>  payload is to be stored as RDF graph content."
But the language here is more about server behavior than descriptive text about the meaning of the operations. 
>  The example here, and in 5.4 and 5.5 (but not 5.3) are a little
>  confusing in formatting. The required blank line after the
>  HTTP headers is missing
I have added the required blank line after all HTTP headers (where there is a body). 
> , but instead, after a blank line, we
>  have the SPARQL text for comparison. It's clear in 5.3 because
>  there is separating text. Borders around the examples would
>  solve the problem as well.
> 
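
To show why the blank line after the HTTP headers matters in those examples, here is a small Python sketch that assembles a raw PUT request by hand. The host, path, and Turtle payload are made-up values, and the function name is hypothetical. The empty line (CRLF CRLF) is what lets a reader or parser tell where the headers end and the RDF payload begins.

```python
def build_put_request(host: str, path: str, turtle: str) -> bytes:
    """Assemble a raw HTTP/1.1 PUT request carrying a Turtle payload (illustrative)."""
    body = turtle.encode("utf-8")
    head = "\r\n".join([
        f"PUT {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: text/turtle; charset=utf-8",
        f"Content-Length: {len(body)}",
    ])
    # The empty line between the header block and the body is required.
    return head.encode("ascii") + b"\r\n\r\n" + body


if __name__ == "__main__":
    raw = build_put_request(
        "example.org",
        "/rdf-graph-store?graph=http%3A%2F%2Fexample.org%2Fg1",
        "<http://example.org/a> <http://example.org/b> <http://example.org/c> .\n",
    )
    print(raw.decode("utf-8"))
```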
>  "Either the request or the encoded URI (embedded in the query
>  component) identifies the RDF payload enclosed with the request
>  as RDF graph content." I don't think it does. Why would one
>  be trying to identify the payload...?
> Are we trying to say
>  something like, "To complete this operation, the Service MUST
>  store the given payload as the new content of the RDF graph
>  container labeled with the given Graph IRI." ? That's more in
>  the style of the current RDF WG discussion, and less
>  in the style of the rest of this document; I'm not sure how to
>  write it in the existing style. 
This is from RFC 2616 (9.6 PUT): "In contrast, the URI in a PUT request identifies the entity enclosed with the request" 
>  "The server MUST NOT attempt to apply the request to some other
>  resource." Do we really need to say this? 
This is also from the same section. I guess in some cases restating the RFC 2616 text is not necessary, but it is hard to determine where it is informative to do so and where it is not.
> It kind of opens
>  the door, via "the exception proves the rule", to all sort of
>  other crazy behavior we didn't explicitly rule out. Maybe we
>  can give some example as to why this rule isn't obvious?
> 
>  "Developers should refer to [SPARQL-UPDATE] for the specifics of
>  how to handle empty graphs. In particular, if the request body
>  is empty and there is sufficient authorization to create a new
>  named graph with an IRI of that indicated by the request URI,
>  then an empty graph would need to be created." That's not how
>  I read UPDATE. I read UPDATE to be saying some
>  "implementations" keep track of empty graphs and some dont. For
>  those who don't, a PUT with a new graph IRI and an empty request
>  body has no effect.
Ok. I have added the clause "For implementations that support empty graphs," to replace 'In particular'
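
The distinction can be sketched with a toy in-memory store in Python. Everything here is hypothetical (the class, the `keeps_empty_graphs` flag, the return value), and the behavior for stores that do not record empty graphs is one plausible reading of SPARQL Update rather than something this document mandates.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


class ToyGraphStore:
    """Toy model of PUT semantics with and without empty-graph support."""

    def __init__(self, keeps_empty_graphs: bool) -> None:
        self.keeps_empty_graphs = keeps_empty_graphs
        self.graphs: Dict[str, Set[Triple]] = {}

    def put(self, graph_iri: str, triples: Iterable[Triple]) -> bool:
        """Replace the graph content; return True if a new named graph now exists."""
        existed = graph_iri in self.graphs
        content = set(triples)
        if not content and not self.keeps_empty_graphs:
            # Assumption: a store that does not record empty graphs ends up
            # with no graph under this IRI after an empty PUT.
            self.graphs.pop(graph_iri, None)
            return False
        self.graphs[graph_iri] = content
        return not existed


if __name__ == "__main__":
    for flag in (True, False):
        store = ToyGraphStore(keeps_empty_graphs=flag)
        created = store.put("http://example.org/g1", [])
        print(flag, created, sorted(store.graphs))
```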
> 5.3 http delete
> 
>  I don't really like the human intervention bit, the fact that
>  success can be reported even if it's not done yet, and the
>  "inaccessible location" notion, but I see they are just copied
>  out of RFC 2616, ... so I'm not sure what to say.
Yes. This is the same general issue: on the one hand, copying text from RFC 2616 gives some continuity to the content in this specification, but on the other, it means inheriting oddities from that specification (even if it is a normative reference).
> 5.4 http post
> 
>  "Within a service description document for an implementation of
>  this protocol, the URI of an instance of the sd:Dataset class is
>  understood to be the identifier of the Graph Store." I'm not
>  sure I'd call it the "identifier of the graph store"; see
>  earlier text about Dataset IRI.
I'm not sure if you are saying it would be more appropriate to call it the identifier of the dataset or something else.
>  IMPORTANT POINT: it shouldn't
>  be the type sd:Dataset that matters; it should be the
>  sd:defaultDatasetDescription arc. I expect the type is
>  optional, but more importantly, we want to be able to merge SDs.
>  Where one gets the SD from matters for trust, but its meaning
>  isn't supposed to depend on which endpoint one gets it from.
I have changed this to "the object of an sd:defaultDataset statement" (this predicate has changed in the SD document recently) 
> other
> 
>  Shouldn't we say somewhere that all this applies to HTTPS as
>  well? It's obvious, of course.
I think of that as an obvious extension and independent of what the document is specifying.
> 
> I guess that's about it... Now, to try to get some REST. 
:) 



Chime Ogbuji
Received on Tuesday, 6 December 2011 05:47:51 UTC