- From: Sandro Hawke <sandro@w3.org>
- Date: Mon, 28 Nov 2011 23:54:17 -0500
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
SUMMARY: lots and lots of little suggestions, plus some major confusion about URIs, a few issues with terminology, a problem with using RFC 2616 for response codes, and one problem with how SD is used.

All in all it's a very thorough document, and generally says what needs to be said. My suggestions are generally about ways to make it clearer to people (like me) who are not already expert in what it's saying. As such, most of my suggestions probably won't matter to the hardcore implementors who need it most -- they'll generally sort through it and figure out what it means. I'm reading it more as someone who might be trying to figure out what this is all about, and thinking about how to make it clearer for them.

I reviewed:

    http://www.w3.org/2009/sparql/docs/http-rdf-update/Overview.html
    Revision 1.79 2011/11/16 01:46:26 cogbuji

title

I know we went through a long WG decision process (twice) to arrive at the current title, but in actually talking about this document to a few people, I find the only way to have it make any sense is to use the word "RESTful". So, I propose we amend the title to:

    SPARQL 1.1 Graph Store (RESTful) HTTP Protocol

I'm not in love with that; I just think RESTful is by far the most important word in that title. I could even do without SPARQL -- this has almost nothing to do with SPARQL. If I were starting from scratch, I'd probably go with "RDF RESTful API". (In the REST world, people call these things "RESTful APIs", not "protocols", in my experience.)

abstract

I think the abstract should explain the relationship between this document and SPARQL. Maybe add a second sentence:

    This interface is essentially an alternative to the SPARQL 1.1 Query and Update protocols; (nearly) everything that can be done through this interface can be done using that interface, but for some clients and/or for some servers, this interface may be easier to implement or work with.
The "nearly" is because I don't think UPDATE gives a way to generate a new graph URI as POST-to-the-dataset-URI SHOULD. Trivial.

1 introduction

As with the abstract, I think a little more needs to be said at the start about how this relates to the rest of SPARQL, including perhaps explaining why this is even considered part of SPARQL.

maybe s/self-descriptive/self-describing/

I found the paragraph beginning, "It emphasizes..." pretty hard to make sense of. If it's important to have this kind of argument about how this is RESTful, it would probably be clearer to put it into the numbered list it follows. In item 1, we can talk about constraint 1 and how it's met, etc.

s/an SPARQL Update equivalent/a SPARQL Update equivalent/

After the link to the XML Results format, we should have a link to the JSON results format, http://www.w3.org/TR/sparql11-results-json/

2 terminology

"Resource - A network-accessible data object or service identified by an IRI, as defined in [RFC2616]."

But this isn't the definition RDF uses. RDF-MT says "no assumptions are made here about the nature of resources; 'resource' is treated here as synonymous with 'entity', i.e. as a generic term for anything in the universe of discourse." Perhaps the best we can do is acknowledge this difference and then say it doesn't matter for this spec, or that we're using the RFC2616 def'n in this spec, unless we say "RDF Resource".

"RDF document - A serialization of an RDF Graph into a concrete syntax."

Maybe add "typically an RDF/XML or Turtle document."

"Graph IRI" - I wish the definition used the word "dataset"; without it, it's not stated what the relationship is between the IRI and the graph in the underlying stuff. We're left to assume it's the IRI-graph pairing in the dataset.

"RDF Graph content". I can't figure out how this is different from "RDF Graph", or "Named Graph" (as the document uses the term elsewhere, meaning the second element in a graph-naming pair).
So I'm confused by both the term (why not use "RDF Graph"?), and the definition. Sorry. :-(

"Implementations of this protocol are HTTP/1.1 servers [RFC2616] MUST interpret request messages..."

I think there's a word missing here. I can't parse the sentence. Also, "Implementations of this protocol" doesn't seem quite right; clients implement this protocol, too. I think we mean "Servers implementing this protocol", or "conforming servers", or "SPARQL 1.1 Graph Store HTTP Protocol Servers". Maybe we can introduce a term for these servers? "RESTful Graph Stores" comes to mind.

(Which makes me think we're missing a conformance clause, as per http://www.w3.org/TR/qaframe-spec/#specifying-conformance ... I'm not sure it matters.)

3 protocol model

s/DOS/Denial-of-Service/ (best to avoid acronyms)

4 graph identification

Before we get into that, let's talk about URIs. I felt like I was dumped into the middle of a conversation, missing all the context. By the end of the document, I think I had it mapped out. Did I get it right?

- There is a Service URI. This is used for:
    - constructing indirect reference URIs, which is necessary if the server doesn't serve all the Graph IRIs, or if we want to access the default graph
    - obtaining the Service Description, which we need in order to find out the Dataset URI (see below).

  Does this have anything to do with a SPARQL service endpoint address? The fact that I can get an SD by doing a GET on it was my only clue that it probably is, in fact, the same thing. Can we be quite explicit about this, even if it's just to say RESTful Graph Stores and SPARQL service endpoints MAY use the same address? I know it gets complicated with ER, since one dataset may have multiple EPs.

- There is a Dataset URI.
This is used for:
    - asking the service to invent a new Graph IRI

  When we have a multigraph syntax (eg TriG) standardized, it seems clear to me that a GET of the Dataset URI would return a complete dump of the dataset, and a PUT would replace the dataset. Can we say something forward-looking like this? I think so. Without this, the Dataset URI seems pretty out-of-place here, used only for this invent-a-new-Graph-IRI function. Maybe, in any case, that function could be done, instead, via POST to the Service URI? (That would be distinguished from a QUERY or UPDATE operation by the mime type of the POST.)

Then we get into the Direct and Indirect identification URIs. I suggest we start with some explanation of the two above URIs, then, before we get into 4.1, we give a little overview of these two, like:

    For a client to use this protocol to access individual graphs in the graph store, it needs a URL for each graph. Inside the store, each graph (except the default graph) is labeled with a Graph IRI. In some cases ("Direct Graph Identification"), those Graph IRIs can be used (possibly after IRI-to-URI conversion) as the URLs for HTTP access. In other cases ("Indirect Graph Identification"), the Service URI is used to construct URLs for each graph.

Which reminds me -- what happens if someone uses those Indirect graph URLs as Graph IRIs in the same store? :-( Maybe there's nothing helpful we can say about that.

I'm not sure if you realize it, but it's quite possible these indirect graph URLs will see a great deal of life outside of this protocol -- for provenance and other metadata. They provide a way to refer to a Graph Container inside a SPARQL server. (This was discussed at the last RDF F2F.) To help support this usage, it's probably worthwhile to strongly push for Service URIs and SPARQL endpoint addresses to be the same.

4.1 direct graph identification

I think the first sentence needs to be qualified with a "sometimes".
I'd start this section with a list of the situations in which one can use Direct.

"Intuitively, the set of interpretations that satisfy [RDF-MT] the RDF graph that the RDF document is a serialization of can be thought of as this RDF graph content."

uuuuuuummmm what? Give me an example? Or something? I can't make sense of this. The "Graph Content" is a set of interpretations?

The layout of the diagram seems off. There's a computer labeling the arrow. I would expect one computer at each end of the arrow. Plus it's got the MT stuff in it, which I don't see the reason for. If you want I can try to draw the diagram as I'm picturing it...

Oh, maybe this MT stuff is because of ER...! If so, can we call that out explicitly, and try to hide the complexity from people who don't care about it?

"Any server that implements this protocol and receives a request URI in this form SHOULD invoke the indicated operation..."

Instead of "invoke" can we say "perform"? I first thought "invoke" meant it should pass it on to the server for that graph (there might be one).

"The embedded URI MUST be an absolute URI and the server MUST respond with a 400 Bad Request if it is not."

I think that's too strict. I think there are some bits of the URI grammar that folks sometimes violate in their SPARQL graph IRIs. I know when I've written RDF parsers that checked the syntax of the IRIs, I had to turn off that checking when I hit other people's data. Maybe things are better now. (We should have some test cases about IRI/URI conversion for this embedding.)

"As will be discussed later in this document, both HTTP OPTIONS and GET requests can be sent to the service and the response to such a request is a service description document."

But later it's only a SHOULD. Do we mean that the Service MAY provide RDF content, but if it does, that content MUST be an SD?

5 graph management

I'm a little hesitant about privileging RDF/XML like this.
The sense I get from the RDF WG is that in a year, 90% of the pure RDF content on the Web (ie excluding RDFa and microdata) will be Turtle, not RDF/XML. But I don't really have a better idea.

5.1 status codes

"then the server should respond with a 400 Bad Request."

Is that supposed to be all-caps SHOULD?

"should receive a response with a 405 Method Not Allowed".

Again, is that meant to be all-caps?

Most of the status codes are SHOULD, but two are MUST: 201 Created, and 404 Not Found -- but only on a DELETE. I'm guessing these are editing errors, and should be SHOULD. If not, an explanation seems warranted.

Personally, I'd lean toward all these response codes being MUSTs. I wonder about pulling them out of the text into a separate decision table. I guess there are a lot of meaningful response codes never mentioned in this text, though....

I notice that RFC2616 mostly uses RFC2119 language in talking about what the client is to do about these codes; it doesn't say the server MUST/SHOULD/MAY send a 404, for instance. We could do the same and just talk about the meaning of response codes.

5.2 http put

"A request that uses the HTTP PUT method SHOULD store the enclosed RDF payload as RDF graph content."

How about: "A request that uses the HTTP PUT method indicates the enclosed RDF payload is to be stored as RDF graph content."

The examples here, and in 5.4 and 5.5 (but not 5.3), are a little confusing in formatting. The required blank line after the HTTP headers is missing, but instead, after a blank line, we have the SPARQL text for comparison. It's clear in 5.3 because there is separating text. Borders around the examples would solve the problem as well.

"Either the request or the encoded URI (embedded in the query component) identifies the RDF payload enclosed with the request as RDF graph content."

I don't think it does. Why would one be trying to identify the payload...?
Are we trying to say something like, "To complete this operation, the Service MUST store the given payload as the new content of the RDF graph container labeled with the given Graph IRI." ? That's more in the style of the current RDF WG discussion, and less in the style of the rest of this document; I'm not sure how to write it in the existing style.

"The server MUST NOT attempt to apply the request to some other resource."

Do we really need to say this? It kind of opens the door, via "the exception proves the rule", to all sorts of other crazy behavior we didn't explicitly rule out. Maybe we can give some example as to why this rule isn't obvious?

"Developers should refer to [SPARQL-UPDATE] for the specifics of how to handle empty graphs. In particular, if the request body is empty and there is sufficient authorization to create a new named graph with an IRI of that indicated by the request URI, then an empty graph would need to be created."

That's not how I read UPDATE. I read UPDATE to be saying some "implementations" keep track of empty graphs and some don't. For those that don't, a PUT with a new graph IRI and an empty request body has no effect.

5.3 http delete

I don't really like the human intervention bit, the fact that success can be reported even if it's not done yet, and the "inaccessible location" notion, but I see they are just copied out of RFC 2616, ... so I'm not sure what to say.

5.4 http post

"Within a service description document for an implementation of this protocol, the URI of an instance of the sd:Dataset class is understood to be the identifier of the Graph Store."

I'm not sure I'd call it the "identifier of the graph store"; see earlier text about Dataset IRI.

IMPORTANT POINT: it shouldn't be the type sd:Dataset that matters; it should be the sd:defaultDatasetDescription arc. I expect the type is optional, but more importantly, we want to be able to merge SDs.
Where one gets the SD from matters for trust, but its meaning isn't supposed to depend on which endpoint one gets it from.

other

Shouldn't we say somewhere that all this applies to HTTPS as well? It's obvious, of course.

I guess that's about it... Now, to try to get some REST.

    -- Sandro

cf. ACTION-563
Received on Tuesday, 29 November 2011 04:54:28 UTC