- From: Sandro Hawke <sandro@w3.org>
- Date: Mon, 28 Nov 2011 23:54:17 -0500
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
SUMMARY: lots and lots of little suggestions, plus some major confusion
about URIs, a few issues with terminology, a problem with using RFC 2616
for response codes, and one problem with how SD is used.
All in all it's a very thorough document, and generally says what needs
to be said. My suggestions are generally about ways to make it clearer
to people (like me) who are not already expert in what it's saying. As
such, most of my suggestions probably won't matter to the hardcore
implementors who need it most -- they'll generally sort through it and
figure out what it means. I'm reading it more as someone who might be
trying to figure out what this is all about, and thinking about how to
make it clearer for them.
I reviewed:
http://www.w3.org/2009/sparql/docs/http-rdf-update/Overview.html
Revision 1.79 2011/11/16 01:46:26 cogbuji
title
I know we went through a long WG decision process (twice) to
arrive at the current title, but in actually talking about this
document to a few people, I find the only way to have it make
any sense is to use the word "RESTful". So, I propose we
amend the title to:
SPARQL 1.1 Graph Store (RESTful) HTTP Protocol
I'm not in love with that; I just think RESTful is by far the
most important word in that title. I could even do without
SPARQL -- this has almost nothing to do with SPARQL. If I were
starting from scratch, I'd probably go with "RDF RESTful API".
(In the REST world, people call these things "RESTful APIs", not
"protocols", in my experience.)
abstract
I think the abstract should explain the relationship between this
document and SPARQL. Maybe add a second sentence:
This interface is essentially an alternative to the
SPARQL 1.1 Query and Update protocols; (nearly)
everything that can be done through this interface can
be done using that interface, but for some clients
and/or for some servers, this interface may be easier to
implement or work with.
The "nearly" is because I don't think UPDATE gives a way to
generate a new graph URI as POST-to-the-dataset-URI SHOULD.
Trivial.
1 introduction
As with the abstract, I think a little more needs to be said at
the start about how this relates to the rest of SPARQL,
including perhaps explaining why this is even considered part of
SPARQL.
maybe s/self-descriptive/self-describing/
I found the paragraph beginning, "It emphasizes..." pretty hard
to make sense of. If it's important to have this kind of
argument about how this is RESTful it would probably be clearer
to put it into the numbered list it follows. In item 1, we can
talk about constraint 1 and how it's met, etc.
s/an SPARQL Update equivalent/a SPARQL Update equivalent/
After the link to the XML Results format, we should have a link
to the JSON results format,
http://www.w3.org/TR/sparql11-results-json/
2 terminology
"Resource - A network-accessible data object or service
identified by an IRI, as defined in [RFC2616]." But this
isn't the definition RDF uses. RDF-MT says "no assumptions are
made here about the nature of resources; 'resource' is treated
here as synonymous with 'entity', i.e. as a generic term for
anything in the universe of discourse." Perhaps the best we
can do is acknowledge this difference and then say it doesn't
matter for this spec, or that we're using the RFC2616 def'n in
this spec, unless we say "RDF Resource".
"RDF document - A serialization of an RDF Graph into a concrete
syntax." Maybe add "typically an RDF/XML or Turtle document."
"Graph IRI" - I wish the definition used the word "dataset";
without it, it's not stated what the relationship is between the
IRI and the graph in the underlying stuff. We're left to
assume it is the IRI-graph pairing in the dataset.
"RDF Graph content". I can't figure out how this is different
from "RDF Graph", or "Named Graph" (as the document uses the
term elsewhere, meaning the second element in a graph-naming
pair). So I'm confused by both the term (why not use "RDF
Graph"?), and the definition. Sorry. :-(
"Implementations of this protocol are HTTP/1.1 servers [RFC2616]
MUST interpret request messages..." I think there's a word
missing here. I can't parse the sentence.
Also, "Implementations of this protocol" doesn't seem quite
right; clients implement this protocol, too. I think we
mean "Servers implementing this protocol", or "conforming
servers", or "SPARQL 1.1 Graph Store HTTP Protocol Servers".
Maybe we can introduce a term for these servers? "RESTful
Graph Stores" comes to mind.
(Which makes me think we're missing a conformance clause, as per
http://www.w3.org/TR/qaframe-spec/#specifying-conformance
... I'm not sure it matters.)
3 protocol model
s/DOS/Denial-of-Service/ (best to avoid acronyms)
4 graph identification
Before we get into that, let's talk about URIs. I felt like I
was dumped into the middle of a conversation, missing all the
context. By the end of the document, I think I had it mapped
out. Did I get it right?
- There is a Service URI. This is used for:
- constructing indirect reference URIs, which is
necessary if the server doesn't serve all the Graph
IRIs, or if we want to access the default graph
- obtaining the Service Description, which we need in
order to find out the Dataset URI (see below).
Does this have anything to do with a SPARQL service
endpoint address? The fact that I can get an SD by
doing a GET on it was my only clue that it probably is,
in fact, the same thing. Can we be quite explicit
about this, even if it's just to say RESTful Graph
Stores and SPARQL service endpoints MAY use the same
address? I know it gets complicated with ER, since one
dataset may have multiple EPs.
- There is a Dataset URI. This is used for:
- asking the service to invent a new Graph IRI
When we have a multigraph syntax (eg TriG) standardized,
it seems clear to me that a GET of the Dataset URI would
return a complete dump of the dataset, and a PUT would
replace the dataset. Can we say something
forward-looking like this? I think so. Without this,
the Dataset URI seems pretty out-of-place here, used
only for this invent-a-new-Graph-IRI function. Maybe,
in any case that function could be done, instead, via
POST to the Service URI? (That would be distinguished
from a QUERY or UPDATE operation by the mime type of the
POST.)
Then we get into the Direct and Indirect identification URIs.
I suggest we start with some explanation of the two above URIs,
then, before we get into 4.1, we give a little overview of these
two, like:
For a client to use this protocol to access individual
graphs in the graph store, it needs a URL for each
graph. Inside the store, each graph (except the
default graph) is labeled with a Graph IRI. In some
cases ("Direct Graph Identification"), those Graph IRIs
can be used (possibly after IRI-to-URI conversion) as
the URLs for HTTP access. In other cases ("Indirect
Graph Identification"), the Service URI is used to
construct URLs for each graph.
Which reminds me -- what happens if someone uses those Indirect
graph URLs as Graph IRIs in the same store? :-( Maybe
there's nothing helpful we can say about that.
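To make the Indirect case concrete, here is a minimal sketch of how a client might build such a URL. It assumes the Graph IRI is carried in a query parameter named "graph" and that the embedded IRI is fully percent-encoded; the service address and graph IRI below are made up for illustration.

```python
from urllib.parse import parse_qs, quote, urlsplit


def indirect_graph_url(service_uri: str, graph_iri: str) -> str:
    """Embed a Graph IRI in the query component of the Service URI."""
    # Percent-encode everything, including ':' and '/', so the embedded
    # IRI survives as a single query parameter value. Non-ASCII characters
    # are encoded as UTF-8 octets, which doubles as IRI-to-URI conversion.
    return f"{service_uri}?graph={quote(graph_iri, safe='')}"


def embedded_graph_iri(request_url: str) -> str:
    """Recover the Graph IRI from an indirect graph URL (server side)."""
    return parse_qs(urlsplit(request_url).query)["graph"][0]


# Hypothetical addresses, for illustration only.
service = "http://example.org/rdf-graph-store"
graph = "http://example.org/other/graph"

url = indirect_graph_url(service, graph)
print(url)
print(embedded_graph_iri(url))

# An indirect URL used as a Graph IRI in the same store simply nests.
nested = indirect_graph_url(service, url)
print(embedded_graph_iri(nested) == url)
```

Mechanically, at least, the nested case round-trips without ambiguity under these assumptions; whether it is a sensible thing for a store to allow is a separate question.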
I'm not sure if you realize it, but it's quite possible these
indirect graph URLs will see a great deal of life outside of
this protocol -- for provenance and other metadata. They
provide a way to refer to a Graph Container inside a SPARQL
server. (This was discussed at the last RDF F2F.) To help
support this usage, it's probably worthwhile to strongly push
for Service URIs and SPARQL endpoint addresses to be the same.
4.1 direct graph identification
I think the first sentence needs to be qualified with a
"sometimes". I'd start this section with a list of the
situations in which one can use Direct.
"Intuitively, the set of interpretations that satisfy [RDF-MT]
the RDF graph that the RDF document is a serialization of can be
thought of as this RDF graph content." uuuuuuummmm what?
Give me an example? Or something? I can't make sense of
this. The "Graph Content" is a set of interpretations?
The layout of the diagram seems off. There's a computer
labeling the arrow. I would expect one computer at each end of
the arrow. Plus it's got the MT stuff in it, which I don't see
the reason for. If you want I can try to draw the diagram as
I'm picturing it...
Oh, maybe this MT stuff is because of ER...! If so, can we
call that out explicitly, and try to hide the complexity
from people who don't care about it?
"Any server that implements this protocol and receives a request
URI in this form SHOULD invoke the indicated operation..."
Instead of "invoke" can we say "perform"? I first thought
"invoke" meant it should pass it on to the server for that graph
(there might be one).
"The embedded URI MUST be an absolute URI and the server MUST
respond with a 400 Bad Request if it is not." I think that's
too strict. I think there are some bits of the URI grammar that
folks sometimes violate in their SPARQL graph IRIs. I know when
I've written RDF parsers that checked the syntax of the IRIs, I
had to turn off that checking when I hit other people's data.
Maybe things are better now.
(We should have some test cases about IRI/URI conversion for
this embedding.)
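As a starting point for such test cases, here is a rough sketch of a deliberately lenient check. Both helper functions are hypothetical, and "absolute" here only means "has a scheme", which is weaker than the full URI grammar; that leniency is the point of the comment above.

```python
from urllib.parse import quote, urlsplit


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 octets.

    URI delimiters and existing percent-escapes are left untouched.
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%~")


def is_absolute(uri: str) -> bool:
    """Lenient check: only require that a scheme is present."""
    return bool(urlsplit(uri).scheme)


def status_for_embedded_uri(uri: str):
    """Return 400 for a non-absolute embedded URI, else None (no objection)."""
    return None if is_absolute(uri) else 400


# Candidate test cases (all IRIs are made up for illustration).
print(iri_to_uri("http://example.org/caf\u00e9"))
print(status_for_embedded_uri("http://example.org/g"))
print(status_for_embedded_uri("/relative/graph"))
```

A stricter server could swap in a full grammar check inside is_absolute without changing the rest.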
"As will be discussed later in this document, both HTTP OPTIONS
and GET requests can be sent to the service and the response to
such a request is a service description document." But later
it's only a SHOULD. Do we mean that the Service MAY provide RDF
content, but if it does, that content MUST be an SD?
5 graph management
I'm a little hesitant about privileging RDF/XML like this. The
sense I get from the RDF WG is that in a year, 90% of the pure
RDF content on the Web (ie excluding RDFa and microdata), will
be Turtle, not RDF/XML. But, I don't really have a better
idea.
5.1 status codes
"then the server should respond with a 400 Bad Request." Is
that supposed to be all-caps SHOULD?
"should receive a response with a 405 Method Not Allowed".
Again, is that meant to be all-caps?
Most of the status codes are SHOULD, but two are MUST: 201
Created, and 404 Not Found -- but only on a DELETE. I'm
guessing these are editing errors, and should be SHOULD. If
not, an explanation seems warranted. Personally, I'd lean
toward all these response codes being MUSTs. I wonder about
pulling them out of the text into a separate decision table. I
guess there are a lot of meaningful response codes never
mentioned in this text, though.... I notice that RFC2616
mostly uses RFC2119 language in talking about what the client is
to do about these codes; it doesn't say the server
MUST/SHOULD/MAY send a 404, for instance. We could do the
same and just talk about the meaning of response codes.
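To show what I mean by a decision table, here is a small sketch. It only covers the four codes named above, and the outcome labels ("graph_created" and so on) are names I invented for illustration, not terms from the document.

```python
# (method, outcome) -> status code. "ANY" rows apply to every method.
# Outcome labels are hypothetical; only the codes come from the text.
DECISION_TABLE = {
    ("PUT", "graph_created"): 201,       # 201 Created
    ("POST", "graph_created"): 201,      # 201 Created
    ("DELETE", "graph_missing"): 404,    # 404 Not Found
    ("ANY", "malformed_payload"): 400,   # 400 Bad Request
    ("ANY", "method_unsupported"): 405,  # 405 Method Not Allowed
}


def status_for(method: str, outcome: str):
    """Look up a status code; method-specific rows win over "ANY" rows.

    Returns None when the table has nothing to say about the combination.
    """
    specific = DECISION_TABLE.get((method, outcome))
    if specific is not None:
        return specific
    return DECISION_TABLE.get(("ANY", outcome))


print(status_for("DELETE", "graph_missing"))
print(status_for("PUT", "method_unsupported"))
print(status_for("GET", "graph_missing"))
```

Having every row in one place would also make it obvious which rows are SHOULD and which are MUST.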
5.2 http put
"A request that uses the HTTP PUT method SHOULD store the
enclosed RDF payload as RDF graph content." How about: "A
request that uses the HTTP PUT method indicates the enclosed RDF
payload is to be stored as RDF graph content."
The examples here and in 5.4 and 5.5 (but not 5.3) are a little
confusing in formatting. The required blank line after the
HTTP headers is missing, but instead, after a blank line, we
have the SPARQL text for comparison. It's clear in 5.3 because
there is separating text. Borders around the examples would
solve the problem as well.
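For reference, this is roughly what the wire format requires. The sketch assembles a PUT request by hand so the empty line between headers and body is visible; the path, host, Turtle payload, and the choice of text/turtle are my assumptions, not taken from the document's examples.

```python
def build_put_request(path: str, host: str, turtle: str) -> bytes:
    """Assemble a raw HTTP/1.1 PUT request carrying a Turtle payload."""
    body = turtle.encode("utf-8")
    head_lines = [
        f"PUT {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: text/turtle; charset=utf-8",
        f"Content-Length: {len(body)}",
    ]
    # Each header line ends in CRLF; one extra CRLF (the blank line)
    # marks the end of the headers and the start of the body.
    head = "\r\n".join(head_lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# Hypothetical request, for illustration only.
payload_text = "<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n"
request = build_put_request("/rdf-graphs/employees", "example.org", payload_text)

head, separator, payload = request.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(payload.decode("utf-8"))
```

In the document's examples, the SPARQL text shown for comparison sits where a reader would expect the body to start.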
"Either the request or the encoded URI (embedded in the query
component) identifies the RDF payload enclosed with the request
as RDF graph content." I don't think it does. Why would one
be trying to identify the payload...? Are we trying to say
something like, "To complete this operation, the Service MUST
store the given payload as the new content of the RDF graph
container labeled with the given Graph IRI." ? That's more in
the style of the current RDF WG discussion, and less
in the style of the rest of this document; I'm not sure how to
write it in the existing style.
"The server MUST NOT attempt to apply the request to some other
resource." Do we really need to say this? It kind of opens
the door, via "the exception proves the rule", to all sorts of
other crazy behavior we didn't explicitly rule out. Maybe we
can give some example as to why this rule isn't obvious?
"Developers should refer to [SPARQL-UPDATE] for the specifics of
how to handle empty graphs. In particular, if the request body
is empty and there is sufficient authorization to create a new
named graph with an IRI of that indicated by the request URI,
then an empty graph would need to be created." That's not how
I read UPDATE. I read UPDATE to be saying some
"implementations" keep track of empty graphs and some don't. For
those who don't, a PUT with a new graph IRI and an empty request
body has no effect.
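Here is a toy model of the two behaviours as I read them. The class and its flag are hypothetical, and the handling of an empty PUT to an existing graph (the entry disappears) is my own extrapolation rather than anything UPDATE states.

```python
class GraphStore:
    """Toy in-memory store: Graph IRI -> set of triples."""

    def __init__(self, tracks_empty_graphs: bool):
        self.tracks_empty_graphs = tracks_empty_graphs
        self.graphs = {}

    def put(self, graph_iri: str, triples: set) -> None:
        """Replace the content of the graph labeled graph_iri."""
        if triples or self.tracks_empty_graphs:
            self.graphs[graph_iri] = set(triples)
        else:
            # This store does not record empty graphs. For a new IRI this
            # is a no-op; for an existing one the entry goes away.
            self.graphs.pop(graph_iri, None)


tracking = GraphStore(tracks_empty_graphs=True)
tracking.put("http://example.org/g", set())
print(tracking.graphs)

non_tracking = GraphStore(tracks_empty_graphs=False)
non_tracking.put("http://example.org/g", set())
print(non_tracking.graphs)
```

Only the first store ends up with an (empty) graph for the new IRI, so "an empty graph would need to be created" cannot hold for both.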
5.3 http delete
I don't really like the human intervention bit, the fact that
success can be reported even if it's not done yet, and the
"inaccessible location" notion, but I see they are just copied
out of RFC 2616, ... so I'm not sure what to say.
5.4 http post
"Within a service description document for an implementation of
this protocol, the URI of an instance of the sd:Dataset class is
understood to be the identifier of the Graph Store." I'm not
sure I'd call it the "identifier of the graph store"; see
earlier text about Dataset IRI. IMPORTANT POINT: it shouldn't
be the type sd:Dataset that matters; it should be the
sd:defaultDatasetDescription arc. I expect the type is
optional, but more importantly, we want to be able to merge SDs.
Where one gets the SD from matters for trust, but its meaning
isn't supposed to depend on which endpoint one gets it from.
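A small sketch of why the arc matters once SDs are merged. Triples are plain tuples here to keep it dependency-free, the service and dataset addresses are invented, and the property name is the one used above; it should be checked against the current SD vocabulary.

```python
SD = "http://www.w3.org/ns/sparql-service-description#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def datasets_by_type(triples):
    """Every node typed sd:Dataset (the lookup the current text implies)."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == SD + "Dataset"}


def dataset_for(service, triples):
    """Follow the sd:defaultDatasetDescription arc from one service."""
    arc = SD + "defaultDatasetDescription"
    return {o for (s, p, o) in triples if s == service and p == arc}


# Two hypothetical service descriptions, merged into one set of triples.
SVC_A, DS_A = "http://a.example/sparql", "http://a.example/dataset"
SVC_B, DS_B = "http://b.example/sparql", "http://b.example/dataset"
merged = {
    (SVC_A, SD + "defaultDatasetDescription", DS_A),
    (DS_A, RDF_TYPE, SD + "Dataset"),
    (SVC_B, SD + "defaultDatasetDescription", DS_B),
    (DS_B, RDF_TYPE, SD + "Dataset"),
}

print(datasets_by_type(merged))    # ambiguous: both datasets match
print(dataset_for(SVC_A, merged))  # unambiguous: only A's dataset
```

The type-based lookup also fails outright if the type triple is omitted, which I expect to be legal.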
other
Shouldn't we say somewhere that all this applies to HTTPS as
well? It's obvious, of course.
I guess that's about it... Now, to try to get some REST.
-- Sandro
cf. ACTION-563
Received on Tuesday, 29 November 2011 04:54:28 UTC