Re: protocol 1.1 review (to 2.3)

On 8/6/2011 6:23 PM, Andy Seaborne wrote:
>
>
> On 06/08/11 21:48, Lee Feigenbaum wrote:
>> Thanks very very much, Andy. (The changes mentioned below should be
>> reflected in CVS.)
>
>
>>> == Introduction
>>> Need to mention the different SPARQL 1.1 Graph Store HTTP Protocol.
>>>
>>> Add para:
>>> """
>>> A separate document describes the SPARQL 1.1 Graph Store HTTP Protocol
>>> [links] for accessing and managing a collection of graphs in the REST
>>> architectural style.
>>> """
>>
>> I've taken this for now but might delegate to the overview document
>> eventually.
>
> Minor: I still think it needs a sentence so when read independently,
> it's clear this is not the graph protocol.

OK.

>>> == Section 2:
>>> General:
>>> There is use of "MUST" and "encode" that are talking about HTTP.
>>> This doc is not defining that requirement, it comes from HTTP.
>>>
>>> Many people will be using HTTP through a library that may well handle
>>> many of the issues.
>>>
>>> Suggest only using MUST for things SPARQL defines.
>>
>> It's not clear to me what to reference to include these bits
>> normatively. The must requirements also tend to go a bit beyond what
>> HTTP might mandate, such as specifying the names of the parameters.

>>> == 2.1.1
>>>
>>> "client MUST URL encode"
>>> -->
>>> Not MUST as it's an HTTP requirement.
>>
>> The MUST refers to the full rest of the sentence, not just this phrase.
>
> Sure - but it could be read as SPARQL requires an encoding - then HTTP is
> going to encode it. i.e. it's a SPARQL-MUST.
>
> I think MUST etc should (:-) only be used when it's this spec defining
> the requirement. The requirement here is "use HTTP correctly".
>
> "and MUST include" makes it clear as to MUST scope.

OK, I see. You're concerned that this could be read as demanding 
double-encoding. I see.

I still don't see where in RFC2616 or elsewhere the standard way to 
build up a URI query string from key/value pairs is defined. Do you know 
where it is? Otherwise, I can play with the text to make it very 
explicit that this is not requiring double URL encoding. Maybe I can 
(ironically) check and see how the WSDL HTTP bindings specify this same 
thing.
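
To make the double-encoding concern concrete, here is a minimal sketch in Python. The query text is a made-up example, and the "library" step is only assumed to behave like the standard urllib.parse.urlencode; real HTTP client libraries may differ.

```python
from urllib.parse import parse_qs, quote, urlencode

# Hypothetical query text, for illustration only.
query = "SELECT * WHERE { ?s ?p ?o }"

# Encoded once: urlencode percent-encodes the value while building the pair.
once = urlencode({"query": query})

# Encoded twice: the client pre-encodes the value, then the library
# encodes it again, so every "%" from the first pass becomes "%25".
twice = urlencode({"query": quote(query)})

# A server that decodes once recovers the query only in the first case.
decoded_once = parse_qs(once)["query"][0]
decoded_twice = parse_qs(twice)["query"][0]
```

With a single pass the server sees the original query; with two passes it sees the percent-encoded text instead, which is the reading the spec wording should rule out.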

> Additional: 2.1.2
> "When using this method"
> ==>
> "When using this POST method"
> 2.1.3 ditto.
>
> Is "this" the HTTP method or the subject of the section (SPARQL "method",
> i.e. form of POST)?

"method" here is supposed to be that particular version of the SPARQL 
protocol. It's confusing. I've changed it by s/method/approach.

>>> "The HTTP operation will need to be encoded according to the rules of
>>> RFC 2616."
>>
>> I'm still not sure if this is right. Maybe I just am doing bad searches,
>> but I can't find talk in http://tools.ietf.org/html/rfc2616 of how to
>> build a request URI query string from a set of key/value pairs.
>>
>>> The examples will show this.
>>>
>>> "Query string parameters MUST be"
>>> A library may well be sorting this out.
>>
>> That's ok, but irrelevant to our spec, right?
>
> As above - risk of meaning that SPARQL-encodes ... then HTTP is going to
> encode again.

I understand now, but still not sure what to reference. I don't see it 
anywhere in HTTP, and the HTML form stuff is close but very specific to 
forms & documents, so wouldn't be an appropriate reference.

>>> == 2.1.2
>>>
>>> There are two sub-cases: HTML form encoding and POST of a query string.
>>> I think you can mix as well, although it's rare.
>>>
>>> The section is actually about HTML form encoding, but says
>>> "URL-encoding".
>>
>> I don't understand what you mean here. Forms use URL encoding.
>
> Text is:
>
> "clients must URL encode all parameters and include them"
>
> Read "and" as "do one thing, do next thing", i.e. in the sense of "and
> then include ...", and it's double encoding.

Right, you're reading "include" as including all of the HTTP baggage 
that goes along with adding something to a URI query string. Whereas I 
was intending "include" as in "string concatenation". Will work on this 
once we figure out if there's something appropriate to reference.

>>> == 2.1.4
> ...
>>> Ideally, the fact that the dataset can be determined by endpoint would be
>>> first. It's the common case.
>>
>> I'm not sure which is the common case, but I don't really care about the
>> order so I changed it around for you. :)
>
> I wasn't clear - the common case is service provided dataset (maybe not
> in Anzo's case but overall more common as I've seen services).
>
> I suggest moving the last para to the first.
>
> """
> A SPARQL query is executed against an RDF Dataset. If an RDF dataset is
> not specified in either the protocol request or the SPARQL query string,
> then implementations may execute the query against an
> implementation-defined default RDF dataset. (@@ref to SD?)
>
> The RDF dataset for a query may be specified either via the
> default-graph-uri and named-graph-uri parameters in the SPARQL Protocol
> or in the SPARQL query string using the FROM and FROM NAMED keywords.
>
> If different RDF datasets are specified in both the protocol request and
> the SPARQL query string, then the SPARQL service must execute the query
> using the RDF dataset given in the protocol request. Note that a service
> may reject a query with response code 400 if the service does not allow
> protocol clients to specify the RDF dataset.
> """

I prefer it the way it is now. I don't think we ought to emphasize the 
implementation-defined way by placing it first. It seems confusing to me.
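
For concreteness, the precedence described in the quoted text can be sketched as below, whichever order the paragraphs end up in. This is only an illustration: choose_dataset and its arguments are hypothetical names, and a dataset is represented as a plain dict.

```python
from typing import Optional


def choose_dataset(
    protocol_dataset: Optional[dict],
    query_dataset: Optional[dict],
    service_default: dict,
) -> dict:
    """Pick the RDF dataset for a query (hypothetical helper).

    protocol_dataset: from default-graph-uri / named-graph-uri, or None.
    query_dataset: from FROM / FROM NAMED in the query string, or None.
    service_default: the implementation-defined default dataset.
    """
    # The protocol request wins when both are given.
    if protocol_dataset is not None:
        return protocol_dataset
    # Otherwise use the dataset named in the query string, if any.
    if query_dataset is not None:
        return query_dataset
    # Neither specified: fall back to the service's own default.
    return service_default
```

A service that does not let protocol clients specify the dataset would reject the request before reaching this step, so that case is left out of the sketch.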

>>> === 2.1.7
>
>>> 400 is also the right code for e.g. a query with FROM when the processor
>>> only accepts queries against the implicit dataset. It's a client error.
>>> It's not just bad syntax.
>>
>> Yeah? Isn't 400 for "malformed" requests?
>
> 4xx is client error.
>
> Supplying a dataset description to a service endpoint that does not support
> a dataset description is a client error (this is SPARQL 1.0
> protocol-ness) not a server error.

I have to squint pretty hard to see it as a client error, actually. But 
I don't feel that strongly, anyway :)

> 400 is more than just parse error. "malformed request" is the best we can do
> for request mistakes by the client.

HTTP defines this pretty clearly:

"""
The request could not be understood by the server due to malformed syntax.
"""

...so I really think 400 is _just_ syntax.

HTTP introduces 5xx error codes this way:

"""
Response status codes beginning with the digit "5" indicate cases in 
which the server is aware that it has erred or is incapable of 
performing the request.
"""

To me, the case in question (supplying a dataset description to a 
service endpoint that won't take one) is a case of the server being 
aware that it is incapable of performing the request.

>>> == 2.2.2
> ...
>>> "as query string parameters"
>>> This is update, not query.
>>>
>>
>> Yes, but the part of the URI after the ? is still the "query string".
>> It's confusing, but I'm pretty sure it's proper terminology? I've
>> changed it to
>>
>> "as URI query parameters"
>
> or "HTTP query string parameters".
>
> Looking back, I see same in 2.1.2 and 2.1.3.

OK, got it.


>>> == 2.2.4, 2.2.5
>>> See 2.1.6, 2.1.7
>>>
>>> Stronger text about the response to a successful update operation being
>>> (by spec) empty. There is no response to an update operation defined in
>>> the update language. I'm worried that leaving it implementation defined
>>> might lead to expectations of query responses from an update.
>>>
>>> This is a spec - just don't say anything about implementation defined
>>> features.
>>
>> The text that's there now is specifically in response to comments &
>> discussion on the mailing list. As normative spec text, this text
>> doesn't change anything, but it does set some expectations in a way that
>> I think matches the group's intention, at least from the last discussion
>> spawned from the -comments mailing list.
>
> Wasn't that mainly about errors? I was focusing on successful updates.

I think you are right here. Though don't know if we've discussed the 
other case. Maybe we can discuss it quickly on Tuesday's teleconference.

Lee

>> If the group prefers that the spec says that response bodies SHOULD be
>> empty, we can do that. I don't think it should say that response bodies
>> MUST be empty, as that goes against what some implementations do and
>> what multiple community comments have requested.
>
> Fine - certainly can't have "MUST be empty".
> I just want to slant it towards "no body"; a successful update request
> does not need a non-empty body.
>
> """
> The response body of a successful update request is not defined in this
> specification. An implementation may include content deemed useful,
> either to end users or to the invoking client application.
> """
>
> Andy
>

Received on Sunday, 7 August 2011 17:16:23 UTC