Re: SPARQL WG open comments from Paul Gearon on 2011-11-14 (public-rdf-dawg@w3.org from October to December 2011)

From: Paul Gearon <gearon@ieee.org>
Date: Mon, 14 Nov 2011 17:28:07 -0500
To: Axel Polleres <axel.polleres@deri.org>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <CAGZNPFnJbYjdA1z97fqdDy++Y5azrozEA41G4GpDgFEmoNk_cA@mail.gmail.com>
I'm adding the WG to the addressees here, as my original reply was
accidentally only sent to the chairs. Consequently I'm leaving large
blocks of text so that context is clear for anyone who wants to catch
up on the thread. I apologise in advance if this makes it hard to read
the email.

After illness and just returning from a conference I am trying to
catch up on the issues that Lee and Axel have asked me on. Responses
to the most recent message from Axel are below.

On Wed, Nov 9, 2011 at 3:23 AM, Axel Polleres <axel.polleres@deri.org> wrote:
> Hi Paul,
>
>> Was deathly ill last week and am trying
>> to catch up right now before I have to travel in the morning (so I
>> can't make the call tomorrow either).
>
> Hope your better again!
>
>> I will be able to update the wiki when
>> I get back.
>
> Please let us know when you have it on the wiki on the list, if you could make it before the next meeting, that'd be awesome.
> more answers inline below...
>
> On 8 Nov 2011, at 06:47, Paul Gearon wrote:
>
>> Hi Lee,
>>
>> Sorry I've been out of touch. Was deathly ill last week and am trying
>> to catch up right now before I have to travel in the morning (so I
>> can't make the call tomorrow either).
>>
>> More comments below:
>>
>> > On 11/1/2011 9:08 PM, Lee Feigenbaum wrote:
>> >>>> Hi Paul,>>>> On today's SPARQL call, we went through all of our open comments. In the>> next few weeks, we'll be looking to close all of these comments as we>> wrap up last call. We will also have a 2nd last call period that will>> run from the beginning of December until January.>>>> To that end, we need to identify ASAP any substantive changes to our>> documents that will need to be included in a 2nd Last Call.>>>> Today, we identified the following comments that have not yet been>> responded to that are your responsibility (or are partially your>> responsibility, but need input from update to add to what's already>> there from query). Could you please try to find some time in the next>> few weeks to address these comments?>>>> RC-4>> DB-4>> DB-5
>>
>>
>> I've done a little with those comments, though  I haven't had it
>> together enough to post. I am including everything I have here so that
>> you'll have it for the meeting. I will be able to update the wiki when
>> I get back.
>
>>
>> For DB-4, the section:
>>
>> > 3. http://www.w3.org/TR/sparql11-update/#graphStore
>> > "Operations may specify graphs to work with, or they may rely on a
>> > default graph for that operation."
>> > But don't operations use RDF Datasets, rather than graphs?
>>
>> <AndyS> This is not about query despite the subject line :-)
>>
>> Operations query RDF Datasets, and make modifications to graphs. I
>> suggest changing the words "work with" to "be modified".

I've changed the Update document to reflect this.


>> > 4. Regarding the typographical conventions that you use for conformance
>> > keywords:
>> > [[
>> > When this document uses the words must, must not, should, should not,
>> > may and recommended, and the words appear as emphasized text, they must
>> > be interpreted as described in RFC 2119 [RFC2119].
>> > ]]
>> > As you can see in the excerpt quoted above, the typographical emphasis
>> > of these keywords (i.e., bolding) is lost when the document is viewed as
>> > plain text or copied and pasted as plain text.  This makes it more
>> > difficult to quote and discuss portions of the specification precisely.
>> > To ensure clarity, please make these keywords UPPER CASE (perhaps in
>> > addition to being bold).
>>
>> I wanted to be consistent, so I looked at SPARQL 1.1 Protocol for RDF.
>> I was under the assumption that there must have been some uniformity.
>> However, I see that the Graph Store HTTP Protocol doc uses
>> capitalization, and Service Description uses capitalization and a
>> style, while none of the other documents refer to RFC 2119 at all.
>>
>> If David's concerns make sense, then I can change it, though I
>> recommend a standard approach be used by the other documents that
>> reference RFC 2119.
>
> I guess we could think about that (using uniformly capitalization & bold, for instance).

I have left these words bolded, but have also added capitalization. If
a standard is adopted between documents then I will change to that.


>> For DB-5
>>
>> > 1. Please either add capability for virtual graphs or keep the COPY, ADD
>> > and MOVE shortcuts, to enable standard SPARQL to be used more
>> > efficiently as a rules language and in data production pipelines.  COPY,
>> > ADD and MOVE operations cost almost nothing to implement, and they help
>> > with efficiency.  By "virtual graph" I mean a graph that consists of the
>> > merge of a particular set of named graphs -- a very important capability
>> > for efficient data production pipelines.
>>
>> I support full acceptance of these features. I do not support
>> introducing virtual graphs into SPARQL 1.1.
>
> +1 (personal opinion)

This needs input from the working group before it can be endorsed, but
I'm happy to make the appropriate change as soon as we can.


>> > 2. This paragraph in sec 3.1.3 is a bit confusing:
>> > [[
>> > That is, the GroupGraphPattern in the WHERE clause will be matched
>> > against the dataset described by explicit USING or USING NAMED clauses,
>> > if specified, and against the graph store otherwise. Any graph name
>> > specified in a WITH clause will - for evaluating the WHERE clause -
>> > refer to the default graph to be used in the absence of USING or USING
>> > NAMED clauses. In the presence of one or more graphs referred to in
>> > USING clauses, the default graph will be the merge of these graphs,
>> > meaning that the graph in a WITH clause will be ignored while evaluating
>> > the WHERE clause. If there is no USING clause, but there is one or more
>> > USING NAMED clauses, then the dataset will include an empty graph for
>> > the default graph.
>> > ]]
>> > In particular, the sentence "Any graph name specified in a WITH clause
>> > will - for evaluating the WHERE clause - refer to the default graph to
>> > be used in the absence of USING or USING NAMED clauses." seems odd.  The
>> > graph specified in the WITH clause will refer to the *default* graph?  I
>> > would think it would be used *instead* of the default graph.  Isn't that
>> > the point of WITH?  Perhaps the term "default graph" is being used in an
>> > unusual way in this paragraph, to mean "the graph that will used in the
>> > absence of USING or USING NAMED"?  I think it would be misleading to
>> > call that a "default graph".  Normally the term "default graph" refers
>> > to the unnamed slot in a Graph Store, per the first paragraph in section
>> > 2.  I think it would be best to use the term only in that way.
>>
>> He's right, in that it's confusing.
>>
>> One problem is that there are 2 types of "default" graph. There's the
>> default graph for the store, and the default graph in a query. For
>> instance, a query that says "select * {?s ?p ?o}" gets data from the
>> "default" graph, but the protocol can set this graph to be anything,
>> by using the default-graph-uri parameter. If this hasn't been set,
>> then the query will refer to the default graph of the store (the
>> default-default graph).
>>
>> Paragraph 3.1.3 is referring to the default graph of a query, while
>> David is referring to the default graph of a store, hence his
>> confusion.
>>
>> Other than that, the text makes sense, if you know what it's supposed
>> to mean. However, I suspect that if you don't already know what it's
>> trying to say, then it may be a bit impenetrable. Does it need to be
>> changed?
>
> Could you propose some clarifying modification or addition?

I have not had a chance to consider this since last week. I will try
to come up with something by the telecon tomorrow.


>> > 3. In searching for the definition of the backslash "\" symbol in
>> > section 4.2, it looks like it is supposed to be set difference, ...
>>
>> Axel?
>
> yes, that's what it is supposed to mean... Do you think we should replace it with 'set-difference', or say that '\' denotes set-difference?
> (both ok for me, it's anyways just an editorial change).

Just explain that it means set difference. I was OK with it meaning
that, but it just needs to be defined.


>> > 4. The difference between "USING" and "USING NAMED" is not explained,
>> > except in passing: "This describes a dataset in a manner similar to FROM
>> > and FROM NAMED clauses in the SPARQL1.1 Query Language."
>>
>> This does not appear to be "in passing" to me.
>
>>
>> The INSERT/DELETE operations (the only ones that use USING and USING
>> NAMED) operate as a query-and-update. The query is basically exactly
>> the same as what is described in SPARQL 1.1 Query Language. However,
>> to avoid confusion in "deleting from" a graph, we opted to avoid the
>> use of the keyword FROM and replace it with USING.
>>
>> Does someone suggest better wording to avoid David's concern?
>>
>
> We could replace "in a manner similar to FROM and FROM NAMED" with
> "in the same way as FROM and FROM NAMED" and maybe add a direct link to
>  http://www.w3.org/TR/sparql11-query/#specifyingDataset
> Ok?

Done.


>> > 5. As written, this in sec 3.1:
>> > http://www.w3.org/TR/sparql11-update/#graphUpdate
>> > [[
>> > Graph update operations change existing graphs in the Graph Store but do
>> > not explicitly delete nor create them. Non-empty inserts into
>> > non-existing graphs will, however, implicitly create those graphs, i.e.,
>> > an implementation *should* create graphs that do not exist before
>> > triples were inserted into them (there may be implementations providing
>> > an update service over a fixed set of graphs which in such case *must*
>> > return with failure for update requests that would create an unallowed
>> > graph), and *may* remove graphs that are left empty after triples are
>> > removed from them.
>> > ]]
>> > seems to say that an implementation that operates over a *variable*
>> > (non-fixed) set of graphs still has the option of not automatically
>> > creating graphs that do not exist.
>> >
>> > I suggest rewording the above portion as:
>> > [[
>> > Graph update operations change existing graphs in the Graph Store but do
>> > not explicitly delete nor create them. Non-empty inserts into
>> > non-existing graphs will normally implicitly create those graphs, i.e.,
>
> I still like ", however, implicitly" better than "normally implicitly"
>
>> > an implementation fulfilling an update request *should* silently and
>> > automatically create graphs that do not exist before triples are
>> > inserted into them, and *must* return with failure if it fails to do so
>> > for any reason.  (For example, the implementation may have insufficient
>> > resources, or an implementation may only provide an update service over
>> > a fixed set of graphs
>
> where the implicitly created graph is not within this fixed set
>
>> .)  An implementation *may* remove graphs that are
>> > left empty after triples are removed from them.
>> > ]]
>>
>> (similar suggestion for point 6)
>>
>> David's rewording does seem a little better, and I'm happy to incorporate it.
>>
>
> see my suggested addition/modification above. Otherwise ok with that change.

Done.


>> Point 7 is similar, but I prefer the original text.
>>
>>
>> > 8. How is the URI of a Graph Store indicated?  The concept of a Graph
>> > Store is central to the SPARQL 1.1 Update spec, and hence one should be
>> > able to use a URI to refer to a particular Graph Store, but the spec
>> > doesn't say how this is done.
>> <further discussion on this>
>>
>> I don't have an answer to this one. Whenever I've used an RDF store
>> the documentation for that software has always told me the form of the
>> URI for the store. I've never seen it defined in any way, but then,
>> I've never really needed it to be. Suggestions?
>>
>
> I'd say the following:
>
> The information how a graph store is accessed is defined in the protocol and graph store protocol specs:
> A graph store is accessible by either an update service (cf. protocol) or via the graph store protocol (cf. graph store protocol),
> in any case it is hidden behind the service, so it's accessible via the URI of a SPARQL
> update service or via a URI that responds to the graph store protocol.

This works for me, though I changed the wording a little:

"The information how a graph store is accessed is defined in the
protocol and graph store protocol specs. A graph store is accessible
by either an update service (cf. protocol) or via the graph store
protocol (cf. graph store protocol). In either case the graph store is
hidden behind the service, making it accessible via the URI of a
SPARQL update service or via a URI that responds to the graph store
protocol."


>> -- On RC-4 --
>>
>> I agree with Andy's comments. I also wanted to note that Richard's
>> example is invalid, in that it would not be legal on a query endpoint.
>>
>> Richard also says:
>> "I am surprised that the security issues arising from obfuscation
>> through string escaping are not stated in the Security Considerations
>> sections of SPARQL Query and SPARQL Update."
>>
>> I do not consider this to be an issue, as it is only users with update
>> permissions who will be successfully issuing update operations.
>>
>> There is the potential for this to be an issue for a system that wants
>> to create a fine-grained permissions scheme (for instance, allowing
>> insertion, but not removal). Is this a concern worth documenting?

This was a question for the larger group. Should I add something for
systems that are in this category?


>> Andy comments that a WG decision is needed for the following:
>>
>> >> • As part of the changes to the escape processing model for \u escapes,
>> >> additional characters (e.g. "=", ",") would be allowed, in \u escaped form,
>> >> in prefixed names.
>>
>> > I oppose this change, as there is no use case for it. Prefixed names are a
>> > convenience for authors to make long IRIs easier to write and read. Escapes
>> > like \u003D and \u002C are neither easy to write nor easy to read, so they
>> > defeat the purpose of prefixed names. IRIs that include such characters just
>> > have to be written as absolute or relative IRIs.
>>
>> Richard is quite right here. However \u unescaping is something that
>> an implementor would likely want to do before parsing. Singling out
>> the prefixes so that they are not treated this way would require
>> parsing them out, then unescaping the remainder of the text, and then
>> continuing the parsing.
>>
>> Unless there is a good reason to require that prefixes not allow
>> escaping, then I would prefer to keep escaping on the entire text.
>
> Shall we discuss this in one of the upcoming TelCos?

Sure, though I'm in favour of the status quo.

Regards,
Paul Gearon
Received on Monday, 14 November 2011 22:28:40 UTC