Re: [Fwd: Last Call for comments on "SPARQL Query Language for RDF"] from Axel Polleres on 2007-04-16 (public-rif-wg@w3.org from April 2007)

From: Axel Polleres <axel.polleres@deri.org>
Date: Mon, 16 Apr 2007 20:24:13 +0100
To: "Public-Rif-Wg (E-mail)" <public-rif-wg@w3.org>
Message-id: <4623CD5D.10206@deri.org>
p.s.: there is one more comment I would like to add to the review:


Section 12.6

"(c) These conditions do not impose the SPARQL requirement that SG share 
no blank nodes with AG or BGP. In particular, it allows SG to actually 
be AG. This allows query protocols in which blank node identifiers 
retain their meaning between the query and the source document, or 
across multiple queries. Such protocols are not supported by the current 
SPARQL protocol specification, however."

This worries me a bit. It seems to suggest that extensions of SPARQL
may treat bnodes differently from existential variables, which is what
would become possible if you allow them to retain their meaning across
stacked queries. I am not sure whether this "backdoor" is really
compatible with the intention of bnodes in RDF.



Axel Polleres wrote:
> 
> Dear all,
> 
> below my review on the current SPARQL draft, see
> 
> http://www.w3.org/TR/rdf-sparql-query/

> 
> 
> Generally, I think
> 
> 1) the formal definitions have improved a lot, but I am still not 100%
> sure that all definitions are formally watertight. This mainly concerns
> Section 12 and the partly unclear definitions/pseudocode algorithms for
> query evaluation therein.
> 
> 2) Maybe we can borrow things like the handling of IRIs from them, as 
> already mentioned by Dave
> http://lists.w3.org/Archives/Public/public-rif-wg/2007Apr/0015.html

> 
> 
> Please let me know if you have anything to add/modify!
> As they were asking for feedback by April 18th, I would be glad to get
> the OK from RIF at the April 17th TeleConf to send this off.
> Please make it clear in the notes, since unfortunately, due to a
> project meeting, I have to send regrets for that TeleConf.
> 
> best,
> axel
> 
> 
> -------
> 
> Detailed comments:
> 
> 
> Prefix notation is still not aligned with turtle. Why?
> Would it make sense to align with turtle and
> use/allow '@prefix' instead/additionally to 'PREFIX'
> 
> 
> Section 4.1.1
> 
> The single quote seems to be missing after the table in sec 4.1.1
> in "", or is this '"'?
> 
> Section 4.1.4
> 
> The form
> 
> [ :p "v" ] .
> 
> looks very awkward to me!
> 
> I don't find the grammar snippet for ANON very helpful here, without
> explanation what WS is...  shouldn't that be a PropertyListNotEmpty 
> instead?
> 
> 
> Section 5
> 
> Section 5 is called Graph patterns and has only subsections
> 5.1 and 5.2 for basic and group patterns, whereas the other types are
> given separate top-level sections. This structure seems a bit
> illogical.
> 
> 
> Why the restriction that a blank node label can only be used in a single
> basic graph pattern? And if so, isn't the remark that the scope is the
> enclosing basic graph pattern redundant?
> 
> Why is the section about "extending basic graph pattern matching" here,
> when not even basic graph pattern matching has been properly introduced
> yet? If you only want to informally introduce the kind of matching you
> are talking about here, then I'd call section 5.1.2 simply "Basic Graph
> Pattern Matching", but I'd rather suggest dropping this section.
> 
> 
> 
> "with one solution requiring no bindings for variables"
> -->
> rather:
> "with one solution producing no bindings for variables"
> or:
> "with one solution that does not bind any variables"
> 
> Section 5.2.3
> 
> Why do you have a separate examples subsection here? It seems
> superfluous/repetitive. Just put the last example, which seems to be the
> only new one, inside Sec 5.2.1 where it seems to fit, and drop the two
> redundant ones. For the first one, you
> could add "and that basic pattern consists of two triple patterns" to the
> first example in sec 5.2; for the second one, add the remark that "the
> FILTER does not break the basic graph pattern into two basic graph
> patterns" to the respective example in section 5.2.2.
> 
> 
> 
> Section 6:
> 
> One overall question which I didn't sort out completely so far:
> What if I mix OPTIONAL with FILTERs?
> 
> ie.
> 
> {A OPTIONAL B FILTER F OPTIONAL C}
> 
> is that:
> 
> {{A OPTIONAL B} FILTER F OPTIONAL C}
> 
> or rather
> 
> {{A OPTIONAL B FILTER F} OPTIONAL C}
> 
> and: would it make a difference? I assume not, since the filter is, in
> both cases, at the level of A, but I am not 100% sure. Maybe such an
> example would be nice to have...
> 
> 
> Another one about FILTERs: What about this one, ie. a FILTER which
> refers to the outside scope:
> 
> ?x p o OPTIONAL { FILTER (?x != s) }
> 
> concrete example:
> 
> SELECT ?n ?m
> { ?x a foaf:Person .  ?x foaf:name ?n .
>   OPTIONAL { ?x foaf:mbox ?m FILTER (?n != "John Doe") }  }
> 
> Suppresses the email address for John Doe in the output!
> Note: This one is interesting, since the OPTIONAL part may NOT be
> evaluated separately, but carries over a binding from the super-pattern!
> 
> Do you have such an example in the testsuite? It seems that the last
> example in Section 12.2.2 goes in this direction, more on that later.
> 
> Would it make sense to add some non-well-defined OPTIONAL patterns,
> following [Perez et al. 2006] in the document? As mentioned before, I
> didn't yet check section 12, maybe these corner case examples are there..
> 
> 
> Section 7:
> 
> Why "unlike an OPTIONAL pattern"? This is comparing apples with pears...
> I don't see the motivation for this comparison, I would suggest to
> delete the part "unlike an OPTIONAL pattern".
> 
> 
> as described in Querying the Dataset
> -->
> as described in Section 8.3 "Querying the Dataset"
> 
> 
> Section 8
> 
> The example in section 8.2.3 uses GRAPH although GRAPH hasn't been
> explained yet. Either remove this section or start section 8.3 before it;
> I think GRAPH should be introduced before giving an example using it.
> 
> <you may ignore this comment>
> BTW: Would be cool to have a feature creating a merge from named graphs
> as well...
> 
> ie. I can't have something like
> GRAPH g1
> GRAPH g2 { P }
> 
> where the merge of g1 and g2 is taken for evaluating P.
> whereas I can do this at the top level by several FROM clauses.
> (Note this is rather a wish-list comment than a problem with the current 
> spec, probably, might be difficult to define in combination with 
> variables...)
> </you may ignore this comment>
> 
> Section 8.2.3 makes more sense after the 8.3 examples, and 8.3.2 is
> simpler than 8.3.1, so, I'd suggest the order of subsections in 8.3
> 
> 8.3.2
> 
> 8.3.1
> 
> 8.3.3
> 
> 8.2.3
> 
> 8.3.4 (note that this example somewhat overlaps with what is shown in
> 8.2.3 already, but fine to have both, i guess.)
> 
> 
> 
> Section 9:
> 
> What is "reduced" good for? I personally would tend to make reduced the
> default, and instead put a modifier "STRICT" or "WITHDUPLICATES" which 
> enforces that ALL non-unique solutions are displayed.
> 
> "Offset: control where the solutions start from in the overall solution
> sequence."
> 
> maybe it would be nice to add: "[...] in the overall solution sequence, 
> i.e., offset takes precedence over DISTINCT and REDUCED"
> 
> at least, the formulation "in the overall solution sequence" would
> suggest this... however, right afterwards you say:
> "modifiers are applied in the order given by the list above"... this
> seems to somehow contradict the "in the overall solution sequence", so
> then you should modify this to:
> "in the overall solution sequence, after application of solution
> modifiers with higher precedence" and give an explicit precedence to
> each solution modifier....
> 
> <you may ignore this comment>
> BTW: Why is the precedence of solution modifiers not simply the order in
> which they are given in a query? Wouldn't that be the simplest thing to do?
> 
> ie.
> 
> OFFSET 3
> DISTINCT
> 
> would be different than
> 
> DISTINCT
> OFFSET 3
> 
> depending on the order.
> Anyway, if you want to (which you probably do) stick with what you have
> now, it would at least be easier to read if you'd take the suggestion 
> with explicit precedence levels for each modifier.
> </you may ignore this comment>
> 
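The order dependence raised in the comment above can be sketched in a few lines of Python. This is only an illustration: `distinct` and `offset` are hypothetical helper names, not part of SPARQL or of any library, and the five-element solution sequence is made up.

```python
def distinct(solutions):
    """Drop duplicate solutions, keeping the first occurrence of each."""
    result = []
    for s in solutions:
        if s not in result:
            result.append(s)
    return result


def offset(solutions, start):
    """Skip the first `start` solutions of the sequence."""
    return solutions[start:]


# Made-up solution sequence containing duplicates.
solutions = ["a", "a", "b", "b", "c"]

# Reading 1: apply OFFSET 3 first, then DISTINCT.
offset_then_distinct = distinct(offset(solutions, 3))
# Reading 2: apply DISTINCT first, then OFFSET 3.
distinct_then_offset = offset(distinct(solutions), 3)

print(offset_then_distinct)  # ['b', 'c']
print(distinct_then_offset)  # []
```

With a fixed precedence, only one of these two results can be obtained, whichever way the modifiers are written in the query.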
> 
> Section 9.1
> 
> The ORDER BY construct allows arbitrary constraints/expressions as 
> parameter...ie. you could give an arbitrary constraint condition here,
> right? What is the order of that? TRUE > FALSE? Would be good to add a 
> remark on that.
> 
>  I would put 'ASCENDING' and 'DESCENDING' in normal font, since they look
> like keywords here, whereas the respective keywords are ASC and DESC.
> 
> Stupid Question: What is the "codepoint representation"? ... Since more 
> people might be stupid, maybe a reference is in order.
> 
> 
> What is a "fixed, arbitrary order"??? Why not simply change
> 
> "SPARQL provides a fixed, arbitrary order"
> -->
> "SPARQL fixes an order"
> 
> and
> 
> "This arbitrary order"
> -->
> "This order"
> 
> I'd also move the sentence starting with "This order" after the 
> enumeration.
> 
> 
> Note that, in the grammar for OrderCondition I think you could write it 
> maybe shorter:
> 
> Wouldn't simply
>  orderCondition ::= ( 'ASC' | 'DESC' )? (Constraint | Var)
> do?
> 
>  In the paragraph above the grammar snippet, you forgot the ASK result
> form, where ORDER BY also doesn't play a role, correct?
> 
> Sec 9.2:
> 
> Add somewhere in the prose: "using the SELECT result form"...
> 
> It is actually a bit weird that you mix select into the solution 
> modifiers, IMO, it would be better to mention SELECT first in section 9 
> and then introducing the solution modifiers.
> 
> Sec 9.3:
> 
> REDUCED also allows duplicates, doesn't it? You mention before that
> REDUCED only *permits* elimination of *some* duplicates... so, delete
> the "or REDUCED" in the first sentence.
> 
> 
> Sec9.4:
> As for reduced as mentioned earlier, my personal feeling is that 
> REDUCED, or even DISTINCT should be the default, since it is less 
> committing, and I'd on the contrary put an alternative keyword "STRICT" 
> or "WITHDUPLICATES" which has the semantics that really ALL solutions 
> with ALL duplicates are given. My personal feeling is that
> aggregates, which you mention in the "Warning" box, anyway only make 
> sense in connection with DISTINCT. Or you should include a good example 
> where not...
> 
> Sec 9.5/9.6:
> 
> OFFSET 0 has no effect, and LIMIT 0 obviously makes no sense since the
> answer is always the empty solution set... So why not simply allow only
> positive integers for both? I see no benefit in allowing 0 at all.
> 
> Section 10:
> 
> "query form" or "result form"? I'd suggest using one of the two
> consistently and not switching. Personally, I'd prefer "result form"...
> 
> Section 10.1
> 
> As for the overall structure, it might make sense to have the whole 
> section 10 before 9, since modifiers are anyway only important for 
> SELECT, and then you could skip the part on projection in section 9, as 
> SELECT is anyway not a solution modifier but a result form...
> You should call it also "projection" in section 10.1, ie. what I suggest 
> is basically merging section 10.1 and 9.2.
> 
> 
> Section 10.2
> 
> CONSTRUCT combines triples "by set union"?
> So, I need to eliminate duplicate triples if I want to implement
> CONSTRUCT in my SPARQL engine?
> Is this really what you wanted? In case of doubt, I'd suggest to
> remove "by set union", or respectively, analogously to SELECT,
> introduce a DISTINCT (or alternatively a WITHDUPLICATES)
> modifier for CONSTRUCT...
> 
> BTW, I miss the semantics for CONSTRUCT given formally in Section 12.
> 
> 
> Section 10.2.1
> 
> <you may ignore this comment>
> What if I want a single blank node connecting all solutions? That would 
> be possible, if I could nest constructs in the FROM part...
> </you may ignore this comment>
> 
> 
> Section 10.2.3
> 
>  Hmm, now you use order by, whereas you state before in Section 9.1 that 
> ORDER BY has no effect on CONSTRUCT... ah, I see, in combination with 
> LIMIT!
>  So, would it make sense in order to emphasize what you mean,  to change 
> in section
> 9.1
> 
> "Used in combination"
> -->
> "However, note that used in combination"
> 
> 10.3/10.4
> 
> I think that ASK should be mentioned before the informative DESCRIBE, 
> thus I suggest to swap these two sections.
> 
> Section 11
> 
> - Any changes in the FILTER handling from the last version? Is there a 
> changelog?
> - As mentioned earlier, I am a bit puzzled about the "evaluation" of 
> Constraints given as an argument to ORDER BY especially since there you 
> don't want to take the EBV but the actual value to order the solutions.
> (Note that what it means that a solution sequence "satisfies an order 
> condition" is also not really formally defined in Section 12!)
> 
> Apart from that, did not check the section in all detail again since it 
> seems to be similar to the prev. version , but some comments still:
> 
> "equivilence"?
> Do you mean equivalence? My dictionary doesn't know that word.
> 
> The codepoint reference should already be given earlier, as mentioned 
> above.
> 
> 
> Section 11.3.1
> 
> The operator extensibility makes me a bit worried with respect to the
> nonmonotonic behavior of '! bound':
>  In combination with '! bound', does it still hold that
> "SPARQL extensions" will produce at least the same solutions as an
> unextended implementation and may, for some queries, produce more
> solutions? I have an uneasy feeling here, though not substantiated by
> proof/counterexample.
> 
> 
> Section 12 :
> 
> 12.1.1
> 
> 
> Is the necessity that the u_i's are distinct in the dataset really 
> important?
> Why not also define the data corresponding to the respective URI as 
> graph merge then, like the default graph?
> 
> 
> 12.2
> 
> The two tables suggest there is a correlation between the patterns and
> modifiers appearing in the same line of the table, which is not the case.
> 
> Also, why in the first table are RDF Terms and triple patterns in one
> line and not separate?
> 
> Why do you write
>    FILTER(Expression)
> but not
>   ORDER BY (Expression)
> as the syntax suggests?
> 
> Moreover, the tables should be numbered.
> 
> You use the abbreviation BGP for basic graph pattern first in the second
> table, without having introduced it. Actually, it would be more intuitive
> if you used actual *symbols* for your algebra, e.g. the ones from
> traditional Relational Algebra, as was done in [Perez et al. 2006].
> 
> "The result of converting such an abstract syntax tree is a SPARQL query 
> that uses these symbols in the SPARQL algebra:"
> -->
> "The result of converting such an abstract syntax tree is a SPARQL query 
> that uses the following  symbols in the SPARQL algebra:"
> or maybe even better:
> "The result of converting such an abstract syntax tree is a SPARQL query 
> that uses the symbols introduced in Table 2 in the SPARQL algebra:"
> 
> What is "ToList"?
> 
> 12.2.1
> 
> The steps here  refer to the grammar?
> The steps obviously take the parse tree nodes of the grammar as the 
> basis...
> anyway this is neither explained nor entirely clear.
> 
> then connected with 'UNION'
> -->
> connected with 'UNION'
> 
> What do you mean by
> 
> "We introduce the following symbols:"
> 
> 1) what you define here is not 'symbols'
> 2) This doesn't seem to be a proper definition but just a bullet
>   list without further explanation.
> 
> as said before, the symbols, should indeed be symbols and be defined 
> properly in section 12.2 with the tables, in my opinion.
> 
> The algorithm for the transformation is a bit confusing, IMO. It seems 
> to be pseudo-code for a recursive algorithm, but it is not clear where 
> there are recursive calls.
> 
> Is the observation correct that in this algebra (following the algorithm)
> 
>     A OPTIONAL {B FILTER F}
> 
> would be the same as
> 
>    A  FILTER F OPTIONAL {B}
> 
> ???
> 
> ie, both result in:
> 
>  LeftJoin(A,B,F)
> 
> That is not necessarily intuitive in my opinion.
> Take the concrete example from above:
> 
> SELECT ?n ?m
> { ?x a foaf:Person .  ?x foaf:name ?n .
>   OPTIONAL { ?x foaf:mbox ?m FILTER (?n != "John Doe") }  }
> 
> As I said, in my understanding, this query could be used to suppress
> email addresses for a particular name, whereas the algorithm suggests
> that this is just the same as writing:
> 
> SELECT ?n ?m
> { ?x a foaf:Person .  ?x foaf:name ?n . FILTER (?n != "John Doe")
>   OPTIONAL { ?x foaf:mbox ?m  }  }
> 
> Is this intended? If yes, the last example of section 12.2.2 is wrong.
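A minimal sketch of the two queries above, written in Python over made-up data. It assumes that LeftJoin(Ω1, Ω2, expr) keeps a left-hand solution unextended whenever no compatible right-hand solution passes expr; that is one reading of Section 12, not a quotation from it. The helper names (`compatible`, `left_join`, `not_john`) and the two example persons are hypothetical, and duplicates (cardinalities) are ignored.

```python
def compatible(mu1, mu2):
    """Two solution mappings are compatible if they agree on shared variables."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())


def left_join(omega1, omega2, expr):
    """Assumed LeftJoin(omega1, omega2, expr) on lists of mappings."""
    result = []
    for mu1 in omega1:
        merged = [{**mu1, **mu2} for mu2 in omega2 if compatible(mu1, mu2)]
        kept = [mu for mu in merged if expr(mu)]
        # If no extension survives the filter, mu1 itself stays in the result.
        result.extend(kept if kept else [mu1])
    return result


def not_john(mu):
    """The filter condition ?n != "John Doe"."""
    return mu["n"] != "John Doe"


# Made-up solutions of the mandatory part (A) and of the optional part (B).
A = [{"x": "_:p1", "n": "John Doe"}, {"x": "_:p2", "n": "Jane Roe"}]
B = [{"x": "_:p1", "m": "mailto:john@example.org"},
     {"x": "_:p2", "m": "mailto:jane@example.org"}]

# FILTER inside the OPTIONAL: LeftJoin(A, B, F)
filter_inside = left_join(A, B, not_john)
# FILTER next to the mandatory part: filter A first, then LeftJoin(..., B, true)
filter_outside = left_join([mu for mu in A if not_john(mu)], B, lambda mu: True)

print(filter_inside)
print(filter_outside)
```

On this data the first query keeps John Doe with ?m unbound, while the second drops him entirely, so under this reading the two queries are not interchangeable.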
> 
> BTW: If so, it seems that the whole second part of the algorithm can be 
> simplified to:
> -- 
> If F is not empty:
>   If G = LeftJoin(A1, A2, true) then
>         G := LeftJoin(A1, A2, F)
>   Else
>         G := Filter(F, G)
> -- 
> where, as I said, the first branch puzzles me a bit... and actually,
> it seems to be contradicted by the last example in section 12.2.2!
> 
> 
> 12.2.3
> 
> Why do you need ToList?
> 
> 
> Projection: You only mention SELECT here... shouldn't you write here
> 
> "If the query is a SELECT query"
> ??
> 
> "length defaults to (size(M)-start)."
> 
> "size(M)" isn't defined anywhere.
> 
> It would probably be more elegant to interpret 0 as a parameter for LIMIT
> as ALL, since you can't know the size of the solution set upfront... As
> mentioned above, 'LIMIT 0' doesn't really make sense anyway.
> 
> In the definition of compatible mappings, you might want to change
> 
> "every variable v in dom(μ1) and in dom(μ2)"
> to
> "every variable v ∈ dom(μ1) ∩ dom(μ2)"
> 
> "Write merge(μ1, μ2) for μ1 set-union μ2"
> 
> Why not use the symbol ∪ here?
> 
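The two definitions discussed above can be sketched in Python, modelling a solution mapping as a dict from variable names to terms. The function names and the example mappings are illustrative assumptions, not taken from the draft.

```python
def compatible(mu1, mu2):
    """True if mu1 and mu2 agree on every variable in dom(mu1) ∩ dom(mu2)."""
    shared = mu1.keys() & mu2.keys()
    return all(mu1[v] == mu2[v] for v in shared)


def merge(mu1, mu2):
    """Set union of two compatible mappings, seen as sets of (var, term) pairs."""
    if not compatible(mu1, mu2):
        raise ValueError("mappings are not compatible")
    return {**mu1, **mu2}


# Made-up mappings.
mu1 = {"x": "ex:alice", "n": "Alice"}
mu2 = {"x": "ex:alice", "m": "mailto:alice@example.org"}
mu3 = {"x": "ex:bob"}

print(compatible(mu1, mu2))  # True: both map x to ex:alice
print(compatible(mu1, mu3))  # False: they disagree on x
print(merge(mu1, mu2))
```

Because compatibility only inspects the intersection of the two domains, two mappings with disjoint domains are always compatible.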
> 
> 12.3.1
> 
> "A Pattern Instance Mapping, P, is the combination of an RDF instance
> mapping and solution mapping. P(x) = μ(σ(x))"
> 
> Should this be:
> 
> "A Pattern Instance Mapping, P, is the combination of an RDF instance
> mapping μ and solution mapping σ: P(x) = μ(σ(x))"
> 
> What is x here? I assume you want P to be defined as a mapping from
> RDF-T ∪ V to RDF-T?
> σ (instance mappings) are defined for graphs, not for variables!
> Something seems strange to me here.
> 
> 
> 12.3.2
> 
> You use the terms answer and answer set several times in that section 
> which haven't been defined... You should either do so, or refer to 
> solution, solution set, as defined.
> 
> 12.4
> 
> Filter:
> "a effective boolean"
> ->
> "an effective boolean"
> 
> 
> 
> Move the explaining sentence:
> 
> "It is possible that a solution mapping μ in a Join can arise in
> different solution mappings, μ1 and μ2 in the multisets being joined.
> The cardinality of μ is the sum of the cardinalities from all
> possibilities."
> 
> before the definition of Join
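A small Python sketch of how one joined mapping can arise from several pairs. It models a multiset of mappings as a `collections.Counter` and assumes that each compatible pair contributes the product of its two cardinalities; the quoted sentence only says that the contributions are summed, so the product is an assumption here. All names and the example multisets are hypothetical.

```python
from collections import Counter


def freeze(mu):
    """Make a mapping hashable so it can be used as a Counter key."""
    return frozenset(mu.items())


def compatible(mu1, mu2):
    """True if the two mappings agree on all shared variables."""
    return all(mu1[v] == mu2[v] for v in mu1.keys() & mu2.keys())


def join(omega1, omega2):
    """Join two multisets of mappings, summing over all contributing pairs."""
    result = Counter()
    for key1, card1 in omega1.items():
        for key2, card2 in omega2.items():
            mu1, mu2 = dict(key1), dict(key2)
            if compatible(mu1, mu2):
                result[freeze({**mu1, **mu2})] += card1 * card2
    return result


# Made-up multisets: two different left-hand mappings merge into the same result.
omega1 = Counter({freeze({"x": 1}): 1, freeze({"x": 1, "y": 2}): 1})
omega2 = Counter({freeze({"y": 2}): 1})

joined = join(omega1, omega2)
print(joined[freeze({"x": 1, "y": 2})])  # 2
```

Here {x: 1, y: 2} is produced once from {x: 1} and once from {x: 1, y: 2}, which is the situation the explaining sentence describes.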
> 
> 
> Note: the semantics of OrderBy seems to suggest, that any 
> (non-deterministically chosen?) sequence which satisfies the order 
> condition, is valid... correct?
> 
> 
> Definition of Project:
> - What is i in [i]???
> - The use of V is ambiguous here, since in the initial defs this was the 
> set of all possible SPARQL query variables.
> - The use of P is ambiguous here ,since P was used before to define a 
> pattern instance mapping in Sec 12.3.1 ... BTW: it would help a lot if 
> Definitions were numbered!
> 
> 
> 
> "The order of Distinct(Ψ) must preserve any ordering given by OrderBy."
> 
> hmmm,
> you mean:  "The order of Distinct(Ψ) must preserve any ordering given 
> by any nested OrderBy."?
> That is a bit weird, since the order by's have been resolved previously, 
> right?
> 
> I think the problem is with this notation:
> 
> "Write [x | C] for a sequence of elements where C(x) is true."
> 
> because this imposes a condition on the element and not on the whole 
> sequence.
> 
> 
> 12.5
> 
> The operator List(P) is nowhere defined.
> It is still not entirely clear to me why you need to introduce the ToList
> operator.
> 
> 
> A general comment:
> 
> I miss a section defining the *Semantics of a query* and of different 
> result forms.
> The Evaluation semantics given here rather is a mix of functions having 
> partly multisets of solution mappings and sequences thereof as result, 
> but all are called "eval()".
>  E.g. eval for BGP returns a multiset, whereas eval returns a list for 
> ToList, etc.
> 
> The semantics of a *query* is not really clearly defined yet, it seems.
> This needs another revision, I guess.
> 
> 12.6
> 
> In this section again, the terms answer set and answers are used for 
> solutions.
> As mentioned above, I guess this needs to be introduced to be clear.
> 
> In the "Notes", item (d):
> 
> "the current state of the art in OWL-DL querying focusses on the case 
> where answer bindings to blank nodes are prohibited."
> 
> It would be helpful to give references here.
> 
> 
> "The same blank node label may not be used in two separate basic graph 
> patterns with a single query."
> 
> Isn't this too restrictive? I see no good motivation for this restriction,
> to be honest.
> Anyway, you could remark that variables should be used instead wherever
> such overlapping blank nodes seem necessary, right?
> 
> 
> 
> 


-- 
Dr. Axel Polleres
email: axel@polleres.net  url: http://www.polleres.net/
Received on Monday, 16 April 2007 19:24:26 UTC