Re: [OK?] Re: Last Call for comments on "SPARQL Query Language for RDF"

Where questions were asked back, my replies are inline.


Lee Feigenbaum wrote:
> Axel Polleres wrote on 04/17/2007 12:53:55 PM:
> 
>>Dear all,
>>
>>below my review on the current SPARQL draft from
>>
>>http://www.w3.org/TR/rdf-sparql-query/

>>
>>on behalf of W3C member organization DERI Galway.
>>
>>Generally, I think the formal definitions have improved a lot, but
>>still I am at the same time not 100% sure that all definitions are
>>formally water-proof. This affects mainly questions on Section 12 and
>>partly unclear definitions/pseudocode algorithms for query evaluation
>>therein.
>>
>>HTH,
>>best,
>>Axel
> 
> 
> Hi Axel,
> 
> Thanks very much for your comprehensive review. The Working Group has done 
> its best to address all of your comments inline below. Please let us know 
> if you are satisfied with the responses to your comments. If you are, you 
> can help our comment tracking by replying to this message and adding 
> [CLOSED] to the subject. In the interests of our schedule, we will also 
> consider the comment closed if we have not heard back from you within 10 
> days.
> 
> thanks,
> Lee
> 
> PS: Please note that the responses below are written by several 
> individuals, and so the pronoun "I" does not always have the same 
> antecedent. Our apologies for any confusion this causes.
> 
> 
>>Detailed comments:
>>
>>
>>Prefix notation is still not aligned with Turtle. Why?
>>Would it make sense to align with Turtle and use/allow '@prefix'
>>instead of/in addition to 'PREFIX'?
>>You also have two ways of writing variables... so, why not?
> 
> 
> The Working Group has tracked the punctuation within SPARQL as an issue 
> for a while: 
>   http://www.w3.org/2001/sw/DataAccess/issues#punctuationSyntax

> 
> The working group initially adopted RDQL as the model syntax for SPARQL: 
>   http://www.w3.org/2001/sw/DataAccess/ftf4.html#item18

>  
> This included PREFIX. The working group later adopted Turtle+variables as 
> the syntax for basic graph patterns in March, 2005: 
>   http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JanMar/0287.html

> 
> This particular aspect of the syntax has been stable for over two years, 
> and we have a good number of implementations and test cases for the 
> current syntax. At this point in the process, I don't see the new 
> information here that I'd need to ask the WG to reconsider this issue. 
> 
> The two syntaxes for variables arise from the history of ? variables in 
> RDF query languages and from the concerns raised here: 
>   http://lists.w3.org/Archives/Public/public-rdf-dawg/2004OctDec/0160 .
> 
> Note that there is an unresolved objection to the existence of both 
> variable syntaxes:
>  
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Mar/0006.html

> 
> 
> 
>>Section 4.1.1
>>
>>The single quote seems to be missing after the table in sec 4.1.1 in
>>"", or is this '"'?
> 
> 
> Corrected: single quote shouldn't be in the excluded characters as per 
> RFC3987. 
> 
> 
>>Section 4.1.4
>>
>>The form
>>
>>[ :p "v" ] .
>>
>>looks very awkward to me!
> 
> 
> Again, this syntax derives from the decision to adopt Turtle+variables for 
> the basic graph pattern syntax in SPARQL. 
> 
> 
>>I don't find the grammar snippet for ANON very helpful here, without
>>an explanation of what WS is...  shouldn't that be a PropertyListNotEmpty
>>instead? 
> 
> 
> As per convention, WS is whitespace (and is linked to its definition in 
> the grammar appendix); we've added the BlankNodePropertyList production 
> rule to the grammar summary here, as that rule is responsible for 
> constructs such as [ :p "v" ] . 
> 
> Note that [] is not just a property list with no properties because a 
> non-empty property list can appear where [] can't. Example:
> 
> Legal:      [ :p "v"] . 
> Illegal:    [] .
> 
> This is why ANON is picked out as an explicit token.
> 
> Changes made:
> 
> 1.83: added
>   [39] BlankNodePropertyList ::= '[' PropertyListNotEmpty ']'
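
(For readers following along: my understanding of the shorthand this rule
covers, as a minimal sketch where the ex: prefix and the data are made up:

   PREFIX ex: <http://example.org/>
   SELECT ?s
   WHERE { ?s ex:knows [ ex:name "Alice" ] }

i.e., the bracketed part stands for a fresh blank node, so the pattern
should be equivalent to writing

   ?s ex:knows _:b . _:b ex:name "Alice" .

with a blank node label that is used nowhere else in the query.)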
> 
> 
> 
>>Section 5
>>
>>Section 5 is called Graph patterns and has only subsections
>>5.1 and 5.2 for basic and group patterns, whereas the other types are
>>given separate top-level sections... this structuring seems a bit
>>illogical.
> 
> 
> In the interest of keeping the document numbering as is, we've decided to 
> keep this section as is. If you have a better suggestion for the name of 
> the section, we'd be glad to hear it. ("Basic Graph Patterns and Group 
> Graph Patterns" does not seem particularly helpful to a reader.)


I'd suggest having two separate top-level sections.
"In the interest of keeping the document numbering as is"
is not an argument that seems very logical to me, to be honest.

>>Why the restriction that a blank node label can only be used in a
>>single basic graph pattern? And if so, isn't the remark that the scope
>>is the enclosing basic graph pattern redundant?
> 
> The Working Group made this decision in our 30 Jan 2007 teleconference:
> http://www.w3.org/2007/01/30-dawg-minutes#item04

> 
> Some explanation for the design decision can be seen in these threads
>   from the WG list:
>  
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JanMar/0055.html

> http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JanMar/0061.html

> 
> The second sentence reinforces the point but also says the label can be 
> used across different queries.  Left as-is.
> 
> 
> 
>>Why have the section about "extending basic graph pattern matching" here,
>>when not even basic graph pattern matching has been properly
>>introduced yet? If you only want to introduce informally what kind of
>>matching you are talking about here, then I'd call section 5.1.2 simply
>>"Basic Graph Pattern Matching", but I think I'd rather suggest dropping
>>this section. 
> 
> 
> Basic Graph Pattern matching and Groups are both ways to combine patterns 
> conjunctively, but they are different when extending to more complex 
> entailment regimes.  Putting them in the same section brings the 
> discussion together.
> 
> Added to sec 5 intro:
> """
> In this section we describe the two forms that combine patterns by 
> conjunction: basic graph patterns, which combine triple patterns, and 
> group graph patterns, which combine all other graph patterns.
> """
> 
> 
> 
>>"with one solution requiring no bindings for variables" -->
>>rather:
>>"with one solution producing no bindings for variables"
>>or:
>>"with one solution that does not bind any variables"
> 
> 
> Done (via your second suggestion).
> 
> 
>>Section 5.2.3
>>
>>Why do you have a separate examples subsection here? It seems
>>superfluous/repetitive. Just put the last example, which seems to be
>>the only new one, inside Sec 5.2.1 where it seems to fit, and drop the
>>two redundant ones. For the first one, you could add "and that basic
>>pattern consists of two triple patterns" to the first example in sec
>>5.2; for the second one, add the remark that "the FILTER does not break
>>the basic graph pattern into two basic graph patterns" to the
>>respective example in section 5.2.2.
> 
> 
> When extending SPARQL to more complex entailment regimes, the difference 
> between groups and basic graph patterns becomes more significant.  These 
> examples stress the distinction and are based on other feedback the group 
> has received in the past.
> 
> 
>>Section 6:
>>
>>One overall question which I didn't sort out completely so far:
>>What if I mix OPTIONAL with FILTERs?
>>
>>ie.
>>
>>{A OPTIONAL B FILTER F OPTIONAL C}
>>
>>is that:
>>
>>{{A OPTIONAL B} FILTER F OPTIONAL C}
>>
>>or rather
>>
>>{{A OPTIONAL B FILTER F} OPTIONAL C}
>>
>>and: would it make a difference? I assume not; the filter is, in both
>>cases, at the level of A, but I am not 100% sure. Maybe such an example
>>would be nice to have...
> 
> 
> I'm not sure exactly what you're asking. Note that curly braces are
> required around the optional part of an OPTIONAL construct, so
> consider: 
> 
> { A OPTIONAL { B } FILTER F OPTIONAL {C} }

I meant C to be a group graph pattern, yes.

> (where A, B, and C are triple patterns and F is an expression). As per
> 5.2.2 http://www.w3.org/TR/rdf-sparql-query/#scopeFilters, filters
> constrain the solutions over the group (delimited by curly braces) in
> which they appear. In the algebra, the above example becomes: 
> 
> Filter(F,
>   LeftJoin(
>     LeftJoin(A, B, true),
>     C,
>     true
>   )
> )
 >
> This is similar to the examples in 12.2.2
> http://www.w3.org/TR/rdf-sparql-query/#sparqlAbsExamples . 
> 
>>Another one about FILTERs: What about this one, ie. a FILTER which
>>refers to the outside scope:
>>
>>?x p o OPTIONAL { FILTER (?x != s) }
>>
>>concrete example:
>>
>>SELECT ?n ?m
>>{ ?x a foaf:Person .  ?x foaf:name ?n .
>>   OPTIONAL { ?x foaf:mbox ?m FILTER (?n != "John Doe") }  }
>>
>>Suppresses the email address for John Doe in the output!
>>Note: This one is interesting, since the OPTIONAL part may NOT be
>>evaluated separately, but carries over a binding from the
>>super-pattern! 
> 
> 
> A filter in the optional part of an OPTIONAL construct applies to the
> solutions from the required part as (possibly) extended by the optional
> part. In the algebra, the example above becomes:
> 
> LeftJoin(
>   BGP(?x a foaf:Person .  ?x foaf:name ?n),
>   BGP(?x foaf:mbox ?m),
>   (?n != "John Doe")
> )

Hmmm, does this mean that the query would simply be the same as writing

SELECT ?n ?m
{ ?x a foaf:Person .  ?x foaf:name ?n . FILTER (?n != "John Doe")
   OPTIONAL { ?x foaf:mbox ?m }  }

in this case?

(How) would it be possible then to encode my intended meaning of the 
query, i.e., that I want to give all names, but suppress the email address 
of John Doe?
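
(To spell out what I am comparing, in the algebra notation of Section 12:
the original query, as you write above, becomes

   LeftJoin(
     BGP(?x a foaf:Person .  ?x foaf:name ?n),
     BGP(?x foaf:mbox ?m),
     (?n != "John Doe")
   )

whereas my reading of the rewritten query, with the FILTER at the group
level, would be

   Filter( (?n != "John Doe"),
     LeftJoin(
       BGP(?x a foaf:Person .  ?x foaf:name ?n),
       BGP(?x foaf:mbox ?m),
       true
     )
   )

so my question is whether these two have the same net effect here.)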


> You can also see the evaluation semantics at:
>   http://www.w3.org/TR/rdf-sparql-query/#defn_algLeftJoin

>  
> 
> 
>>Do you have such an example in the testsuite? It seems that the last
>>example in Section 12.2.2 goes in this direction, more on that later.
> 
> 
> I have made a note to add a test along these lines to the test suites. 
> Thanks for the suggestion.
> 
> 
>>Would it make sense to add some non-well-defined OPTIONAL patterns,
>>following [Perez et al. 2006] in the document? As mentioned before, I
>>didn't yet check section 12, maybe these corner case examples are
>>there.. 
> 
> The WG believes that the algebra in the current specification precisely 
> defines the answers of the OPTIONAL constructs referred to as "not 
> well-defined" in the Perez et al. paper. 

Well, you define the semantics, fine, but still it is interesting, since 
allowing or not allowing non-well-defined patterns makes a huge 
difference complexity-wise for implementations.

 > We're not motivated to add these
 > examples to the document.

Why? I would object here, but not being part of the WG, I have to leave 
this decision to you of course.

>>Section 7:
>>
>>Why "unlike an OPTIONAL pattern"? This is comparing apples with
>>pears... 
>>I don't see the motivation for this comparison, I would suggest to
>>delete the part "unlike an OPTIONAL pattern".
> 
> 
> Deleted.
> 
> 
>>as described in Querying the Dataset
>>-->
>>as described in Section 8.3 "Querying the Dataset"
> 
> 
> Done.
> 
> 
>>Section 8
>>
>>The example in section 8.2.3 uses GRAPH although GRAPH hasn't been
>>explained yet; either remove this section or start section 8.3 earlier. I
>>think GRAPH should be introduced before giving an example using it.
> 
> We have to introduce GRAPH and FROM/FROM NAMED in some order. The example 
> could be a partial example but I think it is clearer to give a complete 
> query.

see my comment below, I made a suggestion for a more logical reordering...

> We have added
> """
> The GRAPH keyword is described below.
> """
> to the text immediately after the example.
> 
> 
> 
>><you may ignore this comment>
>>BTW: Would be cool to have a feature creating a merge from named
>>graphs 
>>as well...
>>
>>ie. I can't have something like
>>GRAPH g1
>>GRAPH g2 { P }
>>
>>where the merge of g1 and g2 is taken for evaluating P,
>>whereas I can do this at the top level by several FROM clauses.
>>(Note this is rather a wish-list comment than a problem with the
>>current 
> 
> 
>>spec, probably, might be difficult to define in combination with
>>variables...) </you may ignore this comment>
> 
> 
> Duly noted.

:-)

>>Section 8.2.3 makes more sense after the 8.3 examples, and 8.3.2 is
>>simpler than 8.3.1, so, I'd suggest the order of subsections in 8.3
>>
>>8.3.2
>>
>>8.3.1
>>
>>8.3.3
>>
>>8.2.3
>>
>>8.3.4 (note that this example somewhat overlaps with what is shown in
>>8.2.3 already, but fine to have both, I guess.)
> 
> 
> 
> As noted above, the concepts of GRAPH and FROM/FROM NAMED interact.  The 
> current organisation separates the two out and describes the dataset 
> first.  Other feedback on older versions of the document indicates this 
> works as well as other organisations, in the main; there is a noticeable 
> amount of individual taste.

true, matter of taste maybe, I think I can live with the comment inserted.


>>Section 9:
>>
>>What is "reduced" good for? I personally would tend to make reduced
>>the default, and instead put a modifier "STRICT" or "WITHDUPLICATES"
>>which enforces that ALL non-unique solutions are displayed.
> 
> REDUCED can be used to permit certain optimizations by the SPARQL query 
> engine. The WG discussed various design options in this space including 
> the design you are suggesting, and decided to add the REDUCED keyword and 
> mark the feature at-risk. More information:
> 
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JanMar/att-0194/20-dawg-minutes.html#item02

> http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JanMar/0162.html

> http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JanMar/0128.html

> 
> ...and surrounding.

By making REDUCED the exception and all tuples with duplicates the 
default, you somewhat implicitly single out implementations which use a 
per-set rather than a per-tuple strategy, in my opinion. I find 
this limiting.
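
To make the distinction concrete, a minimal sketch (assuming hypothetical
data in which two distinct persons both have the foaf:name "Alice"):

   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT REDUCED ?name
   WHERE { ?x foaf:name ?name }
   # With no modifier, "Alice" must be returned twice; with DISTINCT it
   # must be returned exactly once; with REDUCED an implementation is
   # free to return it once or twice.

A per-set implementation gets the DISTINCT/REDUCED behaviour for free
here, but needs extra bookkeeping to reproduce the duplicates that the
default demands, which is (I think) what makes the current default feel
limiting to me.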

>>"Offset: control where the solutions start from in the overall
>>solution sequence." 
>>
>>maybe it would be nice to add: "[...] in the overall solution
>>sequence, i.e., offset takes precedence over DISTINCT and REDUCED"
> 
> The current text says: (sec 9):
> 
> """
> Modifiers are applied in the order given by the list above.
> """
> 
> which gives this relationship.

see next comment...

>>at least, the formulation  "in the overall solution sequence" would
>>suggest this... however, right afterwards you say:
>>"modifiers are applied in the order given by the list above"... this
>>seems to somehow contradict the "in the overall solution sequence", so
>>then you should modify this to:
>>"in the overall solution sequence, after application of solution
>>modifiers with higher precedence" and give an explicit precedence to
>>each solution modifier....
> 
> 
> Deleted "in the overall sequence of solutions".

makes sense.

> 
>><you may ignore this comment>
>>BTW: Why is precedence of solution modifiers not simply the order in
>>which they are given in a query? Wouldn't that be the simplest thing
>>to do? 
>>
>>ie.
>>
>>OFFSET 3
>>DISTINCT
>>
>>would be different than
>>
>>DISTINCT
>>OFFSET 3
>>
>>depending on the order.
>>Anyway, if you want to (which you probably do) stick with what you
>>have now, it would at least be easier to read if you'd take the
>>suggestion 
>>with explicit precedence levels for each modifier.
>></you may ignore this comment>
> 
> 
> The grammar currently places DISTINCT and OFFSET/LIMIT/ORDER BY in 
> positions similar to where they appear in SQL queries. I don't see any of 
> our use cases or requirements at this point that would motivate the need 
> for this sort of customizable precedence.

point taken, let me know in case SPARQL opens a next round where such a 
use case may sneak in.

>>Section 9.1
>>
>>The ORDER BY construct allows arbitrary constraints/expressions as
>>parameter...ie. you could give an arbitrary constraint condition here,
>>right? What is the order of that? TRUE > FALSE? Would be good to add
>>a remark on that. 
> 
> 
> This is for generality because semantic web data is not as structured (and 
> typed) as a database.  It allows the query to proceed without an error 
> condition so it generates some defined outcome.
>
> SPARQL doesn't provide total ordering, but the example you asked about is 
> specified.
> 
> [[
> The "<" operator (see the Operator Mapping) defines the relative order of 
> pairs of numerics, simple literals, xsd:strings, xsd:booleans and 
> xsd:dateTimes.
> ]]
> 
> "<" operator in the Operator Mapping table has an entry for
>   A < B  xsd:boolean  xsd:boolean  op:boolean-less-than(A, B)
> op:boolean-less-than is defined in XPath Functions and Operators
>   http://www.w3.org/TR/xpath-functions/#func-boolean-less-than

> 
> [[
> Summary: Returns true if $arg1 is false and $arg2 is true. Otherwise, 
> returns false.
> ]]
> 
> I think the LC document specifies all the orderings intended by the DAWG, 
> but am certainly open to counter-example.

What I meant to say was that a short clarifying remark would keep 
readers from having to look through the separate spec.
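
For instance, something along these lines might already do (a sketch; the
ex: prefix and the property are made up):

   PREFIX ex: <http://example.org/>
   SELECT ?item ?price
   WHERE { ?item ex:price ?price }
   ORDER BY (?price > 10)
   # The order condition evaluates to an xsd:boolean; since "<" on
   # booleans maps to op:boolean-less-than, solutions where the
   # expression is false sort before those where it is true.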

> We have changed the text between the example and the paragraph starting 
> with "The ascending order of two solutions". See the editor's draft:
>   http://www.w3.org/2001/sw/DataAccess/rq23/rq25#modOrderBy

> or pick your way through the text below:
> [[
>    The "<" operator (see the Operator Mapping) defines the relative order 
> of pairs of numerics, simple literals, xsd:strings, xsd:booleans and 
> xsd:dateTimes. Pairs of IRIs are ordered by comparing them as simple 
> literals.
> 
>    SPARQL also fixes an order between some kinds of RDF terms that would 
> not otherwise be ordered:
> 
>     1. (Lowest) no value assigned to the variable or expression in this 
> solution.
>     2. Blank nodes
>     3. IRIs
>     4. RDF literals
>     5. A plain literal is lower than an RDF literal with type xsd:string 
> of the same lexical form.
> 
>    The relative order of literals with language tags or typed literals 
> with different types is undefined.
> 
>    This list of variable bindings is in ascending order:
> 
>    RDF Term                                    Reason
>                                                Unbound results sort earliest.
>    _:z                                         Blank nodes follow unbound.
>    _:a                                         There is no relative ordering
>                                                of blank nodes.
>    <http://script.example/Latin>               IRIs follow blank nodes.
>    <http://script.example/Кириллица>           The character in the 23rd
>                                                position, "К", has a unicode
>                                                codepoint 0x41A, which is
>                                                higher than 0x4C ("L").
>    <http://script.example/日本語>               The character in the 23rd
>                                                position, "日", has a unicode
>                                                codepoint 0x65E5, which is
>                                                higher than 0x41A ("К").
>    "http://script.example/Latin"               Simple literals follow IRIs.
>    "http://script.example/Latin"^^xsd:string   xsd:strings follow simple
>                                                literals.
> ]]
> 
> 
> 
>>  I would put 'ASCENDING' and 'DESCENDING' in normal font, since they
>>look like keywords here, but the respective keywords are ASC
>>and DESC.
> 
> 
> Good point. We also lower-cased them.
> 1.85: done
> 
> 
>>Stupid Question: What is the "codepoint representation"? ... Since
>>more people might be stupid, maybe a reference is in order.
> 
> 
> Amusingly, Google's "lucky" results for "codepoint representation":
>  
> http://www.google.com/search?hl=en&q=codepoint+representation&btnI=Google+Search

> currently point to AndyS answering the same question:
>  http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2006Jul/0007

> which backs up your point.
> 
> 1.85 has:
> 
> [[
> Pairs of IRIs are ordered by comparing them as simple literals.
> ]]
> 
> I think the conversion from <http://langtest.example/???> to 
> "http://langtest.example/???" is self-evident, but feel free to
> propose wording.
> 
> 
>>
>>What is a "fixed, arbitrary order"??? Why not simply change
>>
>>"SPARQL provides a fixed, arbitrary order"
>>-->
>>"SPARQL fixes an order"
>>
>>and
>>
>>"This arbitrary order"
>>-->
>>"This order"
>>
>>I'd also move the sentence starting with "This order" after the
>>enumeration. 
>>
> 
> 
> 1.85: got rid of it entirely
> 
> 
>>Note that, in the grammar for OrderCondition I think you could write
>>it maybe shorter: 
>>
>>Wouldn't simply
>>  orderCondition ::= ( 'ASC' | 'DESC' )? (Constraint | Var)
>>do?
>>
>>  In the paragraph above the grammar snippet, you forgot the ASK
>>result form where ORDER BY  also doesn't play a role, correct?
> 
> 
> ASK doesn't permit a SolutionModifier. Adding ASK there could imply that 
> it was allowed and even had some effect (other than syntax error).

wouldn't that be worth a footnote then, maybe?

>>Sec 9.2:
>>
>>Add somewhere in the prose: "using the SELECT result form"...
>>
>>It is actually a bit weird that you mix SELECT into the solution
>>modifiers; IMO, it would be better to mention SELECT first in section
>>9 
>>and then introduce the solution modifiers.
> 
> 
> 
> SELECT is both an indicator of the query result form and also contains the 
> projection.

yes, that's my point.


>>Sec 9.3:
>>
>>REDUCED also allows duplicates, or no? You mention before that REDUCED
>>only *permits* elimination of *some* duplicates... so, delete the "or
>>REDUCED" in the first sentence.
> 
> 
> Changed to:
> 
> """
> A solution sequence with no DISTINCT or REDUCED modifier will preserve 
> duplicate solutions.
> """
> 
> This has also been restructured into 9.3/9.3.1 DISTINCT/9.3.2 REDUCED .
> 
> 
> 
>>Sec 9.4:
>>As for reduced as mentioned earlier, my personal feeling is that
>>REDUCED, or even DISTINCT should be the default, since it is less
>>committing, and I'd on the contrary put an alternative keyword
>>"STRICT" 
>>or "WITHDUPLICATES" which has the semantics that really ALL solutions
>>with ALL duplicates are given. My personal feeling is that
>>aggregates, which you mention in the "Warning" box, anyway only make
>>sense in connection with DISTINCT. Or you should include a good
>>example where not...
> 
> 
> Please see above for a response about REDUCED. The WG has postponed the 
> issue of aggregate functions:
> 
>   http://www.w3.org/2001/sw/DataAccess/issues#countAggregate .
> 
> 
>>Sec 9.5/9.6:
>>
>>OFFSET 0 has no effect, LIMIT 0 obviously makes no sense since the
>>answer is always the empty solution set... So why for both not simply
>>only allowing positive integers? I see no benefit in allowing 0 at
>>all. 
> 
> 
> The WG believes that allowing 0 eases the burden on programmatically 
> generated queries.

What is the justification for this belief if I may ask?


>>Section 10:
>>
>>"query form" or "result form"? I'd suggest to use one of both
>>consistently and not switch.  Personally, I'd prefer "result form"...
> 
> 
> Consistently used "query form" (it was an earlier working group discussion 
> to use "query form" but we didn't roll that out properly).
> 
> 
> 
>>Section 10.1
>>
>>As for the overall structure, it might make sense to have the whole
>>section 10 before 9, since modifiers are anyway only important for
>>SELECT, and then you could skip the part on projection in section 9,
>>as SELECT is anyway not a solution modifier but a result form...
>>You should call it also "projection" in section 10.1, ie. what I
>>suggest is basically merging section 10.1 and 9.2.
> 
> 
> The current structure builds up from pattern matching through sequence 
> modification to the query forms.  This minimises the forward references.
> 
> 
> 
>>Section 10.2
>>
>>CONSTRUCT combines triples "by set union"?
>>So, I need to eliminate duplicate triples if I want to implement
>>CONSTRUCT in my SPARQL engine?
>>Is this really what you wanted? In case of doubt, I'd suggest to
>>remove "by set union", or respectively, analogously to SELECT,
>>introduce a DISTINCT (or alternatively a WITHDUPLICATES)
>>modifier for CONSTRUCT...
> 
> 
> A set represented with duplicate triples is identical to a representation 
> without any duplicates, 

no, it is not identical if viewed as a dataset for another query: if I 
apply another (SELECT) query on the output of the CONSTRUCT - which 
again is RDF, so why not? - then there is potentially a difference (see 
the distinct/reduced issue).

> so I believe the text is correct as written.  That 
> is, the following are representations of the same graph:
> 
> <x> <y> <z> .
> 
> and
> 
> <x> <y> <z> .
> <x> <y> <z> .

If I ask a query with solution modifiers on these two graphs, then it is 
not the same! Attention!


>>BTW, I miss the semantics for CONSTRUCT given formally in Section 12.
> 
> 
> We do not right now intend to include CONSTRUCT in Section 12. CONSTRUCT 
> is defined normatively in section 10.2. ( 
> http://www.w3.org/TR/rdf-sparql-query/#construct ).

I fail to find a definition of the formal semantics of CONSTRUCT there.

CONSTRUCT is likely one of the things which people will pick up very 
fast... so it would be good to have this defined more formally, I think.

>>Section 10.2.1
>>
>><you may ignore this comment>
>>What if I want a single blank node connecting all solutions? That
>>would 
>>be possible, if I could nest constructs in the FROM part...
>></you may ignore this comment>
> 
> 
> The working group has not identified a use case or requirement for this, 
> and it is not supported by the current SPARQL draft. 

yes, that's why I ask :-)

> (As an aside, you 
> can, to some extent and for implementations that resolve graph URIs, nest 
> queries in the way you mention by supplying to FROM the HTTP GET URL for a 
> nested query as per the SPARQL protocol.)

Interesting aspect! Thanks for pointing that one out!
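
(If I understand the aside correctly, that would look roughly like the
following sketch, where the endpoint URL is hypothetical and the nested
query - here just a trivial CONSTRUCT copying the graph - is URL-encoded
into the query parameter of the protocol's GET form:

   SELECT ?s ?o
   FROM <http://example.org/sparql?query=CONSTRUCT%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D>
   WHERE { ?s ?p ?o }

An implementation that resolves graph IRIs would then use the RDF graph
returned by the nested query as the default graph.)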

>>Section 10.2.3
>>
>>  Hmm, now you use order by, whereas you state before in Section 9.1
>>that ORDER BY has no effect on CONSTRUCT... ah, I see, in combination
>>with LIMIT!
>>  So, would it make sense in order to emphasize what you mean,  to
>>change in section
>>9.1
>>
>>"Used in combination"
>>-->
>>"However, note that used in combination"
> 
> 
> Done.
> 
> 
>>10.3/10.4
>>
>>I think that ASK should be mentioned before the informative DESCRIBE,
>>thus I suggest to swap these two sections.
> 
> 
> Sections 10.3/10.4 swapped.
> 
> 
> 
>>Section 11
>>
>>- Any changes in the FILTER handling from the last version? Is there
>>a changelog? 
> 
> 
> Comparing
> http://www.w3.org/TR/2006/CR-rdf-sparql-query-20060406/#OperatorMapping

> http://www.w3.org/TR/2007/WD-rdf-sparql-query-20070326/#OperatorMapping

> I see:
> 
> The definition for
>   DATATYPE(A)            RDF term                datatype(A) IRI
> has been split into
>   DATATYPE(A)            typed literal           datatype(A) IRI
>   DATATYPE(A)            simple literal          datatype(A) IRI
> to move the type checking into the table.
> 
> Support for xsd:booleans was added.
> 
> The functionality of
> 
>   A = B  RDF term  RDF term  RDFterm-equal(A, B)  xsd:boolean
>  
> http://www.w3.org/TR/2006/CR-rdf-sparql-query-20060406/#func-RDFterm-equal

> 
> was moved to a set of functions:
> 
>   A = B           simple literal  simple literal
>                   op:numeric-equal(fn:compare(A, B), 0)           xsd:boolean
>   A = B           xsd:string      xsd:string
>                   op:numeric-equal(fn:compare(STR(A), STR(B)), 0) xsd:boolean
>   A = B           xsd:boolean     xsd:boolean
>                   op:boolean-equal(A, B)                          xsd:boolean
>   sameTERM(A, B)  RDF term        RDF term
>                   sameTerm(A, B)                                  xsd:boolean
> 
> 
>>- As mentioned earlier, I am a bit puzzled about the "evaluation" of
>>Constraints given as an argument to ORDER BY especially since there
>>you don't want to take the EBV but the actual value to order the
>>solutions. 
> 
> 
> The text states that one evaluates the expressions, but not as FILTERs. I 
> don't think it implies that one should take the EBV of the expression.

arguable...

>>(Note that what it means that a solution sequence
>>"satisfies an order condition" is also not really formally defined in
>>Section 12!) 
>>
>>Apart from that, did not check the section in all detail again since
>>it seems to be similar to the prev. version , but some comments still:
>>
>>"equivilence"?
>>Do you mean equivalence? My dictionary doesn't know that word.
> 
> 
> We've fixed this.
> 
> 
>>The codepoint reference should already be given earlier, as mentioned
>>above. 
> 
> 
> The earlier use of codepoint is gone. There is a reference to a specific 
> operator (simple literal < simple literal) and examples in its place.
> 
> 
>>Section 11.3.1
>>
>>The operator extensibility makes me a bit worried regarding the
>>nonmonotonic behavior of '! bound':
>>  In combination with '! bound', does it still hold that
>>"SPARQL extensions" will produce at least the same solutions as an
>>unextended implementation and may, for some queries, produce more
>>solutions... I have an uneasy feeling here, though not substantiated
>>by proof/counterexample. 
>>
> 
> 
> true. !bound is essentially negation as failure, and therefore makes closed 
> world assumptions. Are you suggesting some advisory text in the spec?

would need to think about this.
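
The kind of thing I have in mind is the usual negation-as-failure pattern
(a sketch):

   PREFIX foaf: <http://xmlns.com/foaf/0.1/>
   SELECT ?x
   WHERE { ?x a foaf:Person .
           OPTIONAL { ?x foaf:mbox ?m }
           FILTER ( !bound(?m) ) }
   # Selects persons for which no foaf:mbox is known.

Under an extended entailment regime that derives additional foaf:mbox
triples, solutions of this query can disappear rather than be preserved,
which is exactly the non-monotonic behaviour I was worried about.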

>>Section 12 :
>>
>>12.1.1
>>
>>
>>Is the necessity that the u_i's are distinct in the dataset really
>>important? Why not also define the data corresponding to the
>>respective URI as a 
>>graph merge then, like the default graph?
> 
> 
> Yes - it's important.  Only FROM allows merging.  We would have to expand 
> and change FROM NAMED to allow merging - currently FROM NAMED declares a 
> URI for a graph in the dataset.  It does not provide a means to create 
> that graph.

One could use the nesting of subqueries over a GET URI as you suggested 
above... as noted above, I find this an interesting aspect indeed.

>>12.2
>>
>>The two tables suggest there is a correlation between the patterns
>>and modifiers appearing in the same line of the table, which is not
>>the case. 
> 
> 
> Thanks for pointing this out. We intend to fix this, though haven't yet.
> 
> 
>>Also, why in the first table is RDF Terms and triple patterns in one
>>line and not separate?
>>
>>Why do you write
>>    FILTER(Expression)
>>but not
>>   ORDER BY (Expression)
>>as the syntax suggests?
> 
> 
> Removed "(expression)" in FILTER.
> 
> Similarly, in the following table, just name the solution modifier terms.
> 
> 
>>Moreover, the tables should be numbered.
>>
>>You use the abbreviation BGP for Basic graph pattern first in the
>>second 
>>table, which wasn't introduced. Actually, it would be more intuitive
>>if you'd actually use *symbols* for your algebra, like e.g. the ones
>>from traditional Relational Algebra, as was done in [Perez et al.
>>2006]. 
> 
> 
> Yes - BGP is used in  step 1 and is a symbol introduced into a SPARQL 
> query algebra expression.  All the symbols are character strings.  Using 
> characters (or bitmap equivalents) in HTML is problematic because it 
> would rely on the font setup in the browsers.  Few font families have the 
> glyphs for these symbols.  I brought this to the WG's attention when 
> drafting the section and they were aware of the choice of text over 
> unusual font characters. 
>  
> 
>>"The result of converting such an abstract syntax tree is a SPARQL
>>query that uses these symbols in the SPARQL algebra:"
>>-->
>>"The result of converting such an abstract syntax tree is a SPARQL
>>query 
> 
> 
>>that uses the following  symbols in the SPARQL algebra:"
>>or maybe even better:
>>"The result of converting such an abstract syntax tree is a SPARQL
>>query 
> 
> 
> Changed to the above suggestion.
> 
> 
> 
>>that uses the symbols introduced in Table 2 in the SPARQL algebra:"
>>
>>What is "ToList"?
> 
> 
> It's just a symbol and its evaluation semantics are defined in the 
> evaluation section.
> 
> It's necessary because a multiset is not a sequence so we need to convert 
> from multisets to sequences at the start of sequence modifiers.  To make 
> this clearer, I've added 
> 
> "ToList is used where conversion from the results of graph pattern 
> matching to sequences occurs."
> 
> 
>>12.2.1
>>
>>The steps here  refer to the grammar?
>>The steps obviously take the parse tree nodes of the grammar as the
>>basis... anyway this is neither explained nor entirely clear.
>>
>>then connected with 'UNION'
>>-->
>>connected with 'UNION'
>>
> 
> 
> Fixed.
> 
> 
>>What do you mean by
>>
>>"We introduce the following symbols:"
>>
>>1) what you define here is not 'symbols'
>>2) This doesn't seem to be a proper definition but just a bullet
>>   list without further explanation.
>>
>>as said before, the symbols, should indeed be symbols and be defined
>>properly in section 12.2 with the tables, in my opinion.
> 
> 
> See above.  We chose to use multicharacter symbols for readability in 
> the widest possible circumstances.

slight tendency to object, this depends on how you define "readability".
In my definition of this word, readability would be improved by using 
symbols. But OK.

> An algebra expression is just a data structure of symbols, much like a 
> query string is just a sequence of characters.  The semantics comes from 
> evaluation.
> 
> 
>>The algorithm for the transformation is a bit confusing, IMO. It seems
>>to be pseudo-code for a recursive algorithm, but it is not clear
>>where there are recursive calls. 
> 
> 
> In step 4, it says:
> 
> """
> A group pattern is mapped into the SPARQL algebra as follows: first, 
> convert all elements making up the group into algebra expressions, then 
> apply the following transformation:
> """
> 
> I have changed this to:
> 
> """
> A group pattern is mapped into the SPARQL algebra as follows: first, 
> convert all elements making up the group into algebra expressions using 
> this transformation process recursively, then apply the following 
> transformation:
> """
> 
>  
> 
>>Is the observation correct that in this algebra (following the
>>algorithm) 
>>
>>     A OPTIONAL {B FILTER F}
>>
>>would be the same as
>>
>>    A  FILTER F OPTIONAL {B}
>>
>>???
>>
>>ie, both result in:
>>
>>  LeftJoin(A,B,F)
> 
> 
> The first is:
> 
>   LeftJoin(A,B,F)
> 
> and the second is
> 
> Filter(F, LeftJoin(A, B, true) )
> 
> which are not equivalent.
 >
>>That is not necessarily intuitive in my opinion.
>>Take the concrete example from above:
>>
>>SELECT ?n ?m
>>{ ?x a foaf:Person .  ?x foaf:name ?n .
>>   OPTIONAL { ?x foaf:mbox ?m FILTER (?n != "John Doe") }  }
>>
>>As I said, in my understanding, this query could be used to suppress
>>email addresses for a particular name, whereas the algorithm suggests
>>that this is just the same as writing:
>>
>>SELECT ?n ?m
>>{ ?x a foaf:Person .  ?x foaf:name ?n . FILTER (?n != "John Doe")
>>   OPTIONAL { ?x foaf:mbox ?m  }  }
>>
>>Is this intended? If yes, the last example of section 12.2.2 is wrong.
> 
> 
>>BTW: If so, it seems that the whole second part of the algorithm can
>>be simplified to: --
>>If F is not empty:
>>   If G = LeftJoin(A1, A2, true) then
>>         G := LeftJoin(A1, A2, F)
>>   Else
>>         G := Filter(F, G)
>>--
>>where, as I said, the first branch puzzles me a bit... and actually,
>>it seems to be contradicted by the last example in section 12.2.2!
>>
> 
> 
> The example is right.
> 
> The example was added to highlight the fact they are different.
> 
> 
>>12.2.3
>>
>>Why do you need ToList?
> 
> 
> See above: it converts from multisets to sequences.
> 
> 
>>Projection: You only mention SELECT here.. shouldn't you write here
>>
>>"If the query is a SELCECT query"
>>??
>>
>>"length defaults to (size(M)-start)."
>>
>>"size(M)" isn't defined anywhere.
>>
>>It would be probably more elegant to interpret 0 as parameter for
>>LIMIT 
>>as ALL, since you can't know the size of the solution set upfront ...
>>As 
> 
> 
>>you mention above 'LIMIT 0' doesn't really make sense anyway.
>>
>>In the definition of compatible mappings, you might want to change
>>
>>"every variable v in dom(&mu;1) and in dom(&mu;2)"
>>to
>>"every variable v &isin;  dom(&mu;1) &cap; dom(&mu;2)"
>>
>>"Write merge(&mu;1, &mu;2) for &mu;1 set-union &mu;2"
>>
>>Why not use the symbol &cup; here?
> 
> 
> As noted above the reliance on some symbols being available is not safe 
> across enough browser and locale setups.  We are striking a balance here.

and &mu; is safe?

> 
>>12.3.1
>>
>>"A Pattern Instance Mapping, P, is the combination of an RDF instance
>>mapping and solution mapping. P(x) = &mu;(&sigma;(x))"
>>
>>Should this be:
>>
>>"A Pattern Instance Mapping, P, is the combination of an RDF instance
>>mapping &mu; and solution mapping &sigma;: P(x) = &mu;(&sigma;(x))"
> 
> 
> Done.
> 
> 
>>What is x here? I assume you want P to be defined as a mapping from
>>RDF-T cup V to  RDF-T?
>>&sigma; (instance mappings) are defined for graphs, not for variables!
>>Something seems strange to me here.
> 
> 
> P is a substitution function and it maps BGPs to BGPs, which if all the 
> variables are now bound, is tested as a subgraph in the next step. "x" is 
> a BGP - the function argument.  P is not fixed over all queries - it is a 
> per-solution substitution function.
> 
> We could write P = mu(sigma) (or other function composition approach) but 
> I feel that explicitly highlighting that P is a function taking an argument 
> is beneficial.
> 
> 
>>12.3.2
>>
>>You use the terms answer and answer set several times in that section
>>which haven't been defined... You should either do so, or refer to
>>solution, solution set, as defined.
> 
> 
> Usage made consistent through out sec 12. We intend to make this editorial 
> fix throughout the document, though haven't yet.
> 
> 
> 
>>12.4
>>
>>Filter:
>>"a effective boolean"
>>->
>>"an effective boolean"
> 
> 
> Done.
> 
> 
>>
>>
>>Move the explaining sentence:
>>
>>"It is possible that a solution mapping µ in a Join can arise in
>>different solution mappings, µ1 and µ2, in the multisets being joined.
>>The cardinality of  µ is the sum of the cardinalities from all
>>possibilities." 
>>
>>before the definition of Join
> 
> 
> It seems as valid after as before - discussing the detail before saying 
> what "it" is can be confusing.

matter of taste, probably... ;-)


>>Note: the semantics of OrderBy seems to suggest, that any
>>(non-deterministically chosen?) sequence which satisfies the order
>>condition, is valid... correct?
> 
> 
> Correct.
> 
> SELECT ?x ?y ... ORDER BY ?x is only partially determined.
> 
> 
>>Definition of Project:
>>- What is i in [i]???
> 
> 
> Removed
> 
> 
>>- The use of V is ambiguous here, since in the initial defs this was
>>the 
> 
> 
> Changed to "PV"
> 
> 
> 
>>set of all possible SPARQL query variables.
>>- The use of P is ambiguous here ,since P was used before to define a
>>pattern instance mapping in Sec 12.3.1 ... BTW: it would help a lot
>>if Definitions were numbered! 
>>
> 
> 
> Changed "Proj" for "P"
> 
>  
> 
>>"The order of Distinct(?) must preserve any ordering given by
>>OrderBy." 
>>
>>hmmm,
>>you mean:  "The order of Distinct(?) must preserve any ordering given
>>by 
> 
> 
>>any nested OrderBy."?
>>That is a bit weird, since the order by's have been resolved
>>previously, 
> 
> 
> The point is that Distinct can't just randomly shuffle the sequence.
> 
> Same is true for Project, which I have added.
> 
> 
> 
>>right?
>>
>>I think the problem is with this notation:
>>
>>"Write [x | C] for a sequence of elements where C(x) is true."
>>
>>because this imposes a condition on the element and not on the whole
>>sequence. 
> 
> 
> Suggestions for a better notation are welcome. At several points, we need 
> to talk about the elements of the sequence, which this notation enables.
> 
> 
>>
>>12.5
>>
>>The operator List(P) is nowhere defined.
>>I still don't have totally clear why you need to introduce the ToList
>>operator. 
> 
> 
> Already discussed.

Also that "List(P)" is not defined?

> 
> 
> 
>>A general comment:
>>
>>I miss a section defining the *Semantics of a query* and of different
>>result forms. The Evaluation semantics given here rather is a mix of
>>functions having partly multisets of solution mappings and sequences
>>thereof as result, 
>>but all are called "eval()".
>>  E.g. eval for BGP returns a multiset, whereas eval returns a list
>>for ToList, etc. 
>>
>>The semantics of a *query* is not really clearly defined yet, it
>>seems. This needs another revision, I guess.

no response here?

>>12.6
>>
>>In this section again, the terms answer set and answers are used for
>>solutions. As mentioned above, I guess this needs to be introduced to
>>be clear. 
>>
>>In the "Notes", item (d):
>>
>>"the current state of the art in OWL-DL querying focusses on the case
>>where answer bindings to blank nodes are prohibited."
>>
>>It would be helpful to give references here.
> 
> 
> The notes highlight the working assumptions.  I don't think references 
> would change that.  This is a difference between an academic paper and a 
> specification.

You mean that a specification shouldn't follow general rules of style 
which make the reader more comfortable (such as, for instance, 
references)? I disagree, to be honest.

>>"The same blank node label may not be used in two separate basic graph
>>patterns within a single query."
>>
>>Isn't this restricting? I see no good motivation for this restriction,
>>if the group patterns refer to the same graph, ie are no graph
>>patterns, 
> 
> 
>>to be honest. Anyway, you can remark that variables shall be used
>>instead, where one would feel that
>>such overlapping blank nodes would be necessary, right?
>>
> 
> 
> Blank nodes and variables do not behave the same for higher levels of 
> entailment (e.g. OWL-DL).  A blank node is an existential variable and a 
> query using bnodes can generate more results than one that replaces them 
> with named variables where a binding is required.
> 
> Example:
>   http://lists.w3.org/Archives/Public/public-rdf-dawg/2004JulSep/0069.html

> 
> The restriction on blank node labels means that solving BGPs with OWL-DL 
> entailment does not require state to be carried over from one BGP to 
> another, except via named variables which already have bound values.
> 
> 
>>Note that, in the context of bnodes however, I have a problem with
>>this one: 
>>
>>"(c) These conditions do not impose the SPARQL requirement that SG
>>share
>>no blank nodes with AG or BGP. In particular, it allows SG to actually
>>be AG. This allows query protocols in which blank node identifiers
>>retain their meaning between the query and the source document, or
>>across multiple queries. Such protocols are not supported by the
>>current 
> 
> 
>>SPARQL protocol specification, however."
>>
>>Note that this seems to be a bit worrying to me. It seems to suggest
>>that extensions of SPARQL allow treating BNodes differently from
>>existential variables, which is what would become possible if you allow
>>them to retain their meaning over stacked queries. I am a bit worried
>>whether this "backdoor" is really compatible with the intention of
>>bnodes in RDF.
>>
> 
> This comment refers to the situation where an implementation wishes to 
> give greater control to some application (e.g. the data management 
> application - one that is close and associated with the deeper details of 
> the implementation) in the case where the SPARQL protocol is NOT being 
> used (e.g. a programming API).
> 
> Again, thank you very much for your detailed comments.



-- 
Dr. Axel Polleres
email: axel@polleres.net  url: http://www.polleres.net/





Received on Thursday, 3 May 2007 23:55:56 UTC