Re: Review of Query document (Property paths, negation, select expression, testing values) from Andy Seaborne on 2010-09-24 (public-rdf-dawg@w3.org from July to September 2010)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 24 Sep 2010 17:33:21 +0100
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4C9CD2D1.2070008@epimorphics.com>
On 22/09/10 15:05, Andy Seaborne wrote:
>
>
> On 20/09/10 23:18, Gregory Williams wrote:
>> I've based the review on the sections Andy highlighted in a previous
>> email as having been changed. His sections are quoted below (with some
>> additional section headers added for the federation document).
>>
>> .greg
>
> Greg - thank you very much for the review and comments.
>
> To make it easier to track all the comments across different areas of
> prime responsibility, I've split the content up into:
>
> * Property paths, negation, select expression, testing values
> * Groups/aggregates & subquery
> * Basic federated query.
>
> Andy
>
>  >> > > Material different from SPARQL 1.0:
>  >> > >
>  >> > > 8 Negation
>  >> > > 8.1 Filtering Using Graph Patterns
>  >> > > 8.1.1 Testing For the Absence of a Pattern
>  >> > > 8.1.2 Testing For the Presence of a Pattern
>  >> > > 8.2 Removing bindings
>  >> > > 8.3 Relationship and difference between NOT EXISTS and MINUS
> I find the naming of 8.1 and 8.2 to be asymmetrical. 8.2 seems like it's
> more about removing query solutions, not variable bindings.

Yes - would this for for you:

"8.2 Removing Possible Solutions"

It's trying to stress the evaluate both sides and remove one from the 
other aspect.

Change made - can make another change if we come up with a better title.

> Section 8.2
> also says, "calculates solutions in one side that are not compatible
> with the other side" but this isn't exactly true.The text needs to
> explicitly call out that it's not just compatibility but also a
> non-empty domain intersection that is required for the result removal.

This is the first line overview of the section.  I don't know how else 
to express it - spelling the "compatible unless no overlap" condition 
here seems somewhat opaque. "consistent with" was the best I could come 
up with but "consistent" pulls in a lot of other meaning.

Left for now: any suggestions?

> The specifics of this are in 17.4. However, I think the wording in 17.4
> is misleading:
>
> """
> The additional restriction on dom(μ) and dom(μ') is added so that if any
> solution mapping has no variables in common with solution mappings of Ω1
> then Minus(Ω1, Ω2) is empty, regardless of the rest of Ω2.
> """
>
> But "then Minus(Ω1, Ω2) is empty" should really be "then Minus(Ω1, Ω2)
> is NOT empty," distinguishing it from the case that would result in
> Minus just determining compatability without concern for the domains of
> Ω1 and Ω2. Also, the cardinality definition for Minus seems to say that
> it's the same as the cardinality of the Minus operation's left-hand
> side, but that doesn't seem right.

I think the text as written is correct as the earlier part of the 
sentence reframes Minus as without the additional condition.  But it's 
confusing so what about:
[[
The additional restriction on dom(μ) and dom(μ') is added because otherwise
   if there is a solution mapping in Ω2 that has no variables in
   common with the solution mappings of Ω1, then
   Minus(Ω1, Ω2) would be empty, regardless of
   the rest of Ω2. The empty solution mapping is compatible
   with every other solution mapping so P MINUS {} would otherwise
   be empty for any pattern P.
]]

>
>  >> > > 9 Property Paths
> The first sentence of section 9 seems to be missing its ending period.

Done.

> Outside of the grammar rules interspersed throught the document, the
> word "property" is never used in relation to the predicate of a
> triple(pattern) before the property paths section. This may make the
> text confusing in discussing property paths as "similar to a string
> regular expression but over properties, not characters". Maybe the
> relation between properties and predicates is obvious enough to not need
> clarification, but I thought it worth noting.

Good point - I have seen "property" used in the sense of "property of an 
object" which really means triple.  "Predicate Paths" seems like a worse 
name though.  But there again the "regular expression" is matched on the 
graph. Sentence removed.

>
> There's a free-floating sentence fragment just before section 9.1: "any
> given path expression."

Removed.

>
>
>  >> > > 9.1 Property Path Expressions
> "uri is either a URI or a prefixed name" -- is the 'a' shorthand for
> rdf:type allowed?

Yes.

[[
In the description below, uri is either a URI, a
       prefixed name, or the keyword a, and elt is a path element, which 
may itself
       be composed of path syntax constructs.

]]
>
>
> I think "nor uri_{j+1}...^uri_n as reverse paths" is missing a ^ on
> uri_{j+1}.

as it is "as reverse paths", remove ^ from ^uri_n

>  >> > > 9.2 Examples
> "or, with explicit variables" might be more clear with text indicating
> that the extra variables make this not exactly the same query as the
> results from this variant will include bindings for the new named
> variables.
>
> "rdf;type" should be "rdf:type"

Done

>
>
>  >> > > These next sections contain material that will be moved into the
> formal definitions:
>  >> > >
>  >> > > 9.3 Algebra for Property Paths
>  >> > > 9.3.1 Translation of Property Paths
>  >> > > 9.3.2 Material for the "Initial Definitions" section
>  >> > > 9.3.2.1 Property Paths Patterns
> "A Property Path is a sequence of triples, ti in ST" -- what is "ST"?
>
> "We call the object of t_n the end of the path." -- should n here
> instead be length(ST)-1?
>
>  >> > > 9.3.3 Property Path Expression Matching
>  >> > > 9.3.3.1 SPARQL Property Path Pattern Matching
>  >> > >
>  >> > > 9.3.4 Material for the SPARQL syntax to SPARQL algebra section
>  >> > > 9.3.5 Material for the SPARQL algebra section
> "A zero length path matches subjects and all objects ..." Does this mean
> "all subjects and objects"? Not sure why the choice of "subjects" vs.
> "*all* objects".

Changed to "all subjects and all objects"

>
> "such that a intermediate nodes in the graph are trarversed once only"
> -- 'a' should be 'all'?

"any intermediate nodes"

>  >> > > 9.3.6 Material for the SPARQL Evaluation Semantics section
>  >> > > 9.3.6.1 Zero-length paths
> Is there a reason to include 'path' in ZeroLengthPath(X, path, Y) ?
>
> "{ { (y, x) } }" -- should be "{ { (vy, x) } }"

Done

>
>
> """
> eval(D(G), ZeroLengthPath(x:term, path, y:term)) =
> { {} }
> card[] = 1
> """
> -- should this return the empty set {} instead of the set of one empty
> result { {} } when x != y?

Yes - fixed.

>  >> > > 9.3.6.2 Arbitrary length paths
> "deteching" -- sp. "detecting".

Done.

>
> "ALP(x:term, path) = ALP(x:term, path), {})" -- has one too many
> right-parens in it. Presumably should be "ALP(x:term, path) =
> ALP(x:term, path, {})"

Done.

> "R = R + t + ALP(t, Path, Visted)" -- should 't' be '{t}'? (assuming +
> is understood as set-union?)

Done.

> "multiset-union({(vx,t} | | t in nodes(G) and y in
> ArbitraryLengthPath(t, path, y) })" -- syntax here is messed up, and I'm
> not sure about the second (recursive?) condition 'y in
> ArbitraryLengthPath(t,path,y). Maybe meant to be "t in nodes(G) and y in
> ALP(t,path)" ?

| | changed to |

It's defining a multiset in the usual fashion.  May look better when 
typeset in HTML.  Maybe.

Need to recurse still anchoring the path to end at y, hence
  ArbitraryLengthPath(t, path, y)

>  >> > > 15.1.2 SELECT expressions
> "Variables can be also be used in expressions if they are introduced as
> to the earlier, syntactically, in the same SELECT clause" -- the wording
> here needs to be fixed. Talking about "New variables" being available
> might be more clear than just "Variables".

Changed to:
[[
New variables can be also be used in expressions if they are introduced 
earlier, syntactically, in the same SELECT clause:
]]


>  >> > > 16.6 Extensible Value Testing
>  >> > >
>  >> > > New functions where there is a keyword:
>  >> > > 16.4.14 COALESCE
>  >> > > 16.4.15 IF
> "rdfTerm coalesce(expression*)" -- this syntax doesn't seem to include
> the (required?) commas to separate expressions (c.f. 16.4.16 IN).

Done.

> "The COALESCE function form" -- why not just "COALESCE function" without
> "form"? Similarly for IF.

Because, strictly, they aren't functions.  A function evaluates all its 
arguments, then calls the logic for the function.  COALESCE and IF delay 
the evaluation of the arguments until during the logic.

> "COALESCE(5, ?x) returns 2" -- shouldn't it return 5?

Done.

>
> "the returns the value of expr2" -- should be "*then* returns the value
> of expr2".

Done.

>  >> > > 16.4.16 IN
> "A list of zero terms on the right-hand side is legal." -- this is
> repeated twice in this section.

Done.

>
>  >> > > 16.4.17 NOT IN
> "A list of zero terms on the right-hand side is legal." -- this is
> repeated twice in this section.

Done.

>
>  >> > > 16.4.18 IRI
> "Passing any other RDF term is an error." -- this is misleading as both
> simple literals and iris are allowable, but this directly follows a
> sentence describing just the IRI(iri) case.

  <p>Passing any RDF term other than a simple literal or an IRI is an 
error.</p>

>
>  >> > > 16.4.19 URI
>  >> > > 16.4.20 BNODE
> "constructs a blank node that is distinct for all blank nodes" -- should
> that be "distinct *from* all blank nodes"?

Done.

>
> "distinct from all blank nodes created by calls to this constructor for
> other query solutions." -- "other query solutions" seems strange here
> since at the time BNODE() is evaluated it might not be operating with an
> actual query solution, but just a set of variable bindings (if invoked
> in a FILTER in the middle of a query). Is "query solution" still a valid
> term for these intermediate 'solutions'?

Yes - functions are always evaluated in the context of a query solution 
(that's how variables get bound).  If a better wording would be clear, 
please suggest it.

> "and the same blank node for calls with the same simple literal within a
> FILTER for one solution mapping" -- is there a way to get the same bnode
> for the same simple literal *across* different solution mappings (across
> the entire query execution)?

No.  There is an argument for GLOBAL_BNODE(string) - if there is 
sufficient support, we can ad dit (with a better name).

Restructuring the query can also create the effect in some cases.

>  >> > > 16.4.21 STRDT
>  >> > > 16.4.22 STRLANG
> """
> STRLANG("chat", "en") "123"@en
> """
>
> -- should be "chat"@en, not "123"@en.

Done.

>  >> > > 16.4.23 NOT EXISTS and EXISTS
> "returns true/false depending on whether the pattern matches" -- Matches
> what? More detail would help explaining that it returns true if the
> evaluating the pattern (with variable substitution) results in any
> solutions.


[[
  the pattern matches the dataset
   given the bindings in the current query solution.
]]



>
> "The NOT EXISTS form translates into fn:not(EXISTS{...})." -- is there a
> reason to prefer this to "!(EXISTS{...})"?

I was bypassing the ! as syntax form as ! is fn:not.

>
> "Returns false if pattern pat matches the dataset."
> "Returns true if pattern pat matches the dataset."
> "Variables in the pattern pat"
> -- all of these refer to 'pat', but the texts they are referring to use
> 'pattern'.

pattern 'pat' -> 'pattern' e.g.
[[
<p>Returns <code>false</code> if <code>pattern</code> matches the dataset.
]]
>
> "Let μ a solution mapping" -- should be "Let μ *be* a solution mapping"

Done.

Thank you for the review.  Edits committed to CVS version 1.90

 Andy
Received on Friday, 24 September 2010 16:34:07 UTC