Re: Query review, part 2 (ACTION-546) from Gregory Williams on 2011-12-08 (public-rdf-dawg@w3.org from October to December 2011)

From: Gregory Williams <greg@evilfunhouse.com>
Date: Thu, 8 Dec 2011 11:43:01 -0500
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-dawg@w3.org
Message-Id: <09746391-23F9-472E-8E36-79EF0367610E@evilfunhouse.com>

Andy,

Thanks for the response. I've commented on a few of the issues below, and am happy with your responses to the rest.


On Dec 8, 2011, at 10:45 AM, Andy Seaborne wrote:

>> === 18.2
>> 
>> "Property path expressions are written to produce triple patterns and introduce four forms, ZeroLengthPath, ZeroOrMorePath, OneOrMorePath, and NegatedPropertySet."
>> I don't really understand this. "to produce triple patterns" sounds like a discussion of only fixed-length property paths, but the "forms" discussed are for property paths that aren't simply equivalent to triple patterns. Also, it's not clear what "form" is meant to mean here as the subsequent text calls them "symbols in the SPARQL algebra."
> 
> Is this better?
> 
> "Property path expressions are written to produce triple patterns
> and algebra forms, ZeroLengthPath, ZeroOrMorePath, OneOrMorePath,
> and NegatedPropertySet as necessary."

Yes.

>> Some of the in-scope table entries seem to describe the *condition* for when the variable is in-scope (such as when "v occurs in the BGP"), but others seem to simply describe the in-scope rule:
>> - "v is in-scope" for the "(expr AS v)" form
>> - "v is in-scope if v is mentioned as a project variable" for the "SELECT ..v.. { P }" form
>> - "v is in-scope if v is in varlist" for the "BINDINGS varlist (values)" form
> 
> Do you see this as a problem?
> 
> It would be nice to write in rule form but writing that for a BGP is going to be verbose.
> 
> To convert would merely be to move the v to the other column and have
> 
> "(expr AS X) for .."  => "v is in-scope if v = X"

No, not really a "problem." I just found it distracting while reading.

>> "The second form of a rewrite example is the first with empty group joins removed by the simplification step."
>> I'm not sure I understand this sentence.
> 
> It shows the before-and-after of simplification.
> 
> Do you have better wording to suggest?

@@

>> "BGP( ?s :p1 ?v1 .?s :p2 ?v2 )"
>> The whitespace is odd in this syntax, but I'm more curious about the choice of '.' as a separator for triples in the serialization of the BGP algebra.
> 
> Whitespace fixed.
> 
> DOT separates triples in SPARQL syntax, so I left it in to show the division of triples.

Ah. I didn't realize that what shows up inside a BGP(…) is SPARQL syntax. I expected a list of triples, so probably comma-separated.

>> === 18.2.4.1
>> 
>> "If the GROUP BY keyword is used, or there is implicit grouping due to the use of aggregates in the projection..."
>> Is it possible to have an implicit grouping based on the use of aggregates in only the HAVING clause, and not the projection?
> 
> Yes and no.
> 
> Observation: if you have just "HAVING aggregate", then there is no possible legal SELECT clause.
> 
> Only GROUP BY variables and aggregates would be legal
> There aren't any GROUP BY variables and you are asking about none in projection.
> 
> SELECT * is illegal if there is an aggregate (implicit group).
> 
> So I hope the answer is "yes" and it just falls out there are no legal queries.

It's a bit perverse, but wouldn't "SELECT (1 AS ?one)" be a valid projection for such a query?


>> "variable must not appear in VS; if it does then generate a syntax error and stop"
>> I think this should also prevent the variable from appearing in P (the list of already projected variables).
> 
> VS is defined as the variables in the { pattern } from above.

I'm not sure that actually addresses my concern. Shouldn't this address the case where the same variable appears in two (expr AS variable) selItems? On the first one, (variable, expr) is appended to E. On the second one, I'd expect that the restriction preventing the variable from appearing in the pattern *should also* prevent the variable from appearing in previously handled selItems, and similarly generate a syntax error.
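
As an illustration of the concern, here is a minimal Python sketch of the selItem walk with the extra restriction added. It is not text from the Query document: the function name `process_sel_items`, the error type `QuerySyntaxError`, and the representation of a selItem as either a variable string or an `(expr, variable)` pair are all assumptions made for this example. VS, P and E are used as named above.

```python
class QuerySyntaxError(Exception):
    """Hypothetical stand-in for 'generate a syntax error and stop'."""


def process_sel_items(sel_items, vs):
    """Walk the selItems, building P (projected variables) and E (extends).

    sel_items: list whose entries are either a variable name (str)
               or an (expr, variable) pair for "(expr AS variable)".
    vs:        set of variables visible in the { pattern }.
    """
    p = []  # variables projected so far, in order
    e = []  # (variable, expr) pairs collected from (expr AS variable)
    for item in sel_items:
        if isinstance(item, str):
            # Plain variable: project it (repeats are harmless here).
            if item not in p:
                p.append(item)
            continue
        expr, var = item
        if var in vs:
            # The restriction already stated in the quoted text.
            raise QuerySyntaxError(f"{var} already appears in the pattern")
        if var in p:
            # The additional restriction suggested in this message.
            raise QuerySyntaxError(f"{var} was already projected")
        p.append(var)
        e.append((var, expr))
    return p, e
```

With this sketch, `process_sel_items([("1", "?one"), ("2", "?one")], set())` raises `QuerySyntaxError` on the second item, whereas checking against VS alone would accept it.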

>> === 18.4
>> 
>> "Write [x | C] for a sequence of elements where C(x) is true."
>> I take it this is trying to introduce the list equivalent to the established use of {x | C} for sets? If so, I'm not sure "C(x)" makes sense when this syntax is used, e.g. in "OrderBy(Ψ, condition) = [ μ | μ in Ψ and the sequence satisfies the ordering condition]".
> 
> Yes - it's sequence notation.
> 
> Is this better?
> 
> "Write [ x | C ] for a sequence of elements where C is a condition on x."

Yes.
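
For readers less used to the notation, a small Python sketch of the difference between the two readings may help. It is only an analogy under assumptions: solution mappings are modelled as plain dicts, and `order_by` is a hypothetical helper, not a definition from the document. A per-element condition maps onto a list comprehension, while the OrderBy condition is a property of the whole sequence.

```python
def order_by(omega, key):
    """Sketch of OrderBy: returns a sequence, so duplicates are kept and
    the ordering condition applies to the sequence as a whole."""
    return sorted(omega, key=key)


solutions = [{"x": 3}, {"x": 1}, {"x": 3}]

# [ mu | C ] with C a condition on each element: order and duplicates survive.
as_sequence = [mu for mu in solutions if mu["x"] > 0]

# { x | C } for comparison: duplicates collapse and there is no order.
as_set = {mu["x"] for mu in solutions if mu["x"] > 0}

ordered = order_by(solutions, key=lambda mu: mu["x"])
```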

>> 
>> The 'term path term' form is defined as:
>> """
>> eval(D(G), ZeroOrMore(x:term, path, y:term)) =
>>     { { } } if (x,vy:var) in eval(D(G), ZeroOrMore(x, path, vy); card[{ }] = 1
>> """
>> I don't understand this formulation, as I understand eval(D(G), ZeroOrMore(...)) as returning multisets of (var, term) pairs, but this seems to be looking for a (term, var) pair. Why isn't this as simple as "{ {} } if y in ALP(x, path), card[] = 1" (the opposite of the negative case which returns the empty multiset)?
> 
> I think "in" is clear to mean a pair in the multiset of sets but I've added {} round.  Does that help?

No, the original syntax without the braces was fine. My point was that the formulation seems to be looking for a (term, var) element, but eval() returns a set of (var, term) elements. They're reversed. But the solution probably isn't as simple as reversing the syntax, because then you'd be looking at the path backwards...
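
The simpler formulation suggested above ("{ {} } if y in ALP(x, path)") can be sketched in Python as follows. This is a simplified illustration under assumptions, not the document's ALP definition: the graph is a set of `(s, p, o)` tuples, the path is restricted to a single predicate, and the names `alp` and `eval_zero_or_more_terms` are invented for the example.

```python
def alp(graph, start, predicate):
    """Nodes reachable from `start` via zero or more `predicate` edges.

    Simplified stand-in for ALP that handles only a single-predicate path.
    The visited set makes it terminate on cyclic data.
    """
    visited = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        if node in visited:
            continue
        visited.add(node)
        frontier.extend(o for (s, p, o) in graph if s == node and p == predicate)
    return visited


def eval_zero_or_more_terms(graph, x, predicate, y):
    """'term path term' case: one empty solution if y is reachable from x,
    otherwise the empty multiset (modelled here as an empty list)."""
    return [{}] if y in alp(graph, x, predicate) else []
```

Because the start node is always in the result of `alp`, the zero-length case (x equal to y) yields the single empty solution without any extra rule.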

>> As above, the definition for the 'term path term' form seems to be looking for a (term, var) pair in the return from eval(D(G), OneOrMore(...)).
> 
> As above.

Again. :)

>> === 18.5 (Definition: Evaluation of NegatedPropertySet)
>> 
>> As above in 18.4, I'm not sure how to interpret the syntactic form "μ'(μ,x)", nor what exactly μ should contain in this definition (if anything) beyond mappings for x and y.
> 
> I've added an explicit explanation of μ'
> 
> μ'(μ,x) = μ(x) if x is a variable
> μ'(μ,t) = t if t is a term

I understand it now, but still get a bit hung up on the phrase "the extension of a solution mapping". I'm not sure whether there is better wording, though.
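
Read as code, the two-line definition of μ' quoted above amounts to a lookup with a fallback. A minimal Python sketch, assuming solution mappings are dicts and that variables are strings beginning with "?" (both assumptions for illustration only):

```python
def is_variable(x):
    # Assumption for this example: variables are written as "?name" strings.
    return isinstance(x, str) and x.startswith("?")


def mu_prime(mu, x):
    """mu'(mu, x): the value mu gives x if x is a variable, else x itself."""
    return mu[x] if is_variable(x) else x
```

In this sketch an unbound variable raises `KeyError`; the quoted definition does not say what happens in that case.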

>> 
>> === 18.6.1
>> 
>> "SG will often be graph equivalent to AG, but restricting this to E-equivalence allows some forms of normalization, for example elimination of semantic redundancies, to be applied to the source documents before querying."
>> I'm not sure what "source documents" means here. What I think I understand from this is an indication that the entailment might eliminate redundancies in the underlying RDF, but while that's true, I think it's also true of any SPARQL system insofar as SPARQL Query only discusses query evaluation *after* data is somehow populated in the working dataset. In fact, it may be the case that there never is a "source document," as the RDF may be input (and redundancies eliminated) directly via an API.
>> 
>> "This allows query protocols in which blank node identifiers retain their meaning between the query and the source document, or across multiple queries."
>> Again regarding "source document."
> 
> In RDF, there is only syntax in documents.  It does not really consider API construction of data.

Even if that's true, I think the point still remains. Maybe I'm just thinking about this wrong, but when I hear "elimination of semantic redundancies", I expect that to happen *after* the RDF is parsed from a source file. The system doesn't twiddle the bytes of the file on disk when it's "eliminating semantic redundancies." It's done either in the process of getting from the document to a graph store, or as an in-place update to the graph store (or, I suppose, at query time).

>  I think that's why it talks about "source documents"; it's trying to not use the word "graph".
> 
> I'm not sure how the wording here could be changed. Any suggestions?

Not at the moment. I'll give it some thought.

thanks,
.greg
Received on Thursday, 8 December 2011 16:43:34 UTC