From: Birte Glimm <birte.glimm@uni-ulm.de>

Date: Tue, 13 Dec 2011 20:28:57 +0100

Message-ID: <CABt65OdXkkQ0XmzR86F=HP-sFrFL8aYu9WRhN3TQo=ty-zCH=g@mail.gmail.com>

To: Andy Seaborne <andy.seaborne@epimorphics.com>, Steve Harris <steve.harris@garlik.com>, SPARQL Working Group <public-rdf-dawg@w3.org>

Date: Tue, 13 Dec 2011 20:28:57 +0100

Message-ID: <CABt65OdXkkQ0XmzR86F=HP-sFrFL8aYu9WRhN3TQo=ty-zCH=g@mail.gmail.com>

To: Andy Seaborne <andy.seaborne@epimorphics.com>, Steve Harris <steve.harris@garlik.com>, SPARQL Working Group <public-rdf-dawg@w3.org>

Hi Andy, Steve, all, here comes the second part of my Query review for Sec. 18 onwards. Best regards, Birte 18.2.2.2 uses translate(...) whereas all later transformation algorithms use Transform(...) is that intended or should this rather be unified? 18.2.4.1 Grouping and Aggregation, currently says: For each (X AS Var) in SELECT, each HAVING(X), and each ORDER BY X in Q If X contains an unaggregated variable V Replace V with Sample(V) End I wonder whether we should replace the If statement with a For loop, to better handle the cse where we have multiple unaggregated variables in X, e.g., ((?x + ?y) AS ?z) with ?x and ?y unaggregated requires sampling them both: For each (X AS Var) in SELECT, each HAVING(X), and each ORDER BY X in Q For each unaggregated variable V in X Replace V with Sample(V) End 18.2.4.4: Note, E s a list of pairs of the form (variable, expression), defined in section 18.2.4 E *i*s a list... 18.4 This might not come as a surprise, but I am (and always was) unhappy about the exist filters. In particular, we have in 18.4 a definition of Filter that has as third parameter the active graph, but this is not used in the definition. Instead, there is a kind of footnote pointing to the evaluation semantics for exists(pattern) filters. Either we have a function Filter that is defined for all SPARQL filters or we don't. I suppose we should have such a function. The problem is, that at this point the evaluation semantics can no longer be separated from the algebra functions. If we were honest here, we would define Filter somehow like this: Definition: Filter Let Ω be a multiset of solution mappings, expr an expression, and G an RDF graph. Filter(expr, Ω, G) = { μ | μ in Ω and expr(μ) is an expression that has an effective boolean value of true given that each subexpression of expr of the form exists(A) evaluates to true if and only if eval(D(G), substitute(A, μ)) is a non-empty sequence } There is no problem with using substitute here, which I would anyway move to 18.4, but having to delegate to eval here is not nice. I don't see a solution for this, but prefer a more honest definition here that shows that we now mix algebra functions with evaluation semantics. If we change to the above definition, substitute moves to 18.4 and Definition: Evaluation of Exists is obsolete since it is covered by the filter definition. Definition: Extend ... Extend(Ω , var, term) = { Extend(μ, var, term) | μ in Ω } I assume term should be expr as in the lines before. If, for some reason beyond my understanding, term is meant, it should be defined above (as mu, Omega, var and expr), I guess as an RDF term. 18.4.1 Note that in 18.2.4.1, we get Group((1), P) if the query has an aggregate, but no GROUP BY clause. Thus, Group((1), P) should produce something meaningful in 18.4.1, but we define Group(exprlist, Ω) = { ListEval(exprlist, μ) -> { μ' | μ' in Ω, ListEval(exprlist, μ) = ListEval(exprlist, μ') } | μ in Ω } which is Group((1), Ω) = { ListEval((1), μ) -> { μ' | μ' in Ω, ListEval((1), μ) = ListEval((1), μ') } | μ in Ω } with Ω the solutions for P and ListEvalListEval((expr1, ..., exprn), μ) returns a list (e1, ..., en), where ei = μ(expri) or error. which is ListEvalListEval((1), μ) returns a list (e1), where e1 = μ(1) or error. What is μ(1) here? I don't think implicit grouping is mean to work based on errors. All in all, I don't understand how implicit grouping is supposed to work. Definition: Aggregation Aggregation, a function which calculates a scalar value as an output of the aggregate expression in the SELECT clause, in the HAVING evaluation process, and in ORDER BY where required is used to calculate aggregated values over groups of solutions. I don't understand "where required is used to calculate aggregated values over groups of solutions." The main part of the definition is still in 18.5 and, to be consistent with the rest, should be moved to 18.4.1. For example, in 18.4.1, you could have the def. from 18.5 with slight modifications: Definition: Aggregation Let exprlist be a list of expressions or *, func an aggregate, scalarvals a set of partial functions (possibly empty) passed from the aggregate in the query, and let { key_1->Omega_1, ..., key_m->Omega_m } be a multiset of partial functions from keys to solution sequences as produced by the grouping step. Aggregation applies the set function func to the given multiset and produces a single value for each key and partition of solutions for that key. Aggregation( exprlist, func, scalarvals, { key_1->Omega_1, ..., key_m->Omega_m } ) = { (dom(g), F) | g in { key_1->Omega_1, ..., key_m->Omega_m } } where M = { ListEval(exprlist, μ) | μ in range(g) } F = func(M, scalarvals), for non-DISTINCT F = func(Distinct(M), scalarvals), for DISTINCT Special Case: when COUNT is used with the expression * the value of F will be the cardinality of the group solution sequence, card[range(g)], or card[Distinct(range(g))] if the DISTINCT keyword is present. I would take the following outside of the definition: "scalarvals" are used to pass values to the underlying set function, bypassing the mechanics of the grouping. For example, the aggregate expression GROUP_CONCAT(?x ; separator="|") has a scalarvals argument of { "separator" -> "|" }. All aggregates may have the DISTINCT keyword as the first token in their argument list. If this keyword is present then first argument to func is Distinct(M). Note that I have replaced eval(D(G), P) with a more explicit data structure. We can, anyway, not work with eval(...) in 18.4, but have to define the function for a certain given data structure. I would even suggest to go further and define this more explicitly (using keys and the solution sets Omega). Aggregation( exprlist, func, scalarvals, { key_1->Omega_1, ..., key_m->Omega_m } ) = { (key, F(Omega)) | key->Omega in { key_1->Omega_1, ..., key_m->Omega_m } } where M(Omega) = { ListEval(exprlist, μ) | μ in Omega } F(Omega) = func(M(Omega), scalarvals), for non-DISTINCT F(Omega) = func(Distinct(M(Omega)), scalarvals), for DISTINCT I always found the g quite confusing in the current definition. In 18.5 we then just have: Definition: Evaluation of Aggregation eval(D(G), Aggregation(exprlist, func, scalarvals, P)) = Aggregation(exprlist, func, scalarvals, eval(D(G), P)) I would also fully define AggregateJoin in 18.4.1, i.e.: Definition: AggregateJoin Let S_1, ..., S_n be a list of sets, where each set S_i contains key to (aggregated) value maps as produced by Aggregate. Let K = { key | key in dom(S_j) for some 1 <= j <= n } be the set of keys, then AggregateJoin(S_1, ..., S_n) = { agg_1->val_1, ..., agg_n->val_n | key in K and key->val_i in S_i for each 1 <= i <= n } I also define the set of keys beforehand since I find it easier to see that each constructed mappings results from one key. In 18.5 you can then define Definition: Evaluation of AggregateJoin eval(D(G), AggregateJoin(A_1, ..., A_n)) = AggregateJoin(eval(D(G), A_1), ..., eval(D(G), A_n)) This way, aggregation is hadled as all the other functions, i.e., define an abstract function to which eval delegates. 18.4.1.1 Set Functions The only use of this that's supported by the built-in aggregates I would spell out that is instead of that's I find the definition names in 18.5 not very uniform. We have (picking some): Definition: Evaluation of Filter(F, P) Definition: Evaluation of Exists Definition: Evaluation of LeftJoin(P1, P2, F) Definition: Evaluation of a Union Pattern Why not Evaluation of a Union(P1, P2) or just Evaluation of LeftJoin? 18.6 As Axel pointed out that the first condition for ent. regimes should use ***E-***consistent and we also extended this part to say that the scoping graph is uniquely specified *up to RDF graph equivalence*. Thus Axel suggest to use in both Query and Ent. Regimes: 1. The scoping graph, SG, corresponding to any E-consistent active graph AG is uniquely specified up to <a href="http://www.w3.org/TR/rdf-concepts/#section-graph-equality">RDF graph equivalence</a> and is E-equivalent to AG. I am happy to use this in Ent. Regimes. -- Jun. Prof. Dr. Birte Glimm Tel.: +49 731 50 24125 Inst. of Artificial Intelligence Secr: +49 731 50 24258 University of Ulm Fax: +49 731 50 24188 D-89069 Ulm birte.glimm@uni-ulm.de GermanyReceived on Tuesday, 13 December 2011 19:32:00 GMT

*
This archive was generated by hypermail 2.3.1
: Tuesday, 26 March 2013 16:15:47 GMT
*