- From: Axel Polleres <axel.polleres@deri.org>
- Date: Wed, 11 Aug 2010 10:50:59 +0100
- To: SPARQL Working Group <public-rdf-dawg@w3.org>
(sorry, previous message was unfinished)

Had this in my mind for a while, but didn't have a chance to write it down yet:

Following the discussions around BIND, I have been wondering why we would decouple only project expressions, and not also the other operators in the algebra that are syntactically bound to a (sub)SELECT at the moment, namely:

 i) ORDER BY
 ii) LIMIT
 iii) Project expressions (also a recurring issue in the ongoing discussion about assignment, or BIND)
 iv) aggregates

All of these have separate operators in the algebra, I think, but no stand-alone syntactic counterpart (i.e., one that does not occur inside a (sub)SELECT).

I want to put a (preliminary) proposal on the table to add dedicated syntax for i)-iv) which:

 - actually wouldn't really "add" syntax, but rather should be viewed as shortcuts for current subselect queries
 - BTW serves as a syntax proposal for BIND

Here we go:

(0) As a basis for defining the semantics of all this, it might make sense to base the whole evaluation semantics of patterns on solution sequences rather than sets: the jumping back and forth between multisets and sequences (toList/toMultiset) IMO just complicates things. Why not go all the way with sequences and just say that in some cases the order is not deterministic or, respectively, that order may be lost during joins?

(1) I propose to add a syntactic operator:

  Pattern ORDER BY <expr>

with the semantics of ordering the solution sequence of Pattern according to the ORDER BY.
(I see no real reason why I need a SELECT * around this to do a subquery that just does ordering.)

(2) I propose to add a new operator

  Pattern LIMIT number

with the semantics of limiting the solution sequence of Pattern to its first <number> elements. The ordering of the solution sequence of Pattern is preserved.
(I see no real reason why I need a SELECT * around this to do a subquery that just does limiting.)

(3) I propose to add a new operator

  Pattern BIND var AS expr

with the semantics of extending the solutions in the solution sequence of Pattern by the binding created in the assignment. The ordering of the solution sequence of Pattern is preserved.

(4)

  { Pattern } [GROUP BY vars] Agg(expr) AS var

where Agg is an aggregate function, with the semantics of grouping the solution sequence of Pattern according to the (optional) GROUP BY clause and extending the solutions in the resulting grouped solution sequence by the binding created by the aggregation. The bindings for the grouped variables are lost/projected away in this. The ordering of the solution sequence of Pattern is lost.

(5) Of course we'd also leave

  SELECT vars [WHERE] Pattern

for projection. The ordering of the solution sequence of Pattern is preserved (or may be lost, not really sure what makes most sense here).

I think that the components for all of 1)-5) are there in the algebra, but we have to tie each of them to a full subSELECT at the moment.

Here are some examples where this IMO could help:

A) from the current draft:

  PREFIX : <http://people.example/>
  SELECT ?y ?minName
  WHERE {
    :alice :knows ?y .
    {
      SELECT ?y (MIN(?name) AS ?minName)
      WHERE { ?y :name ?name . }
      GROUP BY ?y
    }
  }

could be written:

  PREFIX : <http://people.example/>
  SELECT ?y ?minName
  WHERE {
    :alice :knows ?y .
    { { ?y :name ?name . } GROUP BY ?y MIN(?name) AS ?minName }
  }

B) from the test cases:

  SELECT ?x ?max
  WHERE {
    { SELECT (max(?y) AS ?max) WHERE { ?x ex:p ?y } }
    ?x ex:p ?max
  }

could be written:

  SELECT ?x ?max
  WHERE {
    { { ?x ex:p ?y } max(?y) AS ?max }
    ?x ex:p ?max
  }

C) "give me the publication titles for the top 3 among people with the most DBLP entries"

  SELECT ?author ?title ?doc
  FROM <dblp>
  {
    {
      SELECT ?author (COUNT(?doc) AS ?count)
      WHERE { ?doc dc:creator ?author }
      GROUP BY ?author
      ORDER BY DESC(?count)
      LIMIT 3
    }
    ?doc dc:creator ?author ;
         dc:title ?title
  }

could be written:

  SELECT ?author ?title ?doc
  FROM <dblp>
  {
    {
      { { ?doc dc:creator ?author } GROUP BY ?author COUNT(?doc) AS ?count }
      ORDER BY DESC(?count)
      LIMIT 3
    }
    ?doc dc:creator ?author ;
         dc:title ?title
  }

D) Holger Knublauch's example query from
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2009Nov/0000.html

  SELECT ?eMail ?image
  WHERE {
    {
      { ?a a:email ?eMail .
        ?a e:fullName ?fullName }
      BIND ?fullNameSpaceNormalized AS normalize-space(?fullName)
      BIND ?firstName AS substring-before(?fullNameSpaceNormalized, " ")
      BIND ?lastName AS substring-after(?fullNameSpaceNormalized, " ")
    }
    {
      { ?b b:firstName ?firstName .
        ?b b:lastName ?lastName .
        ?b b:lastName ?altLastName . }
      BIND ?altName AS concat(?firstName, " ", ?altLastName)
    }
    {
      { ?c c:fullName ?altName .
        ?c c:studyYears ?lengthOfCourse .
        ?c c:matriculationDate ?matriculate . }
      BIND ?endDate AS year-from-date(add-yearMonthDuration-to-date(?matriculate, ?lengthOfCourse))
    }
    {
      { ?d d:year ?endDate .
        ?d d:fileName ?imageFile . }
      BIND ?image AS xs:anyURI(concat("http://www.example.org/photos", ?imageFile, ".jpg"))
    }
  }

(I don't know whether that would make Holger/Jeremy happy, but it looks pretty close to the assign version.)

Opinions/comments welcome. Even if I won't fight for it, I wanted to bring this up before we close down completely for LC. In particular, I'd be interested in opinions from the query editors on whether they think it would require much effort. I ask mainly because I think that (0) could simplify the definition of the algebra, but could also mean considerable effort to implement.

Let me also emphasise that 3) could probably be viewed independently of adopting 0), 1), 2), and 4) anyway.

Axel

1. http://www.w3.org/TR/rdf-sparql-query/#sparqlAlgebra Definition:Diff
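
As a rough illustration of the sequence-based reading in (0)-(5), the operators can be sketched in Python as plain functions over lists of bindings. This is only a sketch under assumptions that are not in the proposal: solutions are modelled as dicts, the function names (order_by, limit, bind, group_agg, project) are made up for illustration, and group_agg keeps the GROUP BY keys plus the aggregate result while dropping all other variables, which is just one possible reading of (4).

```python
# Hypothetical model: a solution is a dict {variable: value},
# a solution sequence is a list of such dicts.

def order_by(solutions, key):
    """(1) Pattern ORDER BY <expr>: reorder the sequence (stable sort)."""
    return sorted(solutions, key=key)

def limit(solutions, n):
    """(2) Pattern LIMIT n: keep the first n solutions, order preserved."""
    return solutions[:n]

def bind(solutions, var, expr):
    """(3) Pattern BIND var AS expr: extend each solution, order preserved."""
    return [{**s, var: expr(s)} for s in solutions]

def group_agg(solutions, group_vars, agg, expr, var):
    """(4) { Pattern } GROUP BY vars Agg(expr) AS var.

    Assumption: the result keeps only the group keys and the aggregate.
    The output order happens to be first-seen group order here, but per
    the proposal callers should treat it as lost.
    """
    groups = {}
    for s in solutions:
        k = tuple(s.get(v) for v in group_vars)
        groups.setdefault(k, []).append(expr(s))
    return [{**dict(zip(group_vars, k)), var: agg(vals)}
            for k, vals in groups.items()]

def project(solutions, variables):
    """(5) SELECT vars Pattern: keep only the listed variables."""
    return [{v: s[v] for v in variables if v in s} for s in solutions]

# Made-up data in the spirit of example C: one solution per (doc, author) pair.
docs = [
    {"doc": "d1", "author": "a"},
    {"doc": "d2", "author": "a"},
    {"doc": "d3", "author": "b"},
    {"doc": "d4", "author": "a"},
    {"doc": "d5", "author": "c"},
    {"doc": "d6", "author": "c"},
]

# COUNT(?doc) per author, then the 2 authors with the most entries.
counts = group_agg(docs, ["author"], len, lambda s: s["doc"], "count")
top = limit(order_by(counts, key=lambda s: -s["count"]), 2)
```

Each operator takes a sequence and returns a sequence, so they nest without any toList/toMultiset conversion in between, which is the simplification (0) is after.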
Received on Wednesday, 11 August 2010 09:51:31 UTC