- From: Steve Harris <steve.harris@garlik.com>
- Date: Wed, 11 Aug 2010 11:31:17 +0100
- To: Axel Polleres <axel.polleres@deri.org>
- Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
For me this is in the same category as the proposed Update abbreviations. Apart from "BIND" there's no precedent in SQL, or in existing SPARQL implementations, so I don't feel that it's wise to attempt to standardise something like this.

I can see no reason why it wouldn't work though, technically speaking.

- Steve

On 2010-08-11, at 10:50, Axel Polleres wrote:

> (sorry, previous message was unfinished)
>
> Had this in my mind for a while... but didn't have a chance to write it down yet:
> Along the discussions around BIND, I am wondering why we only decouple project expressions
> but not also the other operators in the algebra that are syntactically bound to (sub)select at the moment, namely:
>
> i) ORDER BY
> ii) LIMIT
> iii) Project expressions (also a recurring issue in the ongoing discussion about assignment, or BIND)
> iv) aggregates
>
> All these have separate operators in the algebra, I think, but no stand-alone syntactic counterpart (i.e., without occurring in a (sub)SELECT)
>
> I want to bring a - preliminary - proposal to the table to add dedicated syntax for i)-iv) which:
> - actually wouldn't really "add" syntax but rather should be viewed as shortcuts for current subselect queries
> - BTW serves as a syntax proposal for BIND
>
> Here we go:
>
> (0) As a basis for defining the semantics of all this, it might make sense to base the whole evaluation semantics of patterns on solution sequences,
> rather than sets: the jumping back and forth between multisets and sequences (toList/toMultiset) IMO just complicates things, why not just go all
> the way with sequences and just say that in some cases the order is not deterministic or, resp., order may be lost during joins?
>
> (1) propose to add a syntactic operator:
> Pattern ORDER BY <expr>
> with the semantics of ordering the solution sequence of Pattern according to the ORDER BY.
> (I see no real reason why I need a SELECT * around this to do a subquery that just does ordering)
>
> (2) propose to add a new operator
> Pattern LIMIT number
> with the semantics of just limiting the solution sequence of Pattern to its first <number> elements.
> ordering of the solution sequence of Pattern is preserved.
> (I see no real reason why I need a SELECT * around this to do a subquery that just does limiting)
>
> (3) propose to add a new operator
> Pattern BIND var AS expr
> with the semantics of extending the solutions in the solution sequence of Pattern by the binding created in the assignment.
> ordering of the solution sequence of Pattern is preserved.
>
> (4) { Pattern } [GROUP BY vars] Agg(expr) AS expr
> where Agg is an aggregate function, with the semantics of grouping the solution sequence of Pattern according to
> the (optional) GROUP BY clause, and extending the solutions in the resulting grouped solution sequence by the binding created by the aggregation,
> the bindings for the grouped variables are lost/projected away in this.
> ordering of the solution sequence of Pattern is lost.
>
> (5) Of course we'd also leave
> SELECT vars [WHERE] Pattern
> for projection.
> ordering of the solution sequence of Pattern is preserved (or may be lost, not really sure what makes most sense here).
>
> I think that the components for all 1)-5) are there in the algebra, but we have to tie each of these to a full subSELECT at the moment.
>
> Here are some examples where this IMO could help:
>
> A) from the current draft:
>
> PREFIX : <http://people.example/>
> SELECT ?y ?minName
> WHERE {
>   :alice :knows ?y .
>   {
>     SELECT ?y (MIN(?name) AS ?minName)
>     WHERE {
>       ?y :name ?name .
>     } GROUP BY ?y
>   }
> }
>
> could be written:
>
> PREFIX : <http://people.example/>
> SELECT ?y ?minName
> WHERE {
>   :alice :knows ?y .
>   { { ?y :name ?name . } GROUP BY ?y MIN(?name) AS ?minName }
> }
>
>
> B) from the test cases:
>
> SELECT ?x ?max WHERE {
>   {SELECT (max(?y) AS ?max) WHERE {?x ex:p ?y} }
>   ?x ex:p ?max
> }
>
> could be written:
>
> SELECT ?x ?max WHERE {
>   { {?x ex:p ?y} max(?y) AS ?max }
>   ?x ex:p ?max
> }
>
> C) "give me the publication titles for the top 3 among people with the most DBLP entries"
>
> SELECT ?author ?title ?doc
> FROM <dblp>
> { { SELECT ?author (COUNT(?doc) as ?count) WHERE { ?doc dc:creator ?author } GROUP BY ?author
>     ORDER BY ?count LIMIT 3 }
>   ?doc dc:creator ?author; dc:title ?title
> }
>
> could be written:
>
> SELECT ?author ?title ?doc
> FROM <dblp>
> { { {{ ?doc dc:creator ?author } GROUP BY ?author COUNT(?doc) AS ?count}
>     ORDER BY ?count LIMIT 3 }
>   ?doc dc:creator ?author; dc:title ?title
> }
>
>
> D) Holger Knublauch's example query from
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2009Nov/0000.html
>
> SELECT ?eMail ?image
> WHERE {
>   { { ?a a:email ?eMail .
>       ?a e:fullName ?fullName }
>     BIND ?fullNameSpaceNormalized AS normalize-space(?fullName)
>     BIND ?firstName AS substring-before(?fullNameSpaceNormalized," ")
>     BIND ?lastName AS substring-after(?fullNameSpaceNormalized," ") }
>   { { ?b b:firstName ?firstName .
>       ?b b:lastName ?lastName .
>       ?b b:lastName ?altLastName . }
>     BIND ?altName AS concat(?firstName, " ", ?altLastName ) }
>   { { ?c c:fullName ?altName .
>       ?c c:studyYears ?lengthOfCourse .
>       ?c c:matriculationDate ?matriculate . }
>     BIND ?endDate AS year-from-date(add-yearMonthDuration-to-date(?matriculate,?lengthOfCourse)) }
>   { { ?d d:year ?endDate .
>       ?d d:fileName ?imageFile . }
>     BIND ?image AS xs:anyURI(concat("http://www.example.org/photos", ?imageFile, ".jpg" ) ) }
> }
>
> (don't know whether that would make Holger/Jeremy happy, but it looks pretty close to the assign version)
>
> Opinions/comments welcome, even if I won't fight for it, I wanted to bring this up before we close down completely for LC.
> In particular, I'd be interested in opinions from the query editors on whether they think it would require much effort.
> Mainly because I think that (0) could potentially simplify the definition of the algebra, but could also mean considerable effort to implement.
> Let me also emphasise that 3) could probably be viewed independently of adopting 0), 1), 2), and 4) anyway...
>
> Axel
>
> 1. http://www.w3.org/TR/rdf-sparql-query/#sparqlAlgebra Definition:Diff

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Wednesday, 11 August 2010 10:31:51 UTC
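
Points (1)-(4) of the quoted proposal can be read as functions that take a solution sequence and return a new one. The following is a minimal Python sketch of that reading, not a definition from the SPARQL algebra. The function names (`order_by`, `limit`, `bind`, `aggregate`), the dict-per-solution representation, and the sample data are all assumptions made for illustration. The aggregate step keeps the GROUP BY variables and drops every other binding, which is one interpretation of point (4) that matches example A.

```python
# Sketch only: a solution is a dict, a solution sequence is a list of dicts.
# All names below are illustrative assumptions, not SPARQL algebra definitions.

def order_by(solutions, key):
    """(1) Pattern ORDER BY expr: reorder the sequence (stable sort)."""
    return sorted(solutions, key=key)

def limit(solutions, n):
    """(2) Pattern LIMIT n: keep the first n solutions, order preserved."""
    return solutions[:n]

def bind(solutions, var, expr):
    """(3) Pattern BIND var AS expr: extend each solution, order preserved."""
    return [{**s, var: expr(s)} for s in solutions]

def aggregate(solutions, group_vars, agg, expr, out_var):
    """(4) { Pattern } GROUP BY vars Agg(expr) AS var.

    Assumption: group keys are kept and all other bindings are dropped.
    """
    groups = {}
    for s in solutions:
        key = tuple(s.get(v) for v in group_vars)
        groups.setdefault(key, []).append(expr(s))
    return [{**dict(zip(group_vars, key)), out_var: agg(values)}
            for key, values in groups.items()]

# Shape of example C on made-up data: count documents per author,
# rank authors by that count (highest first), keep the top one.
docs = [
    {"doc": "d1", "author": "ann"},
    {"doc": "d2", "author": "ann"},
    {"doc": "d3", "author": "bob"},
]
counted = aggregate(docs, ["author"], len, lambda s: s["doc"], "count")
top = limit(order_by(counted, key=lambda s: -s["count"]), 1)
print(top)
```

Each function takes a list and returns a list, so the operators compose by nesting, in the same way the proposed syntax chains operators after a pattern.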