- From: Steve Harris <steve.harris@garlik.com>
- Date: Wed, 11 Aug 2010 11:31:17 +0100
- To: Axel Polleres <axel.polleres@deri.org>
- Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
For me this is in the same category as the proposed Update abbreviations. Apart from "BIND" there's no precedent in SQL, or existing SPARQL implementations, so I don't feel that it's wise to attempt to standardise something like this.
I can see no reason why it wouldn't work though, technically speaking.
- Steve
On 2010-08-11, at 10:50, Axel Polleres wrote:
> (sorry, previous message was unfinished)
>
> Had this in my mind for a while... but didn't have a chance to write it down yet:
> Along the discussions around BIND, I have been wondering why we would decouple only project expressions
> and not also the other operators in the algebra that are syntactically bound to (sub)SELECT at the moment, namely:
>
> i) ORDER BY
> ii) LIMIT
> iii) Project expressions (also a recurring issue in the ongoing discussion about assignment, or BIND)
> iv) aggregates
>
> All these have separate operators in the algebra, I think, but no stand-alone syntactic counterpart (i.e., without occurring in a (sub)SELECT)
>
> I want to put a preliminary proposal on the table to add dedicated syntax for i)-iv), which:
> - actually wouldn't really "add" syntax but rather should be viewed as shortcuts for current subselect queries
> - BTW serves as a syntax proposal for BIND
>
> Here we go:
>
> (0) As a basis for defining the semantics of all this, it might make sense to base the whole evaluation semantics of patterns on solution sequences,
> rather than sets: the jumping back and forth between multisets and sequences (toList/toMultiset) IMO just complicates things, why not just go all
> the way with sequences and just say that in some cases the order is not deterministic or, resp., order may be lost during joins?
>
> (1) propose to add a syntactic operator:
> Pattern ORDER BY <expr>
> with the semantics of ordering the solution sequence of Pattern according to the ORDER BY.
> (I see no real reason, why I need a SELECT * around this to do a subquery that just does ordering)
>
> (2) propose to add a new operator
> Pattern LIMIT number
> with the semantics of just limiting the solution sequence of Pattern to its first <number> elements.
> ordering of the solution sequence of Pattern is preserved.
> (I see no real reason, why I need a SELECT * around this to do a subquery that just does limiting)
>
> (3) propose to add a new operator
> Pattern BIND var AS expr
> with the semantics of extending the solutions in the solution sequence of Pattern by the binding created in the assignment.
> ordering of the solution sequence of Pattern is preserved.
>
> (4) { Pattern } [GROUP BY vars] Agg(expr) AS var
> where Agg is an aggregate function, with the semantics of grouping the solution sequence of Pattern according to
> the (optional) GROUP BY clause, and extending the solutions in the resulting grouped solution sequence by the binding created by the aggregation,
> the bindings for the grouped variables are lost/projected away in this.
> ordering of the solution sequence of Pattern is lost.
>
> (5) Of course we'd also leave
> SELECT vars [WHERE] Pattern
> for projection.
> ordering of the solution sequence of Pattern is preserved (or may be lost, not really sure what makes most sense here).
>
> I think that the components for all 1)-5) are there in the algebra, but we have to tie each of these to a full subSELECT at the moment.
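To make the sequence-based reading of (0)-(4) concrete, here is a minimal Python sketch, assuming a solution is a plain dict from variable name to value and a solution sequence is a list of such dicts. The function names (`order_by`, `limit`, `bind`, `group_aggregate`) and the sample data are hypothetical, chosen only for illustration; none of this comes from the SPARQL algebra text. In particular, keeping the GROUP BY variables in the output of (4) is one reading of the proposal, picked because examples A and C join on the grouped variable.

```python
def order_by(solutions, key):
    """(1) Pattern ORDER BY expr: return the solutions sorted by key (stable sort)."""
    return sorted(solutions, key=key)


def limit(solutions, n):
    """(2) Pattern LIMIT n: keep the first n solutions; order is preserved."""
    return solutions[:n]


def bind(solutions, var, expr):
    """(3) Pattern BIND var AS expr: extend every solution; order is preserved."""
    return [{**s, var: expr(s)} for s in solutions]


def group_aggregate(solutions, group_vars, agg, expr, out_var):
    """(4) { Pattern } GROUP BY vars Agg(expr) AS var.

    Assumption: only the grouping variables and the aggregate survive.
    The output order should be treated as unspecified.
    """
    groups = {}
    for s in solutions:
        group_key = tuple(s.get(v) for v in group_vars)
        groups.setdefault(group_key, []).append(expr(s))
    return [
        {**dict(zip(group_vars, group_key)), out_var: agg(values)}
        for group_key, values in groups.items()
    ]


# Hypothetical data in the spirit of example C: documents and their authors.
docs = [
    {"doc": "d1", "author": "a"},
    {"doc": "d2", "author": "a"},
    {"doc": "d3", "author": "b"},
    {"doc": "d4", "author": "c"},
    {"doc": "d5", "author": "c"},
    {"doc": "d6", "author": "c"},
]

# Count documents per author, then take the two most prolific authors.
counted = group_aggregate(docs, ["author"], len, lambda s: s["doc"], "count")
top = limit(order_by(counted, key=lambda s: -s["count"]), 2)
print(top)
```

Each operator takes a sequence and returns a sequence, so they compose without any toList/toMultiset conversion, which is the simplification point (0) is after.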
>
> Here are some examples where this IMO could help:
>
> A) from the current draft:
>
> PREFIX : <http://people.example/>
> SELECT ?y ?minName
> WHERE {
> :alice :knows ?y .
> {
> SELECT ?y (MIN(?name) AS ?minName)
> WHERE {
> ?y :name ?name .
> } GROUP BY ?y
> }
> }
>
> could be written:
>
> PREFIX : <http://people.example/>
> SELECT ?y ?minName
> WHERE {
> :alice :knows ?y .
> { { ?y :name ?name . } GROUP BY ?y MIN(?name) AS ?minName }
> }
>
>
> B) from the test cases:
>
> SELECT ?x ?max WHERE {
> {SELECT (max(?y) AS ?max) WHERE {?x ex:p ?y} }
> ?x ex:p ?max
> }
>
> could be written:
>
> SELECT ?x ?max WHERE {
> { {?x ex:p ?y} max(?y) AS ?max }
> ?x ex:p ?max
> }
>
> C) "give me the publication titles for the top 3 among people with the most DBLP entries"
>
> SELECT ?author ?title ?doc
> FROM <dblp>
> { { SELECT ?author (COUNT(?doc) as ?count) WHERE { ?doc dc:creator ?author } GROUP BY ?author
> ORDER BY ?count LIMIT 3 }
> ?doc dc:creator ?author; dc:title ?title
> }
>
> could be written:
>
> SELECT ?author ?title ?doc
> FROM <dblp>
> { { {{ ?doc dc:creator ?author } GROUP BY ?author COUNT(?doc) AS ?count}
> ORDER BY ?count LIMIT 3 }
> ?doc dc:creator ?author; dc:title ?title
> }
>
>
> D) Holger Knublauch's example query from
> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2009Nov/0000.html
>
> SELECT ?eMail ?image
> WHERE {
> { { ?a a:email ?eMail .
> ?a e:fullName ?fullName }
> BIND ?fullNameSpaceNormalized AS normalize-space(?fullName)
> BIND ?firstName AS substring-before(?fullNameSpaceNormalized," ")
> BIND ?lastName AS substring-after(?fullNameSpaceNormalized," ") }
> { { ?b b:firstName ?firstName .
> ?b b:lastName ?lastName .
> ?b b:lastName ?altLastName . }
> BIND ?altName AS concat(?firstName, " ", ?altLastName ) }
> { { ?c c:fullName ?altName .
> ?c c:studyYears ?lengthOfCourse .
> ?c c:matriculationDate ?matriculate . }
> BIND ?endDate AS year-from-date(add-yearMonthDuration-to-date(?matriculate,?lengthOfCourse)) }
> { { ?d d:year ?endDate .
> ?d d:fileName ?imageFile . }
> BIND ?image AS xs:anyURI(concat("http://www.example.org/photos", ?imageFile, ".jpg" ) ) }
> }
>
> (don't know whether that would make Holger/Jeremy happy, but it looks pretty close to the assign version)
>
> Opinions/comments welcome, even if I won't fight for it, I wanted to bring this up before we close down completely for LC.
> Especially, I'd be interested in opinions from the query editors whether they think it would require much effort?
> Mainly because I think that (0) could potentially simplify the definition of the algebra, but could also mean considerable effort to implement.
> Let me also emphasise that 3) could probably be viewed independently of adopting 0), 1), 2), and 4) anyway...
>
> Axel
>
> 1. http://www.w3.org/TR/rdf-sparql-query/#sparqlAlgebra Definition:Diff
--
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Wednesday, 11 August 2010 10:31:51 UTC