Re: syntax for the algebra - or "shortcuts" for subselect from Steve Harris on 2010-08-11 (public-rdf-dawg@w3.org from July to September 2010)

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 11 Aug 2010 11:31:17 +0100
To: Axel Polleres <axel.polleres@deri.org>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <16ECD3A8-D1A5-4CFC-BA94-97A275575C95@garlik.com>
For me this is in the same category as the proposed Update abbreviations. Apart from "BIND" there's no precedent in SQL or in existing SPARQL implementations, so I don't feel that it's wise to attempt to standardise something like this.

I can see no reason why it wouldn't work though, technically speaking.

- Steve

On 2010-08-11, at 10:50, Axel Polleres wrote:

> (sorry, previous message was unfinished)
> 
> I've had this in mind for a while but didn't have a chance to write it down until now:
> Following the discussions around BIND, I have been wondering why we would decouple only project expressions,
> and not also the other algebra operators that are currently syntactically bound to (sub)SELECT, namely:
> 
> i) ORDER BY
> ii) LIMIT
> iii) Project expressions (also a recurring issue in the ongoing discussion about assignment, or BIND)
> iv) aggregates
> 
> All of these have separate operators in the algebra, I think, but no stand-alone syntactic counterpart (i.e., one that can be used outside a (sub)SELECT).
> 
> I'd like to put a preliminary proposal on the table to add dedicated syntax for i)-iv), which:
> - wouldn't really "add" syntax, but should rather be viewed as shortcuts for current subselect queries
> - incidentally also serves as a syntax proposal for BIND
> 
> Here we go:
> 
> (0) As a basis for defining the semantics of all this, it might make sense to base the whole evaluation semantics of patterns on solution sequences
>   rather than multisets: jumping back and forth between multisets and sequences (toList/toMultiset) IMO just complicates things. Why not go all
>   the way with sequences and just say that in some cases the order is not deterministic, or that order may be lost during joins?
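To make (0) concrete, here is a minimal Python sketch, assuming solutions are modelled as dicts from variable names to plain strings (real engines work with RDF terms). `compatible`, `join` and `to_multiset` are hypothetical helper names, not part of any spec or library. It shows two join orders producing different sequences that still agree once treated as multisets, which is the sense in which order "may be lost during joins".

```python
from collections import Counter
from typing import Dict, List

# Hypothetical model: a solution maps variable names (without '?') to values.
Solution = Dict[str, str]


def compatible(a: Solution, b: Solution) -> bool:
    """Solutions are compatible if they agree on every variable they share."""
    return all(b[v] == val for v, val in a.items() if v in b)


def join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """Nested-loop join over solution sequences.

    This strategy happens to produce left-major order; an engine may pick a
    different strategy, so the resulting order is not something to rely on.
    """
    return [{**lhs, **rhs} for lhs in left for rhs in right if compatible(lhs, rhs)]


def to_multiset(seq: List[Solution]) -> Counter:
    """Forget the order but keep duplicates (cf. toMultiset in the algebra)."""
    return Counter(frozenset(s.items()) for s in seq)


if __name__ == "__main__":
    knows = [{"y": "bob"}, {"y": "carol"}]
    names = [{"y": "carol", "name": "Carol"}, {"y": "bob", "name": "Bob"}]
    a = join(knows, names)  # follows the order of `knows`
    b = join(names, knows)  # follows the order of `names`
    print(a == b, to_multiset(a) == to_multiset(b))  # prints: False True
```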
> 
> (1) I propose to add a syntactic operator:
>       Pattern ORDER BY <expr>
>   with the semantics of ordering the solution sequence of Pattern according to the ORDER BY expression.
>   (I see no real reason why I should need a SELECT * around this just to get a subquery that only does ordering.)
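Operator (1) could be sketched as follows, again assuming solutions are Python dicts from variable names to values. The function name `order_by` is hypothetical, and it deliberately simplifies SPARQL's ordering rules: the key is a single variable rather than an arbitrary expression, and bound values are compared with Python's native ordering.

```python
from typing import Dict, List

# Hypothetical model: a solution maps variable names (without '?') to values.
Solution = Dict[str, object]


def order_by(seq: List[Solution], var: str, descending: bool = False) -> List[Solution]:
    """Sketch of `Pattern ORDER BY ?var`, applied directly to a pattern.

    As in SPARQL, unbound values sort first in ascending order (and therefore
    last in descending order). Bound values use Python comparison, which is an
    assumption standing in for SPARQL's ordering of RDF terms.
    """
    def key(sol: Solution):
        # (False, ...) sorts before (True, ...), so unbound solutions come
        # first; the placeholder 0 is only ever compared with another 0.
        return (True, sol[var]) if var in sol else (False, 0)

    # sorted() is stable (also with reverse=True), so ties keep the
    # pattern's original relative order.
    return sorted(seq, key=key, reverse=descending)
```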
> 
> (2) I propose to add a new operator
>       Pattern LIMIT number
>   with the semantics of limiting the solution sequence of Pattern to its first <number> elements.
>   The ordering of the solution sequence of Pattern is preserved.
>   (Again, I see no real reason why I should need a SELECT * around this just to get a subquery that only does limiting.)
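A sketch of (2) under the same assumed dict-based model; `limit` is a hypothetical name. Because it only takes a prefix of the incoming sequence, whatever order Pattern produced carries through unchanged, and a lazily produced sequence is not consumed beyond the first `n` solutions.

```python
from itertools import islice
from typing import Dict, Iterable, List

# Hypothetical model: a solution maps variable names (without '?') to values.
Solution = Dict[str, object]


def limit(seq: Iterable[Solution], n: int) -> List[Solution]:
    """Sketch of `Pattern LIMIT n`: the first n solutions, order preserved.

    islice stops pulling from `seq` once n solutions have been taken, so a
    lazily evaluated pattern is not evaluated any further than needed.
    """
    if n < 0:
        raise ValueError("LIMIT takes a non-negative integer")
    return list(islice(seq, n))


if __name__ == "__main__":
    # Hypothetical per-author counts: a descending sort followed by a
    # limit of 3 keeps the three authors with the most entries.
    counts = [{"author": "a", "count": 2}, {"author": "b", "count": 9},
              {"author": "c", "count": 5}, {"author": "d", "count": 7}]
    top = limit(sorted(counts, key=lambda s: s["count"], reverse=True), 3)
    print([s["author"] for s in top])  # prints: ['b', 'd', 'c']
```

Composing an ordering step with `limit` in this way gives a top-k query without any SELECT wrapper, which is the point of (1) and (2).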
> 
> (3) I propose to add a new operator
>     Pattern BIND var AS expr
>   with the semantics of extending each solution in the solution sequence of Pattern by the binding created in the assignment.
>   The ordering of the solution sequence of Pattern is preserved.
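For (3), a minimal sketch in the same assumed model, where `bind` and `normalize_space` are hypothetical names and the BIND expression is passed as a Python function. Two behaviours here are assumptions rather than part of the proposal: an expression that fails to evaluate keeps the solution but leaves the variable unbound (the approach SPARQL 1.1's Extend operator takes), and binding an already-bound variable is rejected.

```python
from typing import Callable, Dict, List

# Hypothetical model: a solution maps variable names (without '?') to values.
Solution = Dict[str, object]


def bind(seq: List[Solution], var: str,
         expr: Callable[[Solution], object]) -> List[Solution]:
    """Sketch of `Pattern BIND ?var AS expr`.

    Each solution is extended with var -> expr(solution); the sequence keeps
    its order and its length, and the input solutions are not mutated.
    """
    out: List[Solution] = []
    for sol in seq:
        if var in sol:
            # Assumption: assigning to an already-bound variable is an error.
            raise ValueError(f"?{var} is already bound")
        try:
            value = expr(sol)
        except (KeyError, TypeError, ValueError):
            # Evaluation error (e.g. an input variable is unbound):
            # keep the solution, leave var unbound.
            out.append(dict(sol))
            continue
        out.append({**sol, var: value})
    return out


def normalize_space(s: str) -> str:
    """Rough stand-in for XPath fn:normalize-space; str.split() treats
    more characters as whitespace than XML does."""
    return " ".join(s.split())
```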
> 
> (4) { Pattern } [GROUP BY vars] Agg(expr) AS var
>   where Agg is an aggregate function, with the semantics of grouping the solution sequence of Pattern according to
>   the (optional) GROUP BY clause, and extending each solution in the resulting grouped solution sequence by the binding created by the aggregation;
>   the bindings for variables not listed in GROUP BY are lost (projected away) in the process.
>   The ordering of the solution sequence of Pattern is lost.
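Proposal (4) can be sketched the same way. `group_aggregate` is a hypothetical name; the aggregate is any Python function over the list of bound argument values (e.g. `min`, `max`, or `len` standing in for COUNT), and unbound group keys are represented by `None`. In this sketch the GROUP BY variables are carried into the result, which examples A and C rely on, while all other variables are dropped.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model: a solution maps variable names (without '?') to values.
Solution = Dict[str, object]


def group_aggregate(seq: List[Solution], group_by: Sequence[str],
                    agg: Callable[[List[object]], object],
                    arg_var: str, out_var: str) -> List[Solution]:
    """Sketch of `{ Pattern } [GROUP BY vars] Agg(?arg) AS ?out`.

    Produces one solution per group containing the GROUP BY variables plus
    out_var. The result order carries no meaning (this sketch happens to
    emit groups in first-seen order).
    """
    groups: Dict[Tuple[object, ...], List[Solution]] = {}
    for sol in seq:
        # None stands for "unbound" in the group key.
        key = tuple(sol.get(v) for v in group_by)
        groups.setdefault(key, []).append(sol)
    if not group_by and not groups:
        # Without GROUP BY there is exactly one group, even for empty input.
        groups[()] = []

    result: List[Solution] = []
    for key, members in groups.items():
        row: Solution = {v: val for v, val in zip(group_by, key) if val is not None}
        values = [m[arg_var] for m in members if arg_var in m]
        try:
            row[out_var] = agg(values)
        except ValueError:
            # e.g. max() of an empty group: leave out_var unbound.
            pass
        result.append(row)
    return result
```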
> 
> (5) Of course, we'd also keep
>      SELECT vars [WHERE] Pattern
>    for projection.
>    The ordering of the solution sequence of Pattern is preserved (or may be lost; I'm not really sure what makes most sense here).
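Finally, projection as in (5), in the same assumed dict-based model. `project` is a hypothetical name, and the sketch takes the "order preserved" option, one of the two alternatives left open above.

```python
from typing import Dict, List, Sequence

# Hypothetical model: a solution maps variable names (without '?') to values.
Solution = Dict[str, object]


def project(seq: List[Solution], variables: Sequence[str]) -> List[Solution]:
    """Sketch of `SELECT vars [WHERE] Pattern` over a solution sequence.

    Keeps only the listed variables (unbound ones simply stay absent) and
    preserves the order of Pattern's sequence. Duplicates are kept; removing
    them would be the job of DISTINCT.
    """
    keep = set(variables)
    return [{v: val for v, val in sol.items() if v in keep} for sol in seq]
```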
> 
> I think the components for all of (1)-(5) are already there in the algebra, but at the moment we have to tie each of them to a full subSELECT.
> 
> Here are some examples where this IMO could help:
> 
> A) from the current draft:
> 
> PREFIX : <http://people.example/>
> SELECT ?y ?minName
> WHERE {
>  :alice :knows ?y .
>  {
>    SELECT ?y (MIN(?name) AS ?minName)
>    WHERE {
>      ?y :name ?name .
>    } GROUP BY ?y
>  }
> }
> 
> could be written:
> 
> PREFIX : <http://people.example/>
> SELECT ?y ?minName
> WHERE {
>  :alice :knows ?y .
>  { { ?y :name ?name . } GROUP BY ?y MIN(?name) AS ?minName }
> }
> 
> 
> B) from the test cases:
> 
> SELECT ?x ?max WHERE {
> {SELECT (max(?y) AS ?max) WHERE {?x ex:p ?y} } 
> ?x ex:p ?max
> }
> 
> could be written:
> 
> SELECT ?x ?max WHERE {
> { {?x ex:p ?y} max(?y) AS ?max } 
> ?x ex:p ?max
> }
> 
> C) "give me the publication titles for the top 3 among people with the most DBLP entries"    
> 
>   SELECT ?author ?title ?doc
>   FROM <dblp>
>   { { SELECT ?author (COUNT(?doc) AS ?count) WHERE { ?doc dc:creator ?author } GROUP BY ?author
>       ORDER BY DESC(?count) LIMIT 3 }
>     ?doc dc:creator ?author; dc:title ?title
>   }
> 
> could be written:
> 
>   SELECT ?author ?title ?doc
>   FROM <dblp>
>   { { { { ?doc dc:creator ?author } GROUP BY ?author COUNT(?doc) AS ?count }
>       ORDER BY DESC(?count) LIMIT 3 }
>     ?doc dc:creator ?author; dc:title ?title 
>   }
> 
> 
> D) Holger Knublauch's example query from
>   http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2009Nov/0000.html
> 
> SELECT ?eMail ?image
> WHERE {
>  { { ?a a:email ?eMail .
>      ?a e:fullName ?fullName }
>    BIND ?fullNameSpaceNormalized AS normalize-space(?fullName)
>    BIND ?firstName AS substring-before(?fullNameSpaceNormalized," ")
>    BIND ?lastName AS substring-after(?fullNameSpaceNormalized," ") }
>  { { ?b b:firstName ?firstName .
>      ?b b:lastName ?lastName .
>      ?b b:lastName ?altLastName . } 
>    BIND ?altName AS concat(?firstName, " ", ?altLastName )  }
>  { { ?c c:fullName ?altName .
>      ?c c:studyYears ?lengthOfCourse .
>      ?c c:matriculationDate ?matriculate . }
>    BIND ?endDate AS year-from-date(add-yearMonthDuration-to-date(?matriculate,?lengthOfCourse)) }
>  { { ?d d:year ?endDate .
>      ?d d:fileName ?imageFile . }
>    BIND ?image AS xs:anyURI(concat("http://www.example.org/photos", ?imageFile, ".jpg" ) ) }
> }
> 
> (I don't know whether that would make Holger/Jeremy happy, but it looks pretty close to the assignment version.)
> 
> Opinions/comments welcome. Even if I won't fight for it, I wanted to bring this up before we close down completely for Last Call (LC).
> I'd be especially interested in the query editors' opinions on whether they think it would require much effort,
> mainly because I think that (0) could potentially simplify the definition of the algebra, but could also take considerable effort to implement.
> Let me also emphasise that (3) could probably be viewed independently of adopting (0), (1), (2), and (4) anyway.
> 
> Axel
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Wednesday, 11 August 2010 10:31:51 UTC