[sub-select] Some examples and discussion

In the telecom yesterday, the chair suggested progressing technical discussions to the mailing list.  Here's some more examples for sub-selects and some discussion points.


A SPARQL query is a graph pattern and solution modifiers. The graph pattern is the part of the SELECT query string in the {} and becomes the algebra operations of BGP, Join, LeftJoin, Filter, Union and Graph.  The modifiers are toList, order by, project, distinct/reduced, slice (offset & limit).

Because a query is evaluated as pattern then modifiers, the modifiers don't feedback into the pattern matching.  What a sub-select does is allow the pattern part to be another SELECT query which means the step applied after graph pattern matching can now be used before some other graph pattern matching occurs.  Sub-selects on their own don't add anything because there aren't any interesting modifiers (no variable binding). In SPARQL/2008 query can only return RDF terms that were found as the result of BGP matching - there's no other way to bind a variable to a value.

But several features we may choose to work on do add the capability of introducing new values and binding them -- project expressions [F1] and aggregate functions [F2].

Project expressions and sub-select can be used to bind a value to a variable [F3] which can later be used in a graph pattern including a FILTER:

## Find people over 5 foot 8 inches, from their height in centimetres
PREFIX  foaf:   <http://xmlns.com/foaf/0.1/>
PREFIX : <http://example/>

SELECT ?name ?inch
{
   { SELECT ?name ( ?cm/2.54 AS ?inch )
     {
       ?x foaf:name ?name .
       ?x :height ?cm .
     }
   }
   FILTER (?inch > 68)
}

This is the same as the assignment [F3] example:

## Assignment example
PREFIX  foaf:   <http://xmlns.com/foaf/0.1/>
PREFIX : <http://example/>

SELECT ?name ?inch
{
    ?x foaf:name ?name .
    ?x :height ?cm .
    LET ( ?inch := ?cm/2.54 )
    FILTER (?inch > 68)
}

It does not have to be a FILTER: 

## Find people with the same height across two graphs,
## converting centimetres to inches:

SELECT ?name1 ?name2
{
   GRAPH <friendsEU>
   { SELECT ?name ( ?cm/2.54 AS ?inch )
     {
       ?x foaf:name ?name2 .
       ?x :height ?cm .
     }
   }
   GRAPH <friendsUS>
   {
     ?y :heightInInches ?inch .
     ?y foaf:name ?name2
   }
}

One way to think of this is that the sub-select captures some concept that the application writer wants to reuse within a query which is why sub-select is most useful when combined with a feature that can produce new terms for matching and testing.  

This query finds people who know 10 or more other people and returns the RDF term for them.

## Find people who know 10 or more others
PREFIX  :     <http://example/>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>

SELECT  ?x
WHERE
  { ?x foaf:knows ?z}
GROUP BY ?x
HAVING (count(*) >= 10 )

using sub-select, the returned collection of resources for people can be used further
(like all these examples, there may be other ways of addressing the task - they are just examples)

## Names of people 
PREFIX  :     <http://example/>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>

SELECT  ?x ?n
WHERE
  { ?x foaf:name ?name
    { SELECT  ?x
      WHERE
        { ?x foaf:knows ?z}
      GROUP BY ?x
      HAVING ( count(*) >= 10 )
    }
  }


Spec costs:

Sub-select's would cost little in terms of spec changes.  That's not the same as implementation costs.

The features that make sub-selects useful would be more costly to spec.

There are two places to consider in the algebra for sub-selects: adding sub-selects to the graph pattern matching is a matter of allowing a SELECT where a graph pattern can appear; sub-selects in expression positions requires some additional conditions (a single projected variable). It would fit naturally into an expanded model of query processing that including calculating values but, like assignment and select-expressions/sub-selects, we can choose to unify the concepts and say it is a syntactic rewrite to a sub-select in the graph pattern.  A technical point: the distinction between sequences and multi-sets needs to be removed or the reverse of operation of sequence to multi-set needs adding. 

 Andy

[F1] http://www.w3.org/2009/sparql/wiki/Feature:ProjectExpressions

[F2] http://www.w3.org/2009/sparql/wiki/Feature:AggregateFunctions

[F3] http://www.w3.org/2009/sparql/wiki/Feature:Assignment


--------------------------------------------
  Hewlett-Packard Limited
  Registered Office: Cain Road, Bracknell, Berks RG12 1HN
  Registered No: 690597 England

Received on Wednesday, 11 March 2009 11:12:32 UTC