Nested CONSTRUCTs and Blank node minting from Axel Polleres on 2010-03-12 (public-rdf-dawg@w3.org from January to March 2010)

From: Axel Polleres <axel.polleres@deri.org>
Date: Fri, 12 Mar 2010 20:13:54 +0000
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <6EEEE84F-D94A-4E01-A0B0-89A48D481C45@deri.org>
Hi, 

the mail exchange with Andy brought me back to that I wanted to inquire about the uses for nested CONSTRUCTs
before we close this issue. So, this will a bit lengthy... whoever is afraid to get bored, just look at the 
end of the mail where you find a summary ;-)

There are two basic uses I stumbled over, and which also came out in the discussions which I had on this 
issue with Claudio, Renzo (the group from Chile).

1) Modularity
2) Blank nodes per-result-set

The recent mails with Andy were mainly on 2) . Let me elaborate both use cases again, how they'd be 
done with nested CONSTRUCTs and what would be alternatives in our current design.


1) Modularity:
==============

Example1: 
---------
merge information from different graphs using different vocabularies

  Given three graphs which have information of documents and their authors

    G1:  :Doc ex1:author "John Doe", "Sean Dough".
         :Doc2 ex1:author "John Doe", "Joan Doe".
      
         ...

    G2:  :A1 foaf:made :D1; :D2; :D3.
         :A1 foaf:name "Joan Doe"
         :A2 foaf:made ;D3.
         :A2 foaf:name "Sean Dough"
         ...
  
    G3:  :doc3 dc:creator [foaf:name "Jane D."]


 which we want to aggregate using CONSTRUCT to dc:creator - foaf:knows. 


OPTION1: that could look as follows using nested CONSTRUCTs

   CONSTRUCT WHERE { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
   FROM  
     { CONSTRUCT {?Doc dc:creator [ foaf:name ?Name ] } FROM <G1>
       WHERE { ?Doc ex:author ?Name }  
   FROM  
     { CONSTRUCT {?Doc dc:creator ?Auth . ?Auth foaf:name ?Name } FROM <G2>
       WHERE { ?Auth foaf:made ?Doc; foaf:name ?Name }  
   FROM  
     { CONSTRUCT WHERE {?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } FROM <G3> }  

  
OPTION2: Using a "normal" CONSTRUCT, this could still be done, but would be potentially harder/more expensive to evaluate, 
since one would need UNION queries, the evaluation of which might be more difficult to modularise:

   CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
   FROM <G1>
   FROM <G2>
   FROM <G3>
   WHERE {
     { ?Doc ex:author ?Name } 
     UNION
     { ?Auth foaf:made ?Doc; foaf:name ?Name } 
     UNION    
     { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } 
   }  

OPTION3:  A version using named graphs might be better suited:

   CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
   FROM NAMED <G1>
   FROM NAMED <G2>
   FROM NAMED <G3>
   WHERE {
     { GRAPH <G1> {?Doc ex:author ?Name } }
     UNION
     { GRAPH <G2> {?Auth foaf:made ?Doc; foaf:name ?Name } }
     UNION    
     { GRAPH <G3> { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } }
   }  

OPTION4: Subselects (need blank node construction and FROM in sub-SELECTs !!!!) seem work nicely here...

   CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
   WHERE {
     { SELECT ?Doc, [] as ?Auth, ?Name FROM <G1> 
       WHERE { ?Doc ex:author ?Name } }
     UNION
     { SELECT ?Doc ?Auth ?Name FROM <G2> 
     { ?Auth foaf:made ?Doc; foaf:name ?Name } }
     UNION    
     { SELECT ?Doc ?Auth ?Name FROM <G3> 
       WHERE { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } } 
   } 

Example 2:
----------
(from Claudio): This is a bit theoretical, but should make the point, this is about deeply nested
CONSTRUCTs.

Claudio Gutierrez wrote:
> The query is:
> "Give me the neighbors at distance N of the resource xo. (That is, the
> nodes for which thre is a path (x0, _, x1) (x1,-,x2)....(xk, -, x_{N+1})
> 
> Here is the version with CONSTRUCT:
> 
> Let Q0:
> 
> (CONSTRUCT (x_0 ?y ?z)
> FROM G
> WHERE  (x0 ?y ?z)
> 
> and let Q_{k+1}:
> 
> (CONSTRUCT (?x ?y ?z)
> FROM Q_k
> FROM NAMED 1 G
> WHERE  (?a ?b  ?x) AND (1 GRAPH (?x ?y ? z))
> 
> The desired query is Q_N.
> 
> (Note: Without using the nested CONSTRUCT I am pretty sure there is *no*
> way of outputing a table of exponential size. As example consider the
> graph G:
> 
>   G = { (a0, p1, b0), (a0, p2, b0), ....., (aN, p1, bN), (aN, p2, bN),}
> 
> and the neighborhood of size N of a0.
> 
> BUT, with CONSTRUCT the query is linear in N and the size of G.
> 
> best,
> 
> claudio

 

This query though is probably also writeable by SUBSELECTs? I think this:

 SELECT ?Xn
   { ...
    { SELECT DISTINCT ?X2
       WHERE { { SELECT DISTINCT ?X1 
                 WHERE {x0 ?Y1 ?X1} }
               ?X1 ?Y1 ?X2 } }      
     ... }
   {?X_{n-1} ?Y_{n-1} Xn}
  }

would do the job. I am not 100% sure whether that works... opinions?

2) Blank nodes per result set. 
==============================

This general problem occurs when you want for instance to mint blank nodes for certain 
   literal values, which you want to preserve. 

Example 3:
----------
 
Take the graph G1 from above, where you want to output foaf:knows relations between joint authors of documents, 
Since you don't have identifiers for the persons, but only names, you want to use blank nodes, but you want to 
preserve the knowledge that in G1 author names are unique, that is, you want co-reference between those blank nodes.
          
The naive version

   CONSTRUCT { _:a foaf:knows _:b . _:a foaf:name ?n1 . _:b foaf:name ?n2 . } FROM <G1> WHERE { ?D ex1:author ?n1, ?n2 . FILTER ( ?n1 != ?n2 ) }

   does not preserve blank-node correlation, whereas this version does:

   CONSTRUCT { ?a knows ?b . ?a foaf:name ?aname . ?b foaf:name ?bname . } 
   FROM { CONSTRUCT { _:auth foaf:name ?n . ?p aux:hasAuthor _:auth . }
          FROM <G1> WHERE { ?p ex1:author ?n . } } 
   WHERE { ?p aux:hasAuthor ?a . ?a foaf:name ?aName .
           ?p aux:hasAuthor ?b . ?b foaf:name ?bName . 
           FILTER ( ?a != ?b ) }

   The inner CONSTRUCT mints a unique Bnode per name, which is then used in the outer CONSTRUCT.

Let's now see how that would look like with SUBSELECT .

OPTION 1: SELECT []

   CONSTRUCT { ?Auth1 knows ?Auth2 . ?a foaf:name ?aname . ?b foaf:name ?bname . } 
   FROM <G1>
   WHERE 
    { { SELECT DISTINCT [] AS ?Auth1 , ?N1 WHERE { ?p dc:creator ?N1 . } } 
      { SELECT DISTINCT [] AS ?Auth2 , ?N2 WHERE { ?p dc:creator ?N2 . } }
      ?P ex1:author ?N1,? N2
      FILTER( ?N1 = ?N2 )
   } 

I wouldn't know how to do this with only one subSELECT. Not very appealing.

OPTION 2: BNODE()

   CONSTRUCT { ?Auth1 knows ?Auth2 . ?a foaf:name ?N1 . ?b foaf:name ?N2 . } 
   FROM <G1>
   WHERE 
    { { SELECT DISTINCT BNODE() AS ?Auth1 , ?N1 WHERE { ?p dc:creator ?N1 . } } 
      { SELECT DISTINCT BNODE() AS ?Auth2 , ?N2 WHERE { ?p dc:creator ?N2 . } }
      ?P ex1:author ?N1,? N2
      FILTER( ?N1 = ?N2 )
   } 
      
Same as Option2, but with a function instead of [], as proposed by Andy.

OPTION 3: with per-result-set BNODE("string") this would be doable even without a SUBSELECT:

   CONSTRUCT { ?Auth1 knows ?Auth2 . ?Auth1 foaf:name ?N1 . ?Auth2 foaf:name ?N2 . } 
   WHERE
   { SELECT BNODE(?N1) AS ?Auth1 , BNODE(?N2) AS ?Auth2, ?N1, ?N2
     WHERE { ex1:author ?N1,? N2  FILTER( ?N1 = ?N2 ) }
   } 

This actually speaks a lot for the per-result-set BNODE() function ....


Bottom-line: 
   + BNODE(String) per-result-set semantics, 
   + BNODE() per-row semantics (or []) 
   + subSELECTs with FROM clauses allowed 
seem to address all my use case and also Claudio's use case,
(although nested CONSTRUCTs seem easier to write at times...)

if someone could confirm my subselect rewriting sketch for Example 2,
could we go with that?

best,
Axel
Received on Friday, 12 March 2010 20:14:29 UTC