Re: Nested CONSTRUCTs and Blank node minting from Steve Harris on 2010-03-13 (public-rdf-dawg@w3.org from January to March 2010)

From: Steve Harris <steve.harris@garlik.com>
Date: Sat, 13 Mar 2010 20:19:36 +0000
To: Axel Polleres <axel.polleres@deri.org>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <11DEE8AE-ACAE-4570-8AF3-1D736F2ABA50@garlik.com>
Re. example 1.

For at least my systems option 1 would be significantly less  
efficient, and hard to spot potential optimisations, whereas options 2  
and 3 are easy to optimise.

To supported to general case of option 1 I'd have to create an in  
memory graph, index it, and run queries against that.

Option 3 can actually be written as:

>   CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
>   WHERE {
>     { GRAPH <G1> {?Doc ex:author ?Name } }
>     UNION
>     { GRAPH <G2> {?Auth foaf:made ?Doc; foaf:name ?Name } }
>     UNION
>     { GRAPH <G3> { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } }
>   }


you don't need the FROM NAMED, the GRAPHs make it unnecessary. FWIW,  
we do this kind of query a lot in foaf.qdos.com.

Re. example 2

I don't understand why the use of CONSTRUCT is significant here  
either. If the example used something closer to SPARQL syntax, and/or  
was grounded in real data it might help.

Re. example 3

Agree on the utility of per-resultset BNODE() and BNODE(value), but I  
don't see where FROM inside sub-SELECT is relevant. It doesn't seem to  
be required in this example.

[ also, seems I'm not the only person that still habitually puts  
commas in SELECT statements :) ]

- Steve

On 12 Mar 2010, at 20:13, Axel Polleres wrote:

> Hi,
>
> the mail exchange with Andy brought me back to that I wanted to  
> inquire about the uses for nested CONSTRUCTs
> before we close this issue. So, this will a bit lengthy... whoever  
> is afraid to get bored, just look at the
> end of the mail where you find a summary ;-)
>
> There are two basic uses I stumbled over, and which also came out in  
> the discussions which I had on this
> issue with Claudio, Renzo (the group from Chile).
>
> 1) Modularity
> 2) Blank nodes per-result-set
>
> The recent mails with Andy were mainly on 2) . Let me elaborate both  
> use cases again, how they'd be
> done with nested CONSTRUCTs and what would be alternatives in our  
> current design.
>
>
> 1) Modularity:
> ==============
>
> Example1:
> ---------
> merge information from different graphs using different vocabularies
>
>  Given three graphs which have information of documents and their  
> authors
>
>    G1:  :Doc ex1:author "John Doe", "Sean Dough".
>         :Doc2 ex1:author "John Doe", "Joan Doe".
>
>         ...
>
>    G2:  :A1 foaf:made :D1; :D2; :D3.
>         :A1 foaf:name "Joan Doe"
>         :A2 foaf:made ;D3.
>         :A2 foaf:name "Sean Dough"
>         ...
>
>    G3:  :doc3 dc:creator [foaf:name "Jane D."]
>
>
> which we want to aggregate using CONSTRUCT to dc:creator - foaf:knows.
>
>
> OPTION1: that could look as follows using nested CONSTRUCTs
>
>   CONSTRUCT WHERE { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
>   FROM
>     { CONSTRUCT {?Doc dc:creator [ foaf:name ?Name ] } FROM <G1>
>       WHERE { ?Doc ex:author ?Name }
>   FROM
>     { CONSTRUCT {?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }  
> FROM <G2>
>       WHERE { ?Auth foaf:made ?Doc; foaf:name ?Name }
>   FROM
>     { CONSTRUCT WHERE {?Doc dc:creator ?Auth. ?Auth foaf:name ? 
> Name } FROM <G3> }
>
>
> OPTION2: Using a "normal" CONSTRUCT, this could still be done, but  
> would be potentially harder/more expensive to evaluate,
> since one would need UNION queries, the evaluation of which might be  
> more difficult to modularise:
>
>   CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
>   FROM <G1>
>   FROM <G2>
>   FROM <G3>
>   WHERE {
>     { ?Doc ex:author ?Name }
>     UNION
>     { ?Auth foaf:made ?Doc; foaf:name ?Name }
>     UNION
>     { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name }
>   }
>
> OPTION3:  A version using named graphs might be better suited:
>
>   CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
>   FROM NAMED <G1>
>   FROM NAMED <G2>
>   FROM NAMED <G3>
>   WHERE {
>     { GRAPH <G1> {?Doc ex:author ?Name } }
>     UNION
>     { GRAPH <G2> {?Auth foaf:made ?Doc; foaf:name ?Name } }
>     UNION
>     { GRAPH <G3> { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } }
>   }
>
> OPTION4: Subselects (need blank node construction and FROM in sub- 
> SELECTs !!!!) seem work nicely here...
>
>   CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
>   WHERE {
>     { SELECT ?Doc, [] as ?Auth, ?Name FROM <G1>
>       WHERE { ?Doc ex:author ?Name } }
>     UNION
>     { SELECT ?Doc ?Auth ?Name FROM <G2>
>     { ?Auth foaf:made ?Doc; foaf:name ?Name } }
>     UNION
>     { SELECT ?Doc ?Auth ?Name FROM <G3>
>       WHERE { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } }
>   }
>
> Example 2:
> ----------
> (from Claudio): This is a bit theoretical, but should make the  
> point, this is about deeply nested
> CONSTRUCTs.
>
> Claudio Gutierrez wrote:
>> The query is:
>> "Give me the neighbors at distance N of the resource xo. (That is,  
>> the
>> nodes for which thre is a path (x0, _, x1) (x1,-,x2)....(xk, -, x_{N 
>> +1})
>>
>> Here is the version with CONSTRUCT:
>>
>> Let Q0:
>>
>> (CONSTRUCT (x_0 ?y ?z)
>> FROM G
>> WHERE  (x0 ?y ?z)
>>
>> and let Q_{k+1}:
>>
>> (CONSTRUCT (?x ?y ?z)
>> FROM Q_k
>> FROM NAMED 1 G
>> WHERE  (?a ?b  ?x) AND (1 GRAPH (?x ?y ? z))
>>
>> The desired query is Q_N.
>>
>> (Note: Without using the nested CONSTRUCT I am pretty sure there is  
>> *no*
>> way of outputing a table of exponential size. As example consider the
>> graph G:
>>
>>  G = { (a0, p1, b0), (a0, p2, b0), ....., (aN, p1, bN), (aN, p2,  
>> bN),}
>>
>> and the neighborhood of size N of a0.
>>
>> BUT, with CONSTRUCT the query is linear in N and the size of G.
>>
>> best,
>>
>> claudio
>
>
>
> This query though is probably also writeable by SUBSELECTs? I think  
> this:
>
> SELECT ?Xn
>   { ...
>    { SELECT DISTINCT ?X2
>       WHERE { { SELECT DISTINCT ?X1
>                 WHERE {x0 ?Y1 ?X1} }
>               ?X1 ?Y1 ?X2 } }
>     ... }
>   {?X_{n-1} ?Y_{n-1} Xn}
>  }
>
> would do the job. I am not 100% sure whether that works... opinions?
>
> 2) Blank nodes per result set.
> ==============================
>
> This general problem occurs when you want for instance to mint blank  
> nodes for certain
>   literal values, which you want to preserve.
>
> Example 3:
> ----------
>
> Take the graph G1 from above, where you want to output foaf:knows  
> relations between joint authors of documents,
> Since you don't have identifiers for the persons, but only names,  
> you want to use blank nodes, but you want to
> preserve the knowledge that in G1 author names are unique, that is,  
> you want co-reference between those blank nodes.
>
> The naive version
>
>   CONSTRUCT { _:a foaf:knows _:b . _:a foaf:name ?n1 . _:b  
> foaf:name ?n2 . } FROM <G1> WHERE { ?D ex1:author ?n1, ?n2 . FILTER  
> ( ?n1 != ?n2 ) }
>
>   does not preserve blank-node correlation, whereas this version does:
>
>   CONSTRUCT { ?a knows ?b . ?a foaf:name ?aname . ?b foaf:name ? 
> bname . }
>   FROM { CONSTRUCT { _:auth foaf:name ?n . ?p aux:hasAuthor _:auth . }
>          FROM <G1> WHERE { ?p ex1:author ?n . } }
>   WHERE { ?p aux:hasAuthor ?a . ?a foaf:name ?aName .
>           ?p aux:hasAuthor ?b . ?b foaf:name ?bName .
>           FILTER ( ?a != ?b ) }
>
>   The inner CONSTRUCT mints a unique Bnode per name, which is then  
> used in the outer CONSTRUCT.
>
> Let's now see how that would look like with SUBSELECT .
>
> OPTION 1: SELECT []
>
>   CONSTRUCT { ?Auth1 knows ?Auth2 . ?a foaf:name ?aname . ?b  
> foaf:name ?bname . }
>   FROM <G1>
>   WHERE
>    { { SELECT DISTINCT [] AS ?Auth1 , ?N1 WHERE { ?p dc:creator ? 
> N1 . } }
>      { SELECT DISTINCT [] AS ?Auth2 , ?N2 WHERE { ?p dc:creator ? 
> N2 . } }
>      ?P ex1:author ?N1,? N2
>      FILTER( ?N1 = ?N2 )
>   }
>
> I wouldn't know how to do this with only one subSELECT. Not very  
> appealing.
>
> OPTION 2: BNODE()
>
>   CONSTRUCT { ?Auth1 knows ?Auth2 . ?a foaf:name ?N1 . ?b foaf:name ? 
> N2 . }
>   FROM <G1>
>   WHERE
>    { { SELECT DISTINCT BNODE() AS ?Auth1 , ?N1 WHERE { ?p  
> dc:creator ?N1 . } }
>      { SELECT DISTINCT BNODE() AS ?Auth2 , ?N2 WHERE { ?p  
> dc:creator ?N2 . } }
>      ?P ex1:author ?N1,? N2
>      FILTER( ?N1 = ?N2 )
>   }
>
> Same as Option2, but with a function instead of [], as proposed by  
> Andy.
>
> OPTION 3: with per-result-set BNODE("string") this would be doable  
> even without a SUBSELECT:
>
>   CONSTRUCT { ?Auth1 knows ?Auth2 . ?Auth1 foaf:name ?N1 . ?Auth2  
> foaf:name ?N2 . }
>   WHERE
>   { SELECT BNODE(?N1) AS ?Auth1 , BNODE(?N2) AS ?Auth2, ?N1, ?N2
>     WHERE { ex1:author ?N1,? N2  FILTER( ?N1 = ?N2 ) }
>   }
>
> This actually speaks a lot for the per-result-set BNODE()  
> function ....
>
>
> Bottom-line:
>   + BNODE(String) per-result-set semantics,
>   + BNODE() per-row semantics (or [])
>   + subSELECTs with FROM clauses allowed
> seem to address all my use case and also Claudio's use case,
> (although nested CONSTRUCTs seem easier to write at times...)
>
> if someone could confirm my subselect rewriting sketch for Example 2,
> could we go with that?
>
> best,
> Axel

-- 
Steve Harris, Garlik Limited
2 Sheen Road, Richmond, TW9 1AE, UK
+44 20 8973 2465  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10  
9AD
Received on Saturday, 13 March 2010 20:20:07 UTC