Re: Nested CONSTRUCTs and Blank node minting from Andy Seaborne on 2010-03-14 (public-rdf-dawg@w3.org from January to March 2010)

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Sun, 14 Mar 2010 12:49:43 +0000
To: Axel Polleres <axel.polleres@deri.org>
CC: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4B9CDB67.3020000@talis.com>
 > Bottom-line:
 >     + BNODE(String) per-result-set semantics,
 >     + BNODE() per-row semantics (or [])
 >     + subSELECTs with FROM clauses allowed
 > seem to address all my use case and also Claudio's use case,
 > (although nested CONSTRUCTs seem easier to write at times...)

Agreed, assuming  BNODE() generates a fresh bnode per call.
i.e.
SELECT (BNODE() AS ?B1) (BNODE() AS ?B2) ...

and ?B1 != ?B2

A few matters with the examples noted below but I don't think they 
invalidate the argument.

Also note we have a proposal for parameters to aggregators, as well as 
group arguments, and using "[" and "]" seems sensible because that's in 
expressions and so obviously aren't related to bnodes.

	Andy

On 12/03/2010 8:13 PM, Axel Polleres wrote:
> Hi,
>
> the mail exchange with Andy brought me back to that I wanted to inquire about the uses for nested CONSTRUCTs
> before we close this issue. So, this will a bit lengthy... whoever is afraid to get bored, just look at the
> end of the mail where you find a summary ;-)
>
> There are two basic uses I stumbled over, and which also came out in the discussions which I had on this
> issue with Claudio, Renzo (the group from Chile).
>
> 1) Modularity
> 2) Blank nodes per-result-set
>
> The recent mails with Andy were mainly on 2) . Let me elaborate both use cases again, how they'd be
> done with nested CONSTRUCTs and what would be alternatives in our current design.
>
>
> 1) Modularity:
> ==============
>
> Example1:
> ---------
> merge information from different graphs using different vocabularies
>
>    Given three graphs which have information of documents and their authors
>
>      G1:  :Doc ex1:author "John Doe", "Sean Dough".
>           :Doc2 ex1:author "John Doe", "Joan Doe".
>
>           ...
>
>      G2:  :A1 foaf:made :D1; :D2; :D3.
>           :A1 foaf:name "Joan Doe"
>           :A2 foaf:made ;D3.
>           :A2 foaf:name "Sean Dough"
>           ...
>
>      G3:  :doc3 dc:creator [foaf:name "Jane D."]
>
>
>   which we want to aggregate using CONSTRUCT to dc:creator - foaf:knows.
>
>
> OPTION1: that could look as follows using nested CONSTRUCTs
>
>     CONSTRUCT WHERE { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
>     FROM
>       { CONSTRUCT {?Doc dc:creator [ foaf:name ?Name ] } FROM<G1>
>         WHERE { ?Doc ex:author ?Name }
>     FROM
>       { CONSTRUCT {?Doc dc:creator ?Auth . ?Auth foaf:name ?Name } FROM<G2>
>         WHERE { ?Auth foaf:made ?Doc; foaf:name ?Name }
>     FROM
>       { CONSTRUCT WHERE {?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } FROM<G3>  }
>
>
> OPTION2: Using a "normal" CONSTRUCT, this could still be done, but would be potentially harder/more expensive to evaluate,
> since one would need UNION queries, the evaluation of which might be more difficult to modularise:
>
>     CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
>     FROM<G1>
>     FROM<G2>
>     FROM<G3>
>     WHERE {
>       { ?Doc ex:author ?Name }

This clause of the UNION does nothing in the CONSTRUCT because ?Auth is 
not bound.  The construct equivalent had [] to introduce the bnode.

>       UNION
>       { ?Auth foaf:made ?Doc; foaf:name ?Name }
>       UNION
>       { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name }
>     }
>
> OPTION3:  A version using named graphs might be better suited:
>
>     CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
>     FROM NAMED<G1>
>     FROM NAMED<G2>
>     FROM NAMED<G3>
>     WHERE {
>       { GRAPH<G1>  {?Doc ex:author ?Name } }
>       UNION
>       { GRAPH<G2>  {?Auth foaf:made ?Doc; foaf:name ?Name } }
>       UNION
>       { GRAPH<G3>  { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } }
>     }
>
> OPTION4: Subselects (need blank node construction and FROM in sub-SELECTs !!!!) seem work nicely here...
>
>     CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name }
>     WHERE {
>       { SELECT ?Doc, [] as ?Auth, ?Name FROM<G1>
>         WHERE { ?Doc ex:author ?Name } }

Can I say "assignment" here? :-)

>       UNION
>       { SELECT ?Doc ?Auth ?Name FROM<G2>
>       { ?Auth foaf:made ?Doc; foaf:name ?Name } }
>       UNION
>       { SELECT ?Doc ?Auth ?Name FROM<G3>
>         WHERE { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } }
>     }
>
> Example 2:
> ----------
> (from Claudio): This is a bit theoretical, but should make the point, this is about deeply nested
> CONSTRUCTs.
>
> Claudio Gutierrez wrote:
>> The query is:
>> "Give me the neighbors at distance N of the resource xo. (That is, the
>> nodes for which thre is a path (x0, _, x1) (x1,-,x2)....(xk, -, x_{N+1})
>>
>> Here is the version with CONSTRUCT:
>>
>> Let Q0:
>>
>> (CONSTRUCT (x_0 ?y ?z)
>> FROM G
>> WHERE  (x0 ?y ?z)
>>
>> and let Q_{k+1}:
>>
>> (CONSTRUCT (?x ?y ?z)
>> FROM Q_k
>> FROM NAMED 1 G
>> WHERE  (?a ?b  ?x) AND (1 GRAPH (?x ?y ? z))
>>
>> The desired query is Q_N.
>>
>> (Note: Without using the nested CONSTRUCT I am pretty sure there is *no*
>> way of outputing a table of exponential size. As example consider the
>> graph G:
>>
>>    G = { (a0, p1, b0), (a0, p2, b0), ....., (aN, p1, bN), (aN, p2, bN),}
>>
>> and the neighborhood of size N of a0.
>>
>> BUT, with CONSTRUCT the query is linear in N and the size of G.
>>
>> best,
>>
>> claudio
>
>
>
> This query though is probably also writeable by SUBSELECTs? I think this:
>
>   SELECT ?Xn
>     { ...
>      { SELECT DISTINCT ?X2
>         WHERE { { SELECT DISTINCT ?X1
>                   WHERE {x0 ?Y1 ?X1} }
>                 ?X1 ?Y1 ?X2 } }
>       ... }
>     {?X_{n-1} ?Y_{n-1} Xn}
>    }
>
> would do the job. I am not 100% sure whether that works... opinions?


Maybe property paths are a better approach.

And like Steve I don't see the connection to CONSTRUCT.



>
> 2) Blank nodes per result set.
> ==============================
>
> This general problem occurs when you want for instance to mint blank nodes for certain
>     literal values, which you want to preserve.
>
> Example 3:
> ----------
>
> Take the graph G1 from above, where you want to output foaf:knows relations between joint authors of documents,
> Since you don't have identifiers for the persons, but only names, you want to use blank nodes, but you want to
> preserve the knowledge that in G1 author names are unique, that is, you want co-reference between those blank nodes.
>
> The naive version
>
>     CONSTRUCT { _:a foaf:knows _:b . _:a foaf:name ?n1 . _:b foaf:name ?n2 . } FROM<G1>  WHERE { ?D ex1:author ?n1, ?n2 . FILTER ( ?n1 != ?n2 ) }
>
>     does not preserve blank-node correlation, whereas this version does:
>
>     CONSTRUCT { ?a knows ?b . ?a foaf:name ?aname . ?b foaf:name ?bname . }
>     FROM { CONSTRUCT { _:auth foaf:name ?n . ?p aux:hasAuthor _:auth . }
>            FROM<G1>  WHERE { ?p ex1:author ?n . } }
>     WHERE { ?p aux:hasAuthor ?a . ?a foaf:name ?aName .
>             ?p aux:hasAuthor ?b . ?b foaf:name ?bName .
>             FILTER ( ?a != ?b ) }
>
>     The inner CONSTRUCT mints a unique Bnode per name, which is then used in the outer CONSTRUCT.
>
> Let's now see how that would look like with SUBSELECT .
>
> OPTION 1: SELECT []
>
>     CONSTRUCT { ?Auth1 knows ?Auth2 . ?a foaf:name ?aname . ?b foaf:name ?bname . }
>     FROM<G1>
>     WHERE
>      { { SELECT DISTINCT [] AS ?Auth1 , ?N1 WHERE { ?p dc:creator ?N1 . } }

DISTINCT does nothing because a new term is introduced in each row so 
each row is unique.

>        { SELECT DISTINCT [] AS ?Auth2 , ?N2 WHERE { ?p dc:creator ?N2 . } }
>        ?P ex1:author ?N1,? N2
>        FILTER( ?N1 = ?N2 )

??
         FILTER( ?N1 != ?N2 )
from just looking at
         FILTER ( ?a != ?b )


>     }
>
> I wouldn't know how to do this with only one subSELECT. Not very appealing.
>
> OPTION 2: BNODE()
>
>     CONSTRUCT { ?Auth1 knows ?Auth2 . ?a foaf:name ?N1 . ?b foaf:name ?N2 . }
>     FROM<G1>
>     WHERE
>      { { SELECT DISTINCT BNODE() AS ?Auth1 , ?N1 WHERE { ?p dc:creator ?N1 . } }
>        { SELECT DISTINCT BNODE() AS ?Auth2 , ?N2 WHERE { ?p dc:creator ?N2 . } }
>        ?P ex1:author ?N1,? N2
>        FILTER( ?N1 = ?N2 )
>     }
>
> Same as Option2, but with a function instead of [], as proposed by Andy.
>
> OPTION 3: with per-result-set BNODE("string") this would be doable even without a SUBSELECT:
>
>     CONSTRUCT { ?Auth1 knows ?Auth2 . ?Auth1 foaf:name ?N1 . ?Auth2 foaf:name ?N2 . }
>     WHERE
>     { SELECT BNODE(?N1) AS ?Auth1 , BNODE(?N2) AS ?Auth2, ?N1, ?N2
>       WHERE { ex1:author ?N1,? N2  FILTER( ?N1 = ?N2 ) }
>     }
>
> This actually speaks a lot for the per-result-set BNODE() function ....
>
>
> Bottom-line:
>     + BNODE(String) per-result-set semantics,
>     + BNODE() per-row semantics (or [])

that is unique on each call?

SELECT (BNODE() AS ?B1) (BNODE() AS ?B2) ...

and ?B1 != ?B2

>     + subSELECTs with FROM clauses allowed
> seem to address all my use case and also Claudio's use case,
> (although nested CONSTRUCTs seem easier to write at times...)
>
> if someone could confirm my subselect rewriting sketch for Example 2,
> could we go with that?
>
> best,
> Axel
Received on Sunday, 14 March 2010 12:50:42 UTC