- From: Steve Harris <steve.harris@garlik.com>
- Date: Sat, 13 Mar 2010 20:19:36 +0000
- To: Axel Polleres <axel.polleres@deri.org>
- Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Re. example 1. For at least my systems option 1 would be significantly less efficient, and hard to spot potential optimisations, whereas options 2 and 3 are easy to optimise. To supported to general case of option 1 I'd have to create an in memory graph, index it, and run queries against that. Option 3 can actually be written as: > CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name } > WHERE { > { GRAPH <G1> {?Doc ex:author ?Name } } > UNION > { GRAPH <G2> {?Auth foaf:made ?Doc; foaf:name ?Name } } > UNION > { GRAPH <G3> { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } } > } you don't need the FROM NAMED, the GRAPHs make it unnecessary. FWIW, we do this kind of query a lot in foaf.qdos.com. Re. example 2 I don't understand why the use of CONSTRUCT is significant here either. If the example used something closer to SPARQL syntax, and/or was grounded in real data it might help. Re. example 3 Agree on the utility of per-resultset BNODE() and BNODE(value), but I don't see where FROM inside sub-SELECT is relevant. It doesn't seem to be required in this example. [ also, seems I'm not the only person that still habitually puts commas in SELECT statements :) ] - Steve On 12 Mar 2010, at 20:13, Axel Polleres wrote: > Hi, > > the mail exchange with Andy brought me back to that I wanted to > inquire about the uses for nested CONSTRUCTs > before we close this issue. So, this will a bit lengthy... whoever > is afraid to get bored, just look at the > end of the mail where you find a summary ;-) > > There are two basic uses I stumbled over, and which also came out in > the discussions which I had on this > issue with Claudio, Renzo (the group from Chile). > > 1) Modularity > 2) Blank nodes per-result-set > > The recent mails with Andy were mainly on 2) . Let me elaborate both > use cases again, how they'd be > done with nested CONSTRUCTs and what would be alternatives in our > current design. > > > 1) Modularity: > ============== > > Example1: > --------- > merge information from different graphs using different vocabularies > > Given three graphs which have information of documents and their > authors > > G1: :Doc ex1:author "John Doe", "Sean Dough". > :Doc2 ex1:author "John Doe", "Joan Doe". > > ... > > G2: :A1 foaf:made :D1; :D2; :D3. > :A1 foaf:name "Joan Doe" > :A2 foaf:made ;D3. > :A2 foaf:name "Sean Dough" > ... > > G3: :doc3 dc:creator [foaf:name "Jane D."] > > > which we want to aggregate using CONSTRUCT to dc:creator - foaf:knows. > > > OPTION1: that could look as follows using nested CONSTRUCTs > > CONSTRUCT WHERE { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name } > FROM > { CONSTRUCT {?Doc dc:creator [ foaf:name ?Name ] } FROM <G1> > WHERE { ?Doc ex:author ?Name } > FROM > { CONSTRUCT {?Doc dc:creator ?Auth . ?Auth foaf:name ?Name } > FROM <G2> > WHERE { ?Auth foaf:made ?Doc; foaf:name ?Name } > FROM > { CONSTRUCT WHERE {?Doc dc:creator ?Auth. ?Auth foaf:name ? > Name } FROM <G3> } > > > OPTION2: Using a "normal" CONSTRUCT, this could still be done, but > would be potentially harder/more expensive to evaluate, > since one would need UNION queries, the evaluation of which might be > more difficult to modularise: > > CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name } > FROM <G1> > FROM <G2> > FROM <G3> > WHERE { > { ?Doc ex:author ?Name } > UNION > { ?Auth foaf:made ?Doc; foaf:name ?Name } > UNION > { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } > } > > OPTION3: A version using named graphs might be better suited: > > CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name } > FROM NAMED <G1> > FROM NAMED <G2> > FROM NAMED <G3> > WHERE { > { GRAPH <G1> {?Doc ex:author ?Name } } > UNION > { GRAPH <G2> {?Auth foaf:made ?Doc; foaf:name ?Name } } > UNION > { GRAPH <G3> { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } } > } > > OPTION4: Subselects (need blank node construction and FROM in sub- > SELECTs !!!!) seem work nicely here... > > CONSTRUCT { ?Doc dc:creator ?Auth . ?Auth foaf:name ?Name } > WHERE { > { SELECT ?Doc, [] as ?Auth, ?Name FROM <G1> > WHERE { ?Doc ex:author ?Name } } > UNION > { SELECT ?Doc ?Auth ?Name FROM <G2> > { ?Auth foaf:made ?Doc; foaf:name ?Name } } > UNION > { SELECT ?Doc ?Auth ?Name FROM <G3> > WHERE { ?Doc dc:creator ?Auth. ?Auth foaf:name ?Name } } > } > > Example 2: > ---------- > (from Claudio): This is a bit theoretical, but should make the > point, this is about deeply nested > CONSTRUCTs. > > Claudio Gutierrez wrote: >> The query is: >> "Give me the neighbors at distance N of the resource xo. (That is, >> the >> nodes for which thre is a path (x0, _, x1) (x1,-,x2)....(xk, -, x_{N >> +1}) >> >> Here is the version with CONSTRUCT: >> >> Let Q0: >> >> (CONSTRUCT (x_0 ?y ?z) >> FROM G >> WHERE (x0 ?y ?z) >> >> and let Q_{k+1}: >> >> (CONSTRUCT (?x ?y ?z) >> FROM Q_k >> FROM NAMED 1 G >> WHERE (?a ?b ?x) AND (1 GRAPH (?x ?y ? z)) >> >> The desired query is Q_N. >> >> (Note: Without using the nested CONSTRUCT I am pretty sure there is >> *no* >> way of outputing a table of exponential size. As example consider the >> graph G: >> >> G = { (a0, p1, b0), (a0, p2, b0), ....., (aN, p1, bN), (aN, p2, >> bN),} >> >> and the neighborhood of size N of a0. >> >> BUT, with CONSTRUCT the query is linear in N and the size of G. >> >> best, >> >> claudio > > > > This query though is probably also writeable by SUBSELECTs? I think > this: > > SELECT ?Xn > { ... > { SELECT DISTINCT ?X2 > WHERE { { SELECT DISTINCT ?X1 > WHERE {x0 ?Y1 ?X1} } > ?X1 ?Y1 ?X2 } } > ... } > {?X_{n-1} ?Y_{n-1} Xn} > } > > would do the job. I am not 100% sure whether that works... opinions? > > 2) Blank nodes per result set. > ============================== > > This general problem occurs when you want for instance to mint blank > nodes for certain > literal values, which you want to preserve. > > Example 3: > ---------- > > Take the graph G1 from above, where you want to output foaf:knows > relations between joint authors of documents, > Since you don't have identifiers for the persons, but only names, > you want to use blank nodes, but you want to > preserve the knowledge that in G1 author names are unique, that is, > you want co-reference between those blank nodes. > > The naive version > > CONSTRUCT { _:a foaf:knows _:b . _:a foaf:name ?n1 . _:b > foaf:name ?n2 . } FROM <G1> WHERE { ?D ex1:author ?n1, ?n2 . FILTER > ( ?n1 != ?n2 ) } > > does not preserve blank-node correlation, whereas this version does: > > CONSTRUCT { ?a knows ?b . ?a foaf:name ?aname . ?b foaf:name ? > bname . } > FROM { CONSTRUCT { _:auth foaf:name ?n . ?p aux:hasAuthor _:auth . } > FROM <G1> WHERE { ?p ex1:author ?n . } } > WHERE { ?p aux:hasAuthor ?a . ?a foaf:name ?aName . > ?p aux:hasAuthor ?b . ?b foaf:name ?bName . > FILTER ( ?a != ?b ) } > > The inner CONSTRUCT mints a unique Bnode per name, which is then > used in the outer CONSTRUCT. > > Let's now see how that would look like with SUBSELECT . > > OPTION 1: SELECT [] > > CONSTRUCT { ?Auth1 knows ?Auth2 . ?a foaf:name ?aname . ?b > foaf:name ?bname . } > FROM <G1> > WHERE > { { SELECT DISTINCT [] AS ?Auth1 , ?N1 WHERE { ?p dc:creator ? > N1 . } } > { SELECT DISTINCT [] AS ?Auth2 , ?N2 WHERE { ?p dc:creator ? > N2 . } } > ?P ex1:author ?N1,? N2 > FILTER( ?N1 = ?N2 ) > } > > I wouldn't know how to do this with only one subSELECT. Not very > appealing. > > OPTION 2: BNODE() > > CONSTRUCT { ?Auth1 knows ?Auth2 . ?a foaf:name ?N1 . ?b foaf:name ? > N2 . } > FROM <G1> > WHERE > { { SELECT DISTINCT BNODE() AS ?Auth1 , ?N1 WHERE { ?p > dc:creator ?N1 . } } > { SELECT DISTINCT BNODE() AS ?Auth2 , ?N2 WHERE { ?p > dc:creator ?N2 . } } > ?P ex1:author ?N1,? N2 > FILTER( ?N1 = ?N2 ) > } > > Same as Option2, but with a function instead of [], as proposed by > Andy. > > OPTION 3: with per-result-set BNODE("string") this would be doable > even without a SUBSELECT: > > CONSTRUCT { ?Auth1 knows ?Auth2 . ?Auth1 foaf:name ?N1 . ?Auth2 > foaf:name ?N2 . } > WHERE > { SELECT BNODE(?N1) AS ?Auth1 , BNODE(?N2) AS ?Auth2, ?N1, ?N2 > WHERE { ex1:author ?N1,? N2 FILTER( ?N1 = ?N2 ) } > } > > This actually speaks a lot for the per-result-set BNODE() > function .... > > > Bottom-line: > + BNODE(String) per-result-set semantics, > + BNODE() per-row semantics (or []) > + subSELECTs with FROM clauses allowed > seem to address all my use case and also Claudio's use case, > (although nested CONSTRUCTs seem easier to write at times...) > > if someone could confirm my subselect rewriting sketch for Example 2, > could we go with that? > > best, > Axel -- Steve Harris, Garlik Limited 2 Sheen Road, Richmond, TW9 1AE, UK +44 20 8973 2465 http://www.garlik.com/ Registered in England and Wales 535 7233 VAT # 849 0517 11 Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Saturday, 13 March 2010 20:20:07 UTC