- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Mon, 12 Jun 2006 22:35:20 +0200
- To: Steve Harris <steve.harris@garlik.com>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
On Thu, Jun 08, 2006 at 11:12:40AM +0100, Steve Harris wrote: > > In the telecon yesterday I mentioned that allowing shared variables > introduced in OPTIONAL (and UNION) blocks prevented you from > implementing it in a conventional relational algebra engine. Example: > > :a dc10:title "thing" . > :a dc10:creator "John Smith" . > :a dc:11:creator _:jb . > _:jb rdf:value "Joe Bloggs" . > > SELECT ?name > WHERE { > ?a dc10:title "thing" . > OPTIONAL { ?a dc10:creator ?name } > OPTIONAL { ?a dc11:creator ?c . > ?c rdf:value ?name } > } This evaluation order of consecutive OPTIONALs is ordered by 5.1: [[ The OPTIONAL keyword is left-associative ]] so any bindings for ?name in the first OPTIONAL constrain the second. Thus, it's equivilent to two consecutive left outer joins and yeilds what you would expect from SQL (adding a SELECT) (sorry, not swapped in on relational algebra notation): SELECT T1.obj AS "?name", S0.var_c AS "?c" FROM T as T0 LEFT OUTER JOIN T as T1 ON T1.subj=T0.subj AND T1.pred="dc10:creator" LEFT OUTER JOIN ( SELECT T2.subj AS var_a, T2.obj AS var_c, T3.obj AS var_name FROM T as T2 JOIN T as T3 ON T3.subj=T2.obj AND T3.pred="rdf:value" WHERE T2.pred="dc11:creator" ) AS S0 ON S0.var_a=T0.subj AND T1.obj IS NULL OR S0.var_name=T1.obj WHERE T0.pred = "dc10:title" AND T0.obj = "thing" | ?name | ?c | +--------------+-------+ | "John Smith" | | However, if you simply insert another triple (pattern) in the mix:, :a dc10:creator "John Smith" . + :a cc:license "steal this book" . :a dc:11:creator _:jb . OPTIONAL { ?a dc10:creator ?name } + ?a cc:license ?license . OPTIONAL { ?a dc11:creator ?c . the text in 4.3 becomes relevent: [[ There is no implied order of graph patterns within a Group Graph Pattern. Any solution for the group graph pattern that can satisfy all the graph patterns in the group is valid, independently of the order that may be implied by the lexical order of the graph patterns in the group. ]] (In fact, I see 5.1 as contradicting this, but it comes later in the spec and is pretty clearly a more specific case.) Now this parsers as two patterns with OPTIONALs in them. "Any solution ... satisfy all the graph patterns" means that we have to include everything where OPTIONAL { ?a dc10:creator ?name } constrains OPTIONAL { ?a dc11:creator ?c . ?c rdf:value ?name } and visa versa:. | ?name | ?c | +--------------+-------+ | "John Smith" | | | "Joe Bloggs" | _:jb1 | This can be replicated in SQL with a union of the queries with the two left outer joins every possible order (two orders): SELECT S3.var_name AS "?name", S3.var_c AS "?c" FROM ( SELECT S1.var_c AS var_c, T1.obj AS var_name FROM T as T0 JOIN T as T7 ON T7.subj=T0.subj AND T7.pred="cc:license" LEFT OUTER JOIN T as T1 ON T1.subj=T0.subj AND T1.pred="dc10:creator" LEFT OUTER JOIN ( SELECT T2.subj AS var_a, T2.obj AS var_c, T3.obj AS var_name FROM T as T2 JOIN T as T3 ON T3.subj=T2.obj AND T3.pred="rdf:value" WHERE T2.pred="dc11:creator" ) AS S1 ON S1.var_a=T1.subj AND S1.var_name=T1.obj WHERE T0.pred = "dc10:title" AND T0.obj = "thing" UNION SELECT S2.var_c AS var_c, S2.var_name AS var_name FROM T as T0 JOIN T as T7 ON T7.subj=T0.subj AND T7.pred="cc:license" LEFT OUTER JOIN ( SELECT T5.subj AS var_a, T5.obj AS var_c, T6.obj AS var_name FROM T as T5 JOIN T as T6 ON T6.subj=T5.obj AND T6.pred="rdf:value" WHERE T5.pred="dc11:creator" ) AS S2 ON S2.var_a=T0.subj LEFT OUTER JOIN T as T4 ON S2.var_a=T4.subj AND T4.pred="dc10:creator" AND T4.obj=S2.var_name WHERE T0.pred = "dc10:title" AND T0.obj = "thing" ) AS S3 (I tried to factor out T0 and T7, but was unable to make the FROM be optional without making the whole UNION be optional and widening the results set and making it a bit complicated.) This case isn't so bad because there are only two OPTIONALs that might bind ?name. If there were three, the UNION would include all 3!=6 possible orderings. In general, one must execute all the possible orderings of all the OPTIONAL patterns that introduce any of the same variables. ALTERNATIVES: The above SQL is an interpretation by the current spec. I see intuitive interpreations of the query: DISJOINT: the old definition for the spec said that ?name could not show up in both of those OPTIONALs unless it was already bound by a non-optional (and, i guess, only by a UNION if it was bound in all sides of the UNION). Thus, the variables bound by any OPTIONAL patterns were disjoint. This isn't too rough to detect, and preserves blind re-ordering of the graph pattern. LEXICAL: SQL OUTER JOINs are defined by the LEXICAL order order. Query engines or proxy medlers are free to re-order so long as they are clever enough to know that the resulting order will give equivilent results. COMPREHENSIVE: The current spec gives the most complete answer set but is hard to implement and can be exponentially (factorially, actually) more expensive to execute. Also, the current spec has the odd effect that inserting that triple pattern in our earlier query drastically changed the semantics. > if there are no shared variables you can map this to: > (excuse syntax, ASCII RA is difficult and it's been a while) > > 1) T SELECT pred=dc10:title AND obj=thing RENAME subj,a > 2) LJOIN T a=subj RENAME obj,name > 3) LJOIN T a=subj RENAME obj,c > 4) LJOIN T c=subj AND obj=name > > where T is the table of triples, but if you do that you get: > > 1) a c name > :a NULL NULL > > 2) a c name > :a NULL "John Smith" > > 3) a c name > :a NULL "John Smith" > :a :_jb NULL > > 4) a c name > :a NULL "John Smith" > # obj = "John Smith" and obj = NULL both fail > > What you have to do (as Souri? pointed out) is namespace the values > according to where they're bound, which is tricky, steps outside > relational algebra and makes SPARQL significantly harder to implement > that necessary. > > In this simple example it is probably possible to implement it with > sub-projections or variable mapping and using a big RENAME on the > end, but in the general case, it's not. > > My impression is that this scoped shared variable thing is mostly a > hack to get round the fact that we don't have SQLs COALESCE() and > SELECT expressions, but I think it's a bad choice to make the core > language much more complex to get round it. > > The previous form of: > > SELECT ?name1 ?name2 > WHERE { > ?a dc10:title "thing" . > OPTIONAL { ?a dc10:creator ?name1 } > OPTIONAL { ?a dc11:creator ?c . > ?c rdf:value ?name2 } > } > > didn't have these problems of course, and as a bonus cures the order > dependency + exponential complexity problem Eric described. The only > downside being that the end user had to do the coalescing themselves. > > - Steve -- -eric home-office: +1.617.395.1213 (usually 900-2300 CET) cell: +81.90.6533.3882 (eric@w3.org) Feel free to forward this message to any list for any purpose other than email address distribution.
Received on Monday, 12 June 2006 20:44:33 UTC