- From: Jorge Pérez <jperez@utalca.cl>
- Date: Thu, 13 Apr 2006 13:20:36 -0400
- To: andy.seaborne@hp.com
- Cc: public-rdf-dawg-comments@w3.org
Hi! > > But when there is a nested optional pattern it appears to be a confusion, > > for: > > > > { > > A . > > OPTIONAL { > > B . > > OPTIONAL { C } > > } > > } > > > > What is the right answer? I think that there are at least two > > posibilities: > > > > 1) one is to think in the composition of two binary operations, a > > binary operation between B and C, and a binary operation between > > A and the result of the previous operation, something like > > OPT(A, OPT(B, C)). The formal defintion in this case is clear > > followind the doc: S is a solution of OPT(A, OPT(B,C)) if S is a > > solution of A and of OPT(B, C) otherwise if S is a solution to A > > but not to A and OPT(B, C). > > Yes - it's opt(A, opt(B,C)). > > and A OPTIONAL B OPTIONAL C is opt(opt(A,B), C) > > > > > 2) the other posibility is to think of the application of one single > > ternary operation, i.e. somethign like OPT(A, B, C). The formal > > definition here would be: S is a solution of OPT(A, B, C) if S is a > > solution of A and B and C, otherwise if S is a solution of A and B but > > not of C, otherwise if S is a solution to A but not to B. > > Surely OPT(A, B, C) == OPT(A, OPT(B, C)) by that definition anyway? No, it is not. It is exactly the problematic case. I include a counter example bellow > > > > > In the last draft it seems that when defining formally the optional > > matching the editors agree with 1) for nesting when thay say > > "a combination of two patterns...". But when they are informally > > reading the examples of nested optional patterns I think they agree > > with 2). When looking in some implementations (SPARQLer for example) > > they seems to use 2. for evaluation... > > SPARQLer (or rather ARQ, for that is the query engine being used) is doing > (1) > almost exactly: ARQ builds soltuion up, rather than consider all > possibilities > and narrow them down. > > The in-memory algorithm is roughly: > > 1/ Calculate A to get an iterator of solutions to A > > 2/ For each solution to A, > attempt to do OPTIONAL(Z) > where Z is B OPTIONAL C > either output the results of that or the original input solution from > A. > > 3/ In doing Z, it's the same, take the input solution, solve B, > attempt to solve C given the solution of A and B > Output either the solution from A fed into B fed into C > or just A fed into B > This algorithm gives a different result to doing somethig like OPT(A, OPT(B, C)) as a composition of binary operations. The above algorithm is doing exactly what I've named a ternary operation OPT(A, B, C): first get a solution for A, then try to solve B (for each solutions of A) and then try to solve C (for each solution of B). Note that the last solutions of C are heavily involved with the solutions of A, and then the operation OPT is not binary between B and C. Doing something like OPT(A, OPT(B, C)) would result in the following algorithm: first solve B and then try to solve C (for each solution of B), now solve A and find the one that are "compatibles" with the previous solutions (the solutions for A and for OPT(B,C) or for A and not for OPT(B,C)). A simple example shows that there are cases in which the two process leads to different results: Consider the data ++++++++++++++++++++++++++++++++++++++++++++++++++++++ @prefix : <http://ing.utalca.cl/~jperez/research/rdf/> . _:b1 :name "paul" . _:b1 :phone "777-3426" . _:b2 :name "john" . _:b2 :email "john@acd.edu" . _:b3 :name "george" . _:b3 :webPage "www.george.edu" . _:b4 :name "ringo" . _:b4 :email "ringo@acd.edu" . _:b4 :webPage "www.starr.edu" . _:b4 :phone "888-4537" . +++++++++++++++++++++++++++++++++++++++++++++++++++++++ and the query PREFIX : <http://ing.utalca.cl/~jperez/research/rdf/> SELECT * FROM :sample1.rdf WHERE { ?A :email ?E OPTIONAL { ?A :name ?N OPTIONAL { ?A :webPage ?E } } } Here A is ?A :email ?E, B is ?A :name ?N, and C is ?A :webPage ?E. Note the using of the same variable in A and C this is intentionally to cause a conflict. The following result is obtained in SPARQLer (and other implementations) +-------+-----------------+---------+ + A | E | N | +-------+-----------------+---------+ + _:b4 | "ringo@acd.edu" | "ringo" | +-------+-----------------+---------+ + _:b2 | "john@acd.edu" | "john" | +-----------------------------------+ It is a natural answer (at least te one that I would spect), and correspond to the algorithm that you mention (in my words a ternary operation). But if the process is someting like OPT(A, OPT(B, C)), i.e. using OPT as a binary operation then the result is +-------+-----------------+---------+ + A | E | N | +-------+-----------------+---------+ + _:b4 | "ringo@acd.edu" | | +-------+-----------------+---------+ + _:b2 | "john@acd.edu" | "john" | +-----------------------------------+ because when evaluating the most internal optional OPT(B,C) ?A :name ?N OPTIONAL { ?A :webPage ?E } in the case of the first mapping ?E is bounded to "www.starr.edu", ?A bounded to "_:b4", and ?N bounded to "ringo". Then when evaluating the external optional for A (in the case of the first mapping), ?A is bounded to "_:b4" and ?E to "ringo@acd.edu". Now we have to make OPT(A, OPT(B, C)) resulting in the mapping that bound ?A to _:b4 and ?E to "ringo@acd.edu" and letting ?N unbounded. I think that the ambiguity problem of nested optionals remains and is an important issue. For one side, the evaluation process that you mention is incompatible with binary operations, and for the other side the last release states clearly that the OPT operator is binary.... What is the right formal definition then? In other context, I've found other problematic queries that gives strange results in SPARQLer, like changing the order of subqueries in group graph patterns leading to different results (!!). Is SPARQLer a confiable implementation? Thaks for the time. -------- Jorge Pérez R. http://ing.utalca.cl/~jperez ------------------------------------------------- Este mensaje fue enviado por: http://webmail.utalca.cl
Received on Thursday, 13 April 2006 17:27:02 UTC