Re: Formal defintion of nested optionals

Jorge Pérez wrote:
> Hi!
> 
>>> But when there is a nested optional pattern it appears to be a confusion,
>>> for:
>>>
>>> {
>>>   A .
>>>   OPTIONAL {
>>>     B .
>>>     OPTIONAL { C }
>>>   }
>>> }
>>>
>>> What is the right answer? I think that there are at least two
>>> posibilities:
>>>
>>> 1) one is to think in the composition of two binary operations, a
>>> binary operation between B and C, and a binary operation between
>>> A and the result of the previous operation, something like
>>> OPT(A, OPT(B, C)). The formal defintion in this case is clear
>>> followind the doc: S is a solution of OPT(A, OPT(B,C)) if S is a
>>> solution of A and of OPT(B, C) otherwise if S is a solution to A
>>> but not to A and OPT(B, C).
>> Yes - it's opt(A, opt(B,C)).
>>
>> and   A OPTIONAL B OPTIONAL C is opt(opt(A,B), C)
>>
>>> 2) the other posibility is to think of the application of one single
>>> ternary operation, i.e. somethign like OPT(A, B, C). The formal
>>> definition here would be: S is a solution of OPT(A, B, C) if S is a
>>> solution of A and B and C, otherwise if S is a solution of A and B but
>>> not of C, otherwise if S is a solution to A but not to B.
>> Surely OPT(A, B, C) == OPT(A, OPT(B, C)) by that definition anyway?
> 
> No, it is not. It is exactly the problematic case.
> I include a counter example bellow
> 
>>> In the last draft it seems that when defining formally the optional
>>> matching the editors agree with 1) for nesting when thay say
>>> "a combination of two patterns...". But when they are informally
>>> reading the examples of nested optional patterns I think they agree
>>> with 2). When looking in some implementations (SPARQLer for example)
>>> they seems to use 2. for evaluation...
>> SPARQLer (or rather ARQ, for that is the query engine being used) is doing
>> (1) 
>> almost exactly: ARQ builds soltuion up, rather than consider all
>> possibilities 
>> and narrow them down.
>>
>> The in-memory algorithm is roughly:
>>
>> 1/ Calculate A to get an iterator of solutions to A
>>
>> 2/ For each solution to A,
>>     attempt to do OPTIONAL(Z)
>>     where Z is B OPTIONAL C
>>       either output the results of that or the original input solution from
>> A.
>>
>> 3/ In doing Z, it's the same, take the input solution, solve B,
>>     attempt to solve C given the solution of A and B
>>     Output either the solution from A fed into B fed into C
>>     or just A fed into B
>>
> 
> This algorithm gives a different result to doing 
> somethig like OPT(A, OPT(B, C)) as a composition of binary operations.

When you consider expressions like OPT(A, OPT(B, C)) you have to be careful 
because the definition of optional does not allow you to read that as a 
independent functional nesting.  If you take the definition of optional 
written as OPT(X,Y) then it considers whether the pattern "X and Y" can be 
matched.  On expanding out the expression OPT(A, OPT(B, C)) one finds that it 
does not allow the evaluation of OPT(B,C) independently of A.  That's how I 
translated your definition into OPT(A, OPT(B, C)).

As the written form OPT(X,Y) only occurs once in the doc, I'll reword that 
part to be clearer.

	Andy

> The above algorithm is doing exactly what I've named a ternary 
> operation OPT(A, B, C): first get a solution for A, then try 
> to solve B (for each solutions of A) and then try to solve C
> (for each solution of B). Note that the last solutions of C
> are heavily involved with the solutions of A, and then the 
> operation OPT is not binary between B and C.
> 
> Doing something like OPT(A, OPT(B, C)) would result in the 
> following algorithm: first solve B and then try to solve C 
> (for each solution of B), now solve A and find the one that
> are "compatibles" with the previous solutions (the solutions
> for A and for OPT(B,C) or for A and not for OPT(B,C)).
> 
> A simple example shows that there are cases in which the 
> two process leads to different results:
> 
> Consider the data
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> @prefix : <http://ing.utalca.cl/~jperez/research/rdf/> .
> 
> _:b1 :name    "paul" .
> _:b1 :phone   "777-3426" .
> 
> _:b2 :name    "john" .
> _:b2 :email   "john@acd.edu" .
> 
> _:b3 :name    "george" .
> _:b3 :webPage "www.george.edu" .
> 
> _:b4 :name    "ringo" .
> _:b4 :email   "ringo@acd.edu" .
> _:b4 :webPage "www.starr.edu" .
> _:b4 :phone   "888-4537" .
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> and the query
> 
> PREFIX : <http://ing.utalca.cl/~jperez/research/rdf/>
> 
> SELECT *
> FROM :sample1.rdf
> WHERE {
>   ?A :email ?E
>   OPTIONAL { 
>     ?A :name ?N 
>     OPTIONAL { ?A :webPage ?E }
>   }
> }
> 
> Here A is ?A :email ?E, B is ?A :name ?N, and
> C is ?A :webPage ?E. Note the using of the same
> variable in A and C this is intentionally to cause 
> a conflict. The following result is obtained in SPARQLer
> (and other implementations)
> 
> +-------+-----------------+---------+
> +  A    |       E         |    N    |
> +-------+-----------------+---------+
> + _:b4  | "ringo@acd.edu" | "ringo" |
> +-------+-----------------+---------+
> + _:b2  | "john@acd.edu"  | "john"  |
> +-----------------------------------+ 
> 
> It is a natural answer (at least te one that
> I would spect), and correspond to the algorithm that you 
> mention (in my words a ternary operation). 
> But if the process is someting like OPT(A, OPT(B, C)),
> i.e. using OPT as a binary operation then the result is
> 
> +-------+-----------------+---------+
> +  A    |       E         |    N    |
> +-------+-----------------+---------+
> + _:b4  | "ringo@acd.edu" |         |
> +-------+-----------------+---------+
> + _:b2  | "john@acd.edu"  | "john"  |
> +-----------------------------------+ 
> 
> because when evaluating the most internal optional OPT(B,C)
> 
>     ?A :name ?N 
>     OPTIONAL { ?A :webPage ?E }
> 
> in the case of the first mapping ?E is bounded to 
> "www.starr.edu", ?A bounded to "_:b4", and ?N bounded 
> to "ringo". Then when evaluating the external optional
> for A (in the case of the first mapping), ?A is bounded
> to "_:b4" and ?E to "ringo@acd.edu". Now we have to make
> OPT(A, OPT(B, C)) resulting in the mapping that bound
> ?A to _:b4 and ?E to "ringo@acd.edu" and letting ?N unbounded.
> 
> I think that the ambiguity problem of nested optionals remains
> and is an important issue. 
> For one side, the evaluation process that you mention
> is incompatible with binary operations, and for the other side
> the last release states clearly that the OPT operator is binary....
> What is the right formal definition then?
> 
> In other context, I've found other problematic queries that 
> gives strange results in SPARQLer, like changing the order
> of subqueries in group graph patterns leading to different
> results (!!). Is SPARQLer a confiable implementation?
> 
> Thaks for the time.
> 
> --------
> Jorge Pérez R.
> http://ing.utalca.cl/~jperez
> 
> 
> 
> -------------------------------------------------
> Este mensaje fue enviado por: http://webmail.utalca.cl

Received on Tuesday, 18 April 2006 09:23:23 UTC