W3C home > Mailing lists > Public > public-rdf-dawg-comments@w3.org > April 2006

Re: Formal defintion of nested optionals

From: Jorge Pérez <jperez@utalca.cl>
Date: Tue, 18 Apr 2006 12:17:37 -0400
Message-ID: <1145377057.44451121b950a@webmail.utalca.cl>
To: andy.seaborne@hp.com
Cc: public-rdf-dawg-comments@w3.org

Mensaje citado por "Seaborne, Andy" <andy.seaborne@hp.com>:

> 
> 
> Jorge Pérez wrote:
> > Hi!
> > 
> >>> But when there is a nested optional pattern it appears to be a
> confusion,
> >>> for:
> >>>
> >>> {
> >>>   A .
> >>>   OPTIONAL {
> >>>     B .
> >>>     OPTIONAL { C }
> >>>   }
> >>> }
> >>>
> >>> What is the right answer? I think that there are at least two
> >>> posibilities:
> >>>
> >>> 1) one is to think in the composition of two binary operations, a
> >>> binary operation between B and C, and a binary operation between
> >>> A and the result of the previous operation, something like
> >>> OPT(A, OPT(B, C)). The formal defintion in this case is clear
> >>> followind the doc: S is a solution of OPT(A, OPT(B,C)) if S is a
> >>> solution of A and of OPT(B, C) otherwise if S is a solution to A
> >>> but not to A and OPT(B, C).
> >> Yes - it's opt(A, opt(B,C)).
> >>
> >> and   A OPTIONAL B OPTIONAL C is opt(opt(A,B), C)
> >>
> >>> 2) the other posibility is to think of the application of one single
> >>> ternary operation, i.e. somethign like OPT(A, B, C). The formal
> >>> definition here would be: S is a solution of OPT(A, B, C) if S is a
> >>> solution of A and B and C, otherwise if S is a solution of A and B but
> >>> not of C, otherwise if S is a solution to A but not to B.
> >> Surely OPT(A, B, C) == OPT(A, OPT(B, C)) by that definition anyway?
> > 
> > No, it is not. It is exactly the problematic case.
> > I include a counter example bellow
> > 
> >>> In the last draft it seems that when defining formally the optional
> >>> matching the editors agree with 1) for nesting when thay say
> >>> "a combination of two patterns...". But when they are informally
> >>> reading the examples of nested optional patterns I think they agree
> >>> with 2). When looking in some implementations (SPARQLer for example)
> >>> they seems to use 2. for evaluation...
> >> SPARQLer (or rather ARQ, for that is the query engine being used) is
> doing
> >> (1) 
> >> almost exactly: ARQ builds soltuion up, rather than consider all
> >> possibilities 
> >> and narrow them down.
> >>
> >> The in-memory algorithm is roughly:
> >>
> >> 1/ Calculate A to get an iterator of solutions to A
> >>
> >> 2/ For each solution to A,
> >>     attempt to do OPTIONAL(Z)
> >>     where Z is B OPTIONAL C
> >>       either output the results of that or the original input solution
> from
> >> A.
> >>
> >> 3/ In doing Z, it's the same, take the input solution, solve B,
> >>     attempt to solve C given the solution of A and B
> >>     Output either the solution from A fed into B fed into C
> >>     or just A fed into B
> >>
> > 
> > This algorithm gives a different result to doing 
> > somethig like OPT(A, OPT(B, C)) as a composition of binary operations.
> 
> When you consider expressions like OPT(A, OPT(B, C)) you have to be careful 
> because the definition of optional does not allow you to read that as a 
> independent functional nesting.  

This is exactly the point: From my point of view, if the evaluation occurs in 
the way you mention, the definition is wrong when it says "a combination of 
TWO patterns..." because it works only when there is no nesting.
Then, a nested optional pattern is "a combination of THREE patterns..." a 
nested nested optional pattern is "a combination of FOUR patterns..." etc.

> If you take the definition of optional 
> written as OPT(X,Y) then it considers whether the pattern "X and Y" can be 
> matched.  On expanding out the expression OPT(A, OPT(B, C)) one finds that it
> does not allow the evaluation of OPT(B,C) independently of A.  That's how I 
> translated your definition into OPT(A, OPT(B, C)).

My point is that if a nested optional must be evaluated in the way that you 
mention, it would never be represented as OPT(A, OPT(B, C)) that has a 
different (an very precise) meaning in any algebra of operators, it must be 
represented as something like OPT(A, B, C), and one would never speak of 
optionals as binary operators when there is nesting, do you agree?

> 
> As the written form OPT(X,Y) only occurs once in the doc, I'll reword that 
> part to be clearer.

That would be a great start, but the problem is not just the written form OPT
(X,Y). In the formal definition the phrase "if S is a pattern solution of A 
and of B otherwise if S is a solution to A, but not to A and B" only works if 
B has no optionals, so it must be reworded or stated as a special case of non 
nesting, separating the nested cases. I also think that the phrase "a 
combination of TWO patterns..." is a good phrase in an informal description 
(as an idea), but not in a formal one because it do not show the complexity of 
nesting optionals.

Thanks again for your time, I will start with another thread to discuss about 
some *problems* (from my point of view) in the implementation of SPARQLer 
(ARQ) that reflects other ambiguities in the definitions of the last draft.



- Jorge


> 
> 	Andy
> 
> > The above algorithm is doing exactly what I've named a ternary 
> > operation OPT(A, B, C): first get a solution for A, then try 
> > to solve B (for each solutions of A) and then try to solve C
> > (for each solution of B). Note that the last solutions of C
> > are heavily involved with the solutions of A, and then the 
> > operation OPT is not binary between B and C.
> > 
> > Doing something like OPT(A, OPT(B, C)) would result in the 
> > following algorithm: first solve B and then try to solve C 
> > (for each solution of B), now solve A and find the one that
> > are "compatibles" with the previous solutions (the solutions
> > for A and for OPT(B,C) or for A and not for OPT(B,C)).
> > 
> > A simple example shows that there are cases in which the 
> > two process leads to different results:
> > 
> > Consider the data
> > 
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > @prefix : <http://ing.utalca.cl/~jperez/research/rdf/> .
> > 
> > _:b1 :name    "paul" .
> > _:b1 :phone   "777-3426" .
> > 
> > _:b2 :name    "john" .
> > _:b2 :email   "john@acd.edu" .
> > 
> > _:b3 :name    "george" .
> > _:b3 :webPage "www.george.edu" .
> > 
> > _:b4 :name    "ringo" .
> > _:b4 :email   "ringo@acd.edu" .
> > _:b4 :webPage "www.starr.edu" .
> > _:b4 :phone   "888-4537" .
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 
> > and the query
> > 
> > PREFIX : <http://ing.utalca.cl/~jperez/research/rdf/>
> > 
> > SELECT *
> > FROM :sample1.rdf
> > WHERE {
> >   ?A :email ?E
> >   OPTIONAL { 
> >     ?A :name ?N 
> >     OPTIONAL { ?A :webPage ?E }
> >   }
> > }
> > 
> > Here A is ?A :email ?E, B is ?A :name ?N, and
> > C is ?A :webPage ?E. Note the using of the same
> > variable in A and C this is intentionally to cause 
> > a conflict. The following result is obtained in SPARQLer
> > (and other implementations)
> > 
> > +-------+-----------------+---------+
> > +  A    |       E         |    N    |
> > +-------+-----------------+---------+
> > + _:b4  | "ringo@acd.edu" | "ringo" |
> > +-------+-----------------+---------+
> > + _:b2  | "john@acd.edu"  | "john"  |
> > +-----------------------------------+ 
> > 
> > It is a natural answer (at least te one that
> > I would spect), and correspond to the algorithm that you 
> > mention (in my words a ternary operation). 
> > But if the process is someting like OPT(A, OPT(B, C)),
> > i.e. using OPT as a binary operation then the result is
> > 
> > +-------+-----------------+---------+
> > +  A    |       E         |    N    |
> > +-------+-----------------+---------+
> > + _:b4  | "ringo@acd.edu" |         |
> > +-------+-----------------+---------+
> > + _:b2  | "john@acd.edu"  | "john"  |
> > +-----------------------------------+ 
> > 
> > because when evaluating the most internal optional OPT(B,C)
> > 
> >     ?A :name ?N 
> >     OPTIONAL { ?A :webPage ?E }
> > 
> > in the case of the first mapping ?E is bounded to 
> > "www.starr.edu", ?A bounded to "_:b4", and ?N bounded 
> > to "ringo". Then when evaluating the external optional
> > for A (in the case of the first mapping), ?A is bounded
> > to "_:b4" and ?E to "ringo@acd.edu". Now we have to make
> > OPT(A, OPT(B, C)) resulting in the mapping that bound
> > ?A to _:b4 and ?E to "ringo@acd.edu" and letting ?N unbounded.
> > 
> > I think that the ambiguity problem of nested optionals remains
> > and is an important issue. 
> > For one side, the evaluation process that you mention
> > is incompatible with binary operations, and for the other side
> > the last release states clearly that the OPT operator is binary....
> > What is the right formal definition then?
> > 
> > In other context, I've found other problematic queries that 
> > gives strange results in SPARQLer, like changing the order
> > of subqueries in group graph patterns leading to different
> > results (!!). Is SPARQLer a confiable implementation?
> > 
> > Thaks for the time.
> > 
> > --------
> > Jorge Pérez R.
> > http://ing.utalca.cl/~jperez
> > 
> > 
> > 
> > -------------------------------------------------
> > Este mensaje fue enviado por: http://webmail.utalca.cl
> 
> 
> 




-------------------------------------------------
Este mensaje fue enviado por: http://webmail.utalca.cl
Received on Tuesday, 18 April 2006 16:22:04 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 8 January 2008 14:14:50 GMT