Re: Questions about OPTIONAL

Geoff Chappell wrote:

<large snip>
data is:
--------
@prefix foaf:       <http://xmlns.com/foaf/0.1/> .

_:a  foaf:name       "Alice" .
_:b  foaf:name       "Bob" .
_:b  foaf:mbox       <mailto:bob@work.example> .
_:c  foaf:mbox       <mailto:noname@work.example.org> .
--------

> I just wanted to dig into this a bit more... maybe I'm making some invalid
> or debatable assumptions in my thinking that someone could point out,
> because I don't see an order dependency here when NULLs are used (which
> isn't to say they wouldn't exist in another case).
> 
> Here's the way I see it behaving with (non-overridable) NULL values
> (assuming evaluation order is the same as query order). 
> 
> Order 1:
> 
>  WHERE ( ?x foaf:name  ?name )
>          OPTIONAL ( ?x foaf:mbox ?mbox )
>          ( ?y foaf:mbox ?mbox )
>  
> 
> Result of first triple:

Just to be quite sure here - by "triple", you mean the triple pattern?  Not a 
triple from the data.  That reading seems consistent with your examples below.

> 
> x   name
> === =======
> _:a "Alice"
> _:b "Bob"

Agreed - I get:

------------------
| x    | name    |
==================
| _:b0 | "Alice" |
| _:b1 | "Bob"   |
------------------

> 
> Result of first and second triple:
> 
> x   name    mbox
> === ======= =========================
> _:a "Alice" NULL
> _:b "Bob"   <mailto:bob@work.example>

Agreed - I get:

----------------------------------------------
| x    | name    | mbox                      |
==============================================
| _:b0 | "Bob"   | <mailto:bob@work.example> |
| _:b1 | "Alice" |                           |
----------------------------------------------

> 
> 
> Result of all triples:
> 
> x   name    mbox                      y
> === ======= ========================= ===
> _:b "Bob"   <mailto:bob@work.example> _:b


This is the solution I get if I remove the optional:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT *
WHERE  ( ?x foaf:name ?name )
        ( ?x foaf:mbox ?mbox )
        ( ?y foaf:mbox ?mbox )

With OPTIONAL on the second triple pattern I get a different result (my bNode 
labels may jump about - they aren't from the original data, they get allocated 
as the solutions are printed):

------------------------------------------------------------
| x    | name    | mbox                             | y    |
============================================================
| _:b0 | "Alice" | <mailto:noname@work.example.org> | _:b1 |
| _:b0 | "Alice" | <mailto:bob@work.example>        | _:b2 |
| _:b2 | "Bob"   | <mailto:bob@work.example>        | _:b2 |
------------------------------------------------------------

This is because the partial solution [ ?x = _:b1 , ?name = "Alice" ]
from step 1 is still present at the start of step 3:

[ ?x = _:b1 , ?name = "Alice" ]
fed into the triple pattern:
   ( ?y foaf:mbox ?mbox )
gives 2 matches (the first two lines above).



Optional is, for each initial input solution:

if the optional pattern matches, with the initial input solution,
then
    create solutions
else
    pass through the initial input solution
    (that is, don't fail the current solution)
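
As a rough sketch of that rule in Python (not my actual implementation: the helper names match, required and optional are made up for illustration, and solutions are plain dicts from variable name to term), run against the data above in Order 1:

```python
# Toy data from the top of this message, as (subject, predicate, object).
DATA = [
    ("_:a", "foaf:name", '"Alice"'),
    ("_:b", "foaf:name", '"Bob"'),
    ("_:b", "foaf:mbox", "<mailto:bob@work.example>"),
    ("_:c", "foaf:mbox", "<mailto:noname@work.example.org>"),
]

def match(pattern, solution, data):
    """Yield each extension of `solution` under which `pattern` matches a triple."""
    for triple in data:
        extended = dict(solution)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an existing binding.
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield extended

def required(pattern, solutions, data):
    """A plain triple pattern: solutions that do not match are dropped."""
    return [s for sol in solutions for s in match(pattern, sol, data)]

def optional(pattern, solutions, data):
    """OPTIONAL: extend where possible, otherwise pass the input through."""
    out = []
    for sol in solutions:
        extended = list(match(pattern, sol, data))
        out.extend(extended if extended else [sol])
    return out

# Order 1, seeded with the single solution that has no bindings.
order1 = [{}]
order1 = required(("?x", "foaf:name", "?name"), order1, DATA)
order1 = optional(("?x", "foaf:mbox", "?mbox"), order1, DATA)
order1 = required(("?y", "foaf:mbox", "?mbox"), order1, DATA)

for row in order1:
    print(row)
```

Under these assumptions the Alice solution reaches the third pattern with ?mbox still unbound, so it matches both mbox triples, which is where the two extra rows come from.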

> 
> 
> Order 2:
> 
>  WHERE ( ?x foaf:name  ?name )
>        ( ?y foaf:mbox ?mbox )
>        OPTIONAL ( ?x foaf:mbox ?mbox ) 
> 
> Result of first triple:
> 
> x   name
> === =======
> _:a "Alice"
> _:b "Bob"

Agreed.

> 
> Result of first and second triple:
> 
> x   name    y   mbox
> === ======= === =========================
> _:a "Alice" _:b <mailto:bob@work.example>
> _:a "Alice" _:c <mailto:noname@work.example.org>
> _:b "Bob"   _:b <mailto:bob@work.example>
> _:b "Bob"   _:c <mailto:noname@work.example.org>

Agreed.



> 
> Result of all triples:
> 
> x   name    y   mbox
> === ======= === =========================
> _:b "Bob"   _:b <mailto:bob@work.example>

Optional can't reduce the number of input solutions, though it can increase them.

Take just the first solution so far:
x   name    y   mbox
=== ======= === =========================
_:a "Alice" _:b <mailto:bob@work.example>

and the pattern:
OPTIONAL ( ?x foaf:mbox ?mbox )

There is no triple (Alice has no mbox):
_:a  foaf:mbox <mailto:bob@work.example>

so the optional does not fail this solution, but it does not add anything either.
It leaves the solution unchanged.

The effect is "I've found some things of interest, add in some extra 
information if it's available, but don't worry if it's not".
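
To make that concrete for Order 2, here is a small Python sketch (the function name optional_all_bound and the dict layout are assumptions for illustration only). Every variable of the optional pattern is already bound by the time it is reached, so the pattern reduces to asking whether one ground triple is in the data, and the solution survives either way:

```python
# Toy data from the top of this message, as a set of ground triples.
DATA = {
    ("_:a", "foaf:name", '"Alice"'),
    ("_:b", "foaf:name", '"Bob"'),
    ("_:b", "foaf:mbox", "<mailto:bob@work.example>"),
    ("_:c", "foaf:mbox", "<mailto:noname@work.example.org>"),
}

# The four solutions agreed on after the first two patterns of Order 2.
SO_FAR = [
    {"?x": "_:a", "?name": '"Alice"', "?y": "_:b",
     "?mbox": "<mailto:bob@work.example>"},
    {"?x": "_:a", "?name": '"Alice"', "?y": "_:c",
     "?mbox": "<mailto:noname@work.example.org>"},
    {"?x": "_:b", "?name": '"Bob"', "?y": "_:b",
     "?mbox": "<mailto:bob@work.example>"},
    {"?x": "_:b", "?name": '"Bob"', "?y": "_:c",
     "?mbox": "<mailto:noname@work.example.org>"},
]

def optional_all_bound(pattern, solutions, data):
    """OPTIONAL when every variable in `pattern` is already bound.

    Returns (solution, matched) pairs. The solution is kept unchanged
    whether or not the ground triple is present in the data.
    """
    out = []
    for sol in solutions:
        # Substitute the bindings to get a ground triple.
        ground = tuple(sol[t] if t.startswith("?") else t for t in pattern)
        out.append((sol, ground in data))
    return out

checked = optional_all_bound(("?x", "foaf:mbox", "?mbox"), SO_FAR, DATA)
for sol, matched in checked:
    print(sol["?x"], sol["?y"], "matched" if matched else "passed through")
```

Only the Bob/_:b row actually matches the optional pattern; the other three pass through untouched, so four solutions go in and four come out.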

> 
> (this might be the case where folks see it differently. If I spelled this
> out in with intermediate join products, would it be clearer? As I see it,
> since all vars are bound coming into the optional, mbox can't be overridden
> with a NULL value. Perhaps others see it differently?)

That was very clear.  I seem to have found where we differ.

> 
> 
> Order 3:
> 
>  WHERE OPTIONAL ( ?x foaf:mbox ?mbox ) 
> 	( ?x foaf:name  ?name )
>       ( ?y foaf:mbox ?mbox )
> 
> (treatment of an initial optional is similar to the way you'd have to treat
> an initial negation - not surprising if you consider OPTIONAL A to be A or
> NOT A. So I'm assuming it turns into something like this under the hood:
> 
>  ( ?x foaf:mbox ?mbox ) or ((?x ?any ?mbox) and not ( ?x foaf:mbox ?mbox ))
> 
> i.e. introduce the full graph so the negation has something to go against.

Yes - starting off with all possible solutions.  Sets of bindings are 
restrictions on the space of all possibilities so passing in a solution of no 
bindings is the seed I use.

> Happily any reasonable query optimizer would pick one of the other orders
> rather than doing this :)
> 
> Result of first triple:
> 
> x   mbox
> === ================================
> _:b <mailto:bob@work.example>
> _:c <mailto:noname@work.example.org>
> _:a "Alice"
> _:b "Bob"
> 
> Result of first and second triple:
> 
> x   mbox                             name
> === ================================ =====
> _:b <mailto:bob@work.example>        "Bob"
> _:a "Alice"                          "Alice"                           
> _:b "Bob"                            "Bob"
> 
> Result of all triples:
> 
> x   mbox                             name    y
> === ================================ ======= ===
> _:b <mailto:bob@work.example>        "Bob"   _:b
> 
> 
> I'll skip the other possible orders because they're not interesting (i.e.
> it's really only moving the optional around that changes things).

:-)

> 
> So that's what I'm thinking (and by the way what I've implemented). What are
> the points of contention with this approach?

<more snipping>
> 
> 
> Geoff
> 

	Andy

Received on Friday, 25 February 2005 11:55:35 UTC