Re: Questions about OPTIONAL

Geoff Chappell wrote:
> 
>>-----Original Message-----
>>From: Seaborne, Andy [mailto:andy.seaborne@hp.com]
>>Sent: Wednesday, February 23, 2005 10:00 AM
>>To: Geoff Chappell
>>Cc: public-rdf-dawg-comments@w3.org
>>Subject: Re: Questions about OPTIONAL
>>
>>Geoff Chappell wrote:
>>
>>>A few questions about OPTIONAL...
>>>
>>>1) If I had a query like this:
>>>
>>>PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>>SELECT ?x ?name ?y
>>>WHERE  ( ?x foaf:name  ?name )
>>>       OPTIONAL ( ?x foaf:mbox ?mbox )
>>>       ( ?y foaf:mbox ?mbox )
>>>
>>>with data:
>>>
>>>@prefix foaf:       <http://xmlns.com/foaf/0.1/> .
>>>
>>>_:a  foaf:name       "Alice" .
>>>_:b  foaf:name       "Bob" .
>>>_:b  foaf:mbox       <mailto:bob@work.example> .
>>>_:c  foaf:mbox       <mailto:noname@work.example.org> .
>>>
>>>
>>>should I get:
>>>
>>>x   name    y
>>>=== ======= ===
>>>_:b "Bob"   _:b
>>>
>>>or:
>>>
>>>x   name    y
>>>=== ======= ===
>>>_:a "Alice" _:b
>>>_:a "Alice" _:c
>>>_:b "Bob"   _:b
>>
>>Geoof - thank you very much for including a concrete example:
>>
>>The current working draft is inadequate in the treatment of order of
>>execution
>>implications - this is something that has to be done.
>>
>>
>>Executed purely in the order given, I tried your example and get:
>>
>>-------------------------
>>| x    | name    | y    |
>>=========================
>>| _:b0 | "Alice" | _:b1 |
>>| _:b0 | "Alice" | _:b2 |
>>| _:b2 | "Bob"   | _:b2 |
>>-------------------------
>>
>>but as:
>>
>>PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>># Add ?mbox to select for clarity
>>SELECT ?x ?name ?y ?mbox
>>WHERE  ( ?x foaf:name  ?name )
>># Reverse these next two lines.
>>         ( ?y foaf:mbox ?mbox )
>>         OPTIONAL ( ?x foaf:mbox ?mbox )
>>
>>
>>I get:
>>
>>------------------------------------------------------------
>>| x    | name    | y    | mbox                             |
>>============================================================
>>| _:b0 | "Bob"   | _:b0 | <mailto:bob@work.example>        |
>>| _:b0 | "Bob"   | _:b1 | <mailto:noname@work.example.org> |
>>| _:b2 | "Alice" | _:b0 | <mailto:bob@work.example>        |
>>| _:b2 | "Alice" | _:b1 | <mailto:noname@work.example.org> |
>>------------------------------------------------------------
>>
>>because the first two triple patterns are unconnected so the number of
>>results
>>is 2 * 2 (each matches twice), and the optional adds nothing because ?x
>>and
>>?mbox were already defined by earlier patterns.
> 
> 
> So your implementation is clearly taking the path of not binding the
> variables as opposed to binding them to a NULL (for optional vars not bound
> by the conditions of the optional). If you bind values to a NULL, you'll get
> the same result regardless of evaluation order. That seems to speak for this
> approach.

The original example was:

WHERE ( ?x foaf:name  ?name )
        OPTIONAL ( ?x foaf:mbox ?mbox )
        ( ?y foaf:mbox ?mbox )

so after

       ( ?x foaf:name  ?name )
        OPTIONAL ( ?x foaf:mbox ?mbox )

if ?mbox = NULL, the question is whether
       ( ?y foaf:mbox ?mbox )
changes ?mbox to a non-null value.

If NULL is handled specially so that it can be rebound, then it is the same
as having an unbound variable - they both represent something that can be
bound later.

If NULL is handled as plain value for matching, then the order matters
because once bound by an optional to NULL (nothing found) then subsequent
matching can't set that variable yet reversing the order causes the variable
to get the non-null value.

This seems to be much the same as multiple outer joins (in SQL) leading to order 
dependences.

> 
>  [...]
> 
> 
>>Having OPTIONAL always pass the "no match" case even when there are other
>>matched results in a stable execution order but it results in extra
>>solutions
>>that add nothing except complications for applications.
>>
>>For example:
>>
>>@prefix foaf:       <http://xmlns.com/foaf/0.1/> .
>>
>>_:a  foaf:name       "Alice" .
>>_:a  foaf:mbox       <mailto:alice@work.example> .
>>
>>Query:
>>
>>PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>SELECT ?name ?mbox
>>WHERE
>>      ( ?x foaf:mbox ?mbox )
>>      OPTIONAL ( ?x foaf:name  ?name )
>>
>>
>>gives:
>>
>>-----------------------------------------
>>| name    | mbox                        |
>>=========================================
>>| "Alice" | <mailto:alice@work.example> |
>>-----------------------------------------
>>
>>
>>or:
>>
>>-----------------------------------------
>>| name    | mbox                        |
>>=========================================
>>|         | <mailto:alice@work.example> |
>>| "Alice" | <mailto:alice@work.example> |
>>-----------------------------------------
> 
> 
> You'd only get this if you treated an optional like an or rather than an
> exclusive or, right? i.e.:
> 
>  (?x foaf:mbox ?mbox) and ((?x foaf:name  ?name) or ?name=NULL)
> 
> 		vs.
> 
>  (?x foaf:mbox ?mbox) and 
> 	((?x foaf:name  ?name) or 
> 		(not (?x foaf:name  ?name) and ?name=NULL))
> 
> Or if you didn't bind name to a NULL and evaluated the conjuncts in
> differing order.
> 
> 
> 
>>Given the idea behind optional of "add extra information to a solution if
>>available, but don;t fail if not there"
>>
>>In simple cases, it may be possible to filter the output to remove this
>>redundancy but in more complicated queries (for example, ?name is used
>>elsewhere
>>as well, multiple optionals, sharing variables), I didn't manage to find a
>>way
>>that kept the streaming requirement for results.
>>
>>If the application is displaying information for people, then getting two
>>answers back for what is one person is a less useful paradigm.  It is a
>>tradeoff
>>of convenience.
>>
>>I did find that (same results as the last example):
>>
>>PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>SELECT ?name ?mbox
>>WHERE
>>      ( ?x foaf:mbox ?mbox )
>>      { {} UNION ( ?x foaf:name  ?name ) }
>>
>>is illegal in the current grammar (a form of "A and (B or not B)" for
>>optionals). Maybe it shouldn't be.  Thoughts?
> 
> 
> That seems more like:
> 	A and (B or true)
> than:
> 	A and (B or not B) 
> 
> You can't really simulate an optional without some form of not (NAF).

Right - my example was for the form of optional with addition solutions
which has no ordering issues but is unhelpful.  The A and (B or true)
generates those additional unhelpful solutions.

> 
> While on the grammar subject :)... I'd really like to see the grammar
> described in terms of conjuncts (logical factors) and disjuncts (logical
> terms) where constraint expressions are just one more form of logical
> factor. E.g.
> 
> condition :- logical_term [ "union" logical_term]*
> logical_term:- logical_factor [logical_factor]*
> logical_factor:- "(" term operator term ")" |
> 	logical_expression
> logical_expression:- "(" condition ")" |
> 	statement
> statement:- "(" term term term ")"
> term :- function | constant | variable
> 
> ...
> 
> So rather than:
> 
> 	where (?s ?p ?o) and ?o > 2 
> Just say:
> 	where  (?s ?p ?o)(?o > 2)

The constraint syntax includes more general experssions:

   	where
		(?s1 ?p1 ?o1)
		(?s2 ?p2 ?o2)
          AND ?o1 + ?o2  >  5

If everything were written as prefix operators, it should work out (I
haven't checked) but lisp-style is an acquired taste.

As the grammar stands at the moment, there needs to be some delimiter or
start marker on algebraic expressions to control it otherwise ambiguity of
constraint boundaries arises, mostly simply with negative numbers - this is not 
unique SPARQL.  The "AND" token is the current mechanism used.

Just placing a required () round a constraint would work as would [] now
they are freed up.  Other people dislike what they see as excessive bracketting.

AND could be sprinkled everywhere - this is essentially the N3-style triples
syntax (where the AND is a "." and there are no outer "()") that was
discussed but did not get sufficient support.

The best syntax can do is work and spread the dislike evenly; it can't
please. Implementers usually see it differently to users.

> 
> The current "AND" connector between triple patterns and constraints is a
> mess. I'd say it either has to go away or the implicit and between triples
> need to be made explicit.
> 
> 
> -Geoff

	Andy

Received on Thursday, 24 February 2005 11:24:02 UTC