RE: Questions about OPTIONAL from Geoff Chappell on 2005-02-24 (public-rdf-dawg-comments@w3.org from February 2005)

From: Geoff Chappell <geoff@sover.net>
Date: Thu, 24 Feb 2005 08:54:35 -0500
To: <andy.seaborne@hp.com>
Cc: <public-rdf-dawg-comments@w3.org>
Message-ID: <021d01c51a78$61080380$6401a8c0@gsclaptop>
> -----Original Message-----
> From: Seaborne, Andy [mailto:andy.seaborne@hp.com]
> Sent: Thursday, February 24, 2005 6:23 AM
> To: Geoff Chappell
> Cc: public-rdf-dawg-comments@w3.org
> Subject: Re: Questions about OPTIONAL
> 
> 
> 
> Geoff Chappell wrote:
> >
> >>-----Original Message-----
> >>From: Seaborne, Andy [mailto:andy.seaborne@hp.com]
> >>Sent: Wednesday, February 23, 2005 10:00 AM
> >>To: Geoff Chappell
> >>Cc: public-rdf-dawg-comments@w3.org
> >>Subject: Re: Questions about OPTIONAL
> >>
> >>Geoff Chappell wrote:
> >>
> >>>A few questions about OPTIONAL...
> >>>
> >>>1) If I had a query like this:
> >>>
> >>>PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> >>>SELECT ?x ?name ?y
> >>>WHERE  ( ?x foaf:name  ?name )
> >>>       OPTIONAL ( ?x foaf:mbox ?mbox )
> >>>       ( ?y foaf:mbox ?mbox )
> >>>
> >>>with data:
> >>>
> >>>@prefix foaf:       <http://xmlns.com/foaf/0.1/> .
> >>>
> >>>_:a  foaf:name       "Alice" .
> >>>_:b  foaf:name       "Bob" .
> >>>_:b  foaf:mbox       <mailto:bob@work.example> .
> >>>_:c  foaf:mbox       <mailto:noname@work.example.org> .
> >>>
> >>>
> >>>should I get:
> >>>
> >>>x   name    y
> >>>=== ======= ===
> >>>_:b "Bob"   _:b
> >>>
> >>>or:
> >>>
> >>>x   name    y
> >>>=== ======= ===
> >>>_:a "Alice" _:b
> >>>_:a "Alice" _:c
> >>>_:b "Bob"   _:b
> >>
> >>Geoof - thank you very much for including a concrete example:
> >>
> >>The current working draft is inadequate in the treatment of order of
> >>execution
> >>implications - this is something that has to be done.
> >>
> >>
> >>Executed purely in the order given, I tried your example and get:
> >>
> >>-------------------------
> >>| x    | name    | y    |
> >>=========================
> >>| _:b0 | "Alice" | _:b1 |
> >>| _:b0 | "Alice" | _:b2 |
> >>| _:b2 | "Bob"   | _:b2 |
> >>-------------------------
> >>
> >>but as:
> >>
> >>PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> >># Add ?mbox to select for clarity
> >>SELECT ?x ?name ?y ?mbox
> >>WHERE  ( ?x foaf:name  ?name )
> >># Reverse these next two lines.
> >>         ( ?y foaf:mbox ?mbox )
> >>         OPTIONAL ( ?x foaf:mbox ?mbox )
> >>
> >>
> >>I get:
> >>
> >>------------------------------------------------------------
> >>| x    | name    | y    | mbox                             |
> >>============================================================
> >>| _:b0 | "Bob"   | _:b0 | <mailto:bob@work.example>        |
> >>| _:b0 | "Bob"   | _:b1 | <mailto:noname@work.example.org> |
> >>| _:b2 | "Alice" | _:b0 | <mailto:bob@work.example>        |
> >>| _:b2 | "Alice" | _:b1 | <mailto:noname@work.example.org> |
> >>------------------------------------------------------------
> >>
> >>because the first two triple patterns are unconnected so the number of
> >>results
> >>is 2 * 2 (each matches twice), and the optional adds nothing because ?x
> >>and
> >>?mbox were already defined by earlier patterns.
> >
> >
> > So your implementation is clearly taking the path of not binding the
> > variables as opposed to binding them to a NULL (for optional vars not
> bound
> > by the conditions of the optional). If you bind values to a NULL, you'll
> get
> > the same result regardless of evaluation order. That seems to speak for
> this
> > approach.
> 
> The original example was:
> 
> WHERE ( ?x foaf:name  ?name )
>         OPTIONAL ( ?x foaf:mbox ?mbox )
>         ( ?y foaf:mbox ?mbox )
> 
> so after
> 
>        ( ?x foaf:name  ?name )
>         OPTIONAL ( ?x foaf:mbox ?mbox )
> 
> if ?mbox = NULL, the question is whether
>        ( ?y foaf:mbox ?mbox )
> changes ?mbox to a non-null value.
> 
> If NULL is handled specially so that it can be rebound, then it is the
> same
> as having an unbound variable - they both represent something that can be
> bound later.
> 
> If NULL is handled as plain value for matching, then the order matters
> because once bound by an optional to NULL (nothing found) then subsequent
> matching can't set that variable yet reversing the order causes the
> variable
> to get the non-null value.
> 
> This seems to be much the same as multiple outer joins (in SQL) leading to
> order
> dependences.

I just wanted to dig into this a bit more... maybe I'm making some invalid
or debatable assumptions in my thinking that someone could point out,
because I don't see an order dependency here when NULLs are used (which
isn't to say they wouldn't exist in another case).

Here's the way I see it behaving with (non-overridable) NULL values
(assuming evaluation order is the same as query order). 

Order 1:

 WHERE ( ?x foaf:name  ?name )
         OPTIONAL ( ?x foaf:mbox ?mbox )
         ( ?y foaf:mbox ?mbox )
 

Result of first triple:

x   name
=== =======
_:a "Alice"
_:b "Bob"

Result of first and second triple:

x   name    mbox
=== ======= =========================
_:a "Alice" NULL
_:b "Bob"   <mailto:bob@work.example>


Result of all triples:

x   name    mbox                      y
=== ======= ========================= ===
_:b "Bob"   <mailto:bob@work.example> _:b


Order 2:

 WHERE ( ?x foaf:name  ?name )
       ( ?y foaf:mbox ?mbox )
       OPTIONAL ( ?x foaf:mbox ?mbox ) 

Result of first triple:

x   name
=== =======
_:a "Alice"
_:b "Bob"

Result of first and second triple:

x   name    y   mbox
=== ======= === =========================
_:a "Alice" _:b <mailto:bob@work.example>
_:a "Alice" _:c <mailto:noname@work.example.org>
_:b "Bob"   _:b <mailto:bob@work.example>
_:b "Bob"   _:c <mailto:noname@work.example.org>

Result of all triples:

x   name    y   mbox
=== ======= === =========================
_:b "Bob"   _:b <mailto:bob@work.example>

(this might be the case where folks see it differently. If I spelled this
out in with intermediate join products, would it be clearer? As I see it,
since all vars are bound coming into the optional, mbox can't be overridden
with a NULL value. Perhaps others see it differently?)


Order 3:

 WHERE OPTIONAL ( ?x foaf:mbox ?mbox ) 
	( ?x foaf:name  ?name )
      ( ?y foaf:mbox ?mbox )

(treatment of an initial optional is similar to the way you'd have to treat
an initial negation - not surprising if you consider OPTIONAL A to be A or
NOT A. So I'm assuming it turns into something like this under the hood:

 ( ?x foaf:mbox ?mbox ) or ((?x ?any ?mbox) and not ( ?x foaf:mbox ?mbox ))

i.e. introduce the full graph so the negation has something to go against.
Happily any reasonable query optimizer would pick one of the other orders
rather than doing this :)

Result of first triple:

x   mbox
=== ================================
_:b <mailto:bob@work.example>
_:c <mailto:noname@work.example.org>
_:a "Alice"
_:b "Bob"

Result of first and second triple:

x   mbox                             name
=== ================================ =====
_:b <mailto:bob@work.example>        "Bob"
_:a "Alice"                          "Alice"                           
_:b "Bob"                            "Bob"

Result of all triples:

x   mbox                             name    y
=== ================================ ======= ===
_:b <mailto:bob@work.example>        "Bob"   _:b


I'll skip the other possible orders because they're not interesting (i.e.
it's really only moving the optional around that changes things).

So that's what I'm thinking (and by the way what I've implemented). What are
the points of contention with this approach?


> >
> >  [...]
> >
> >
> >>Having OPTIONAL always pass the "no match" case even when there are
> other
> >>matched results in a stable execution order but it results in extra
> >>solutions
> >>that add nothing except complications for applications.
> >>
> >>For example:
> >>
> >>@prefix foaf:       <http://xmlns.com/foaf/0.1/> .
> >>
> >>_:a  foaf:name       "Alice" .
> >>_:a  foaf:mbox       <mailto:alice@work.example> .
> >>
> >>Query:
> >>
> >>PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> >>SELECT ?name ?mbox
> >>WHERE
> >>      ( ?x foaf:mbox ?mbox )
> >>      OPTIONAL ( ?x foaf:name  ?name )
> >>
> >>
> >>gives:
> >>
> >>-----------------------------------------
> >>| name    | mbox                        |
> >>=========================================
> >>| "Alice" | <mailto:alice@work.example> |
> >>-----------------------------------------
> >>
> >>
> >>or:
> >>
> >>-----------------------------------------
> >>| name    | mbox                        |
> >>=========================================
> >>|         | <mailto:alice@work.example> |
> >>| "Alice" | <mailto:alice@work.example> |
> >>-----------------------------------------
> >
> >
> > You'd only get this if you treated an optional like an or rather than an
> > exclusive or, right? i.e.:
> >
> >  (?x foaf:mbox ?mbox) and ((?x foaf:name  ?name) or ?name=NULL)
> >
> > 		vs.
> >
> >  (?x foaf:mbox ?mbox) and
> > 	((?x foaf:name  ?name) or
> > 		(not (?x foaf:name  ?name) and ?name=NULL))
> >
> > Or if you didn't bind name to a NULL and evaluated the conjuncts in
> > differing order.
> >
> >
> >
> >>Given the idea behind optional of "add extra information to a solution
> if
> >>available, but don;t fail if not there"
> >>
> >>In simple cases, it may be possible to filter the output to remove this
> >>redundancy but in more complicated queries (for example, ?name is used
> >>elsewhere
> >>as well, multiple optionals, sharing variables), I didn't manage to find
> a
> >>way
> >>that kept the streaming requirement for results.
> >>
> >>If the application is displaying information for people, then getting
> two
> >>answers back for what is one person is a less useful paradigm.  It is a
> >>tradeoff
> >>of convenience.
> >>
> >>I did find that (same results as the last example):
> >>
> >>PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> >>SELECT ?name ?mbox
> >>WHERE
> >>      ( ?x foaf:mbox ?mbox )
> >>      { {} UNION ( ?x foaf:name  ?name ) }
> >>
> >>is illegal in the current grammar (a form of "A and (B or not B)" for
> >>optionals). Maybe it shouldn't be.  Thoughts?
> >
> >
> > That seems more like:
> > 	A and (B or true)
> > than:
> > 	A and (B or not B)
> >
> > You can't really simulate an optional without some form of not (NAF).
> 
> Right - my example was for the form of optional with addition solutions
> which has no ordering issues but is unhelpful.  The A and (B or true)
> generates those additional unhelpful solutions.
> 
> >
> > While on the grammar subject :)... I'd really like to see the grammar
> > described in terms of conjuncts (logical factors) and disjuncts (logical
> > terms) where constraint expressions are just one more form of logical
> > factor. E.g.
> >
> > condition :- logical_term [ "union" logical_term]*
> > logical_term:- logical_factor [logical_factor]*
> > logical_factor:- "(" term operator term ")" |
> > 	logical_expression
> > logical_expression:- "(" condition ")" |
> > 	statement
> > statement:- "(" term term term ")"
> > term :- function | constant | variable
> >
> > ...
> >
> > So rather than:
> >
> > 	where (?s ?p ?o) and ?o > 2
> > Just say:
> > 	where  (?s ?p ?o)(?o > 2)
> 
> The constraint syntax includes more general experssions:
> 
>    	where
> 		(?s1 ?p1 ?o1)
> 		(?s2 ?p2 ?o2)
>           AND ?o1 + ?o2  >  5
> 
> If everything were written as prefix operators, it should work out (I
> haven't checked) but lisp-style is an acquired taste.
> 
> As the grammar stands at the moment, there needs to be some delimiter or
> start marker on algebraic expressions to control it otherwise ambiguity of
> constraint boundaries arises, mostly simply with negative numbers - this
> is not
> unique SPARQL.  The "AND" token is the current mechanism used.
> 
> Just placing a required () round a constraint would work as would [] now
> they are freed up.  Other people dislike what they see as excessive
> bracketting.
> 
> AND could be sprinkled everywhere - this is essentially the N3-style
> triples
> syntax (where the AND is a "." and there are no outer "()") that was
> discussed but did not get sufficient support.
> 
> The best syntax can do is work and spread the dislike evenly; it can't
> please. Implementers usually see it differently to users.

Fair enough. But this feels more like an implementor-driven solution than a
user-driven solution to me. Does the grammar really require an AND in this
case to deal with negative numbers or is that only the case if you want it
to be parseable with a certain class of parsers? If it's the latter, that's
what I call an implementor-driven solution (and is it really a good idea to
make a decision that favors a handful of implementers vs. a (hopefully) huge
number of users?). I think users are best able to understand a clear and
simple model. As it is now, you'll have to explain to them why a constraint
expression is different from a triple pattern ("But aren't they just
different sorts of constraints on my solution?") and why the logical "AND"
is required in some places and implicit in others.

> >
> > The current "AND" connector between triple patterns and constraints is a
> > mess. I'd say it either has to go away or the implicit and between
> triples
> > need to be made explicit.
> >
> >
> > -Geoff
> 
> 	Andy

Geoff
Received on Thursday, 24 February 2005 13:54:48 UTC