Re: Disjunction vs. Optional ... and UNION from Bob MacGregor on 2005-03-24 (public-rdf-dawg-comments@w3.org from March 2005)

From: Bob MacGregor <bmacgregor@siderean.com>
Date: Thu, 24 Mar 2005 00:08:57 -0800
To: andy.seaborne@hp.com
CC: public-rdf-dawg-comments@w3.org, Pat Hayes <phayes@ihmc.us>
Message-ID: <42427599.3000900@siderean.com>
Andy,

Thanks for working out a detailed example.  It nicely illustrates
how the construct I'm proposing for optional works, yielding exactly
the solution that you deem correct.  Possibly
adding in a one-to-many relationship would make it even
clearer to some folks, but this should suffice ...

Seaborne, Andy wrote:

>
> RDF is semi-structured. The use task that motivates OPTIONAL is 
> accessing an RDF graph where some of the data is missing.  Experience 
> has shown that this is a real issue for application writers.

I agree completely.  That why our implementation of an RDF query 
language has OPTIONAL (as well as conventional disjunction).

>
> Example:
>
> "Get the FOAF name and nick" from a database of people which uses 
> FOAF. foaf:name and foaf:nick do not necessarily exist for each 
> person.  Assuming the well-formedness of the FOAF RDF there is an mbox 
> (which may be unknown - we'll assume it is in the graph for now).
>
> --------
> @prefix foaf:       <http://xmlns.com/foaf/0.1/> .
>
> _:a foaf:mbox   <mailto:alice@example.net> .
> _:a foaf:name   "Alice" .
> _:a foaf:nick   "WhoMe?" .
>
> _:b foaf:mbox   <mailto:bert@example.net> .
> _:b foaf:name   "Bert" .
>
> _:e foaf:mbox   <mailto:eve@example.net> .
> _:e foaf:nick   "DuckSoup" .
> --------
> PREFIX  foaf:   <http://xmlns.com/foaf/0.1/>
>
> SELECT ?mbox ?name ?nick
>   {
>     ?x foaf:mbox ?mbox .
>     OPTIONAL { ?x foaf:name  ?name } .
>     OPTIONAL { ?x foaf:nick  ?nick } .
>   }
> --------
>
> giving something like:
>
> -----------------------------------------------------
> | mbox                       | name    | nick       |
> =====================================================
> | <mailto:alice@example.net> | "Alice" | "WhoMe?"   |
> | <mailto:bert@example.net>  | "Bert"  |            |
> | <mailto:eve@example.net>   |         | "DuckSoup" |
> -----------------------------------------------------
>
> Defining OPTIONAL as "{(?a my:bar ?c) OR TRUE}"
> --------
> PREFIX  foaf:   <http://xmlns.com/foaf/0.1/>
>
> SELECT ?mbox ?name ?nick
>   {
>     ?x foaf:mbox ?mbox .
>     { ?x foaf:name  ?name } OR {} .
>     { ?x foaf:nick  ?nick } OR {} .
>   }
> --------
>
> [[
> {} is "TRUE "- it is a graph pattern which imposes no limitation on 
> the query solutions.  SPARQL patterns are limitations on the space of 
> all possible solutions.
> ]]
>
> Would I be right in saying that this gives something like:
>
> -----------------------------------------------------
> | mbox                       | name    | nick       |
> =====================================================
> | <mailto:alice@example.net> | "Alice" | "WhoMe?"   |
> | <mailto:alice@example.net> | "Alice" |            |
> | <mailto:alice@example.net> |         | "WhoMe?"   |
> | <mailto:alice@example.net> |         |            |
> | <mailto:bert@example.net>  | "Bert"  |            |
> | <mailto:bert@example.net>  |         |            |
> | <mailto:eve@example.net>   |         | "DuckSoup" |
> | <mailto:eve@example.net>   |         |            |
> -----------------------------------------------------
>
> that is, for example, 4 rows for alice because there are 4 ways to 
> match the pattern against the graph.
>
> If "OR" only evaluates the RHS is the LHS is false (programming 
> language style) we are back in order dependent territory.  Even if the 
> two subpatterns are variable compatible:  {:x :p ?v } OR { :x :q ?v } 
> it seems that pattern matching should return all possibilities.
>
> Defining "OPTIONAL P" as "P OR true" may evaluate as a boolean 
> expression to be the same but the solutions are different.  It is the 
> a trade-off of application task and computational model, including the 
> support cost of explaining to people how to achieve their application 
> task.
>
> It seems rather hard in to turn the multiple rows form back into the 
> first form by adding rules about variable usage and doing a sort.  
> That would seem to be little different to where we are with OPTIONAL.
>
> What am I missing?

I see how you got the result you did, but your evaluation was flawed.  
Assume that
we are defining OPTIONAL as "{(?a my:bar ?c) OR TRUE}"  (i.e., using my 
expansion):
We expand out the evaluation of the above query into clausal form (using 
some
abbreviations), yielding:

{?mbox =<alice>  and (?name="alice" or true) and (?nick="who" or true)} or
{?mbox =<bert>  and (?name="bert" or true) and (false or true)} or
{?mbox =<eve>  and (false or true) and (?nick="duck" or true)}

This simplifies to

{?mbox =<alice>  and (?name="alice") and (?nick="who"} or
{?mbox =<bert>  and (?name="bert") } or
{?mbox =<eve>  and (?nick="duck")}

Which yields exactly the set of bindings that you indicated that you 
wish to see.

If your or someone else's reasoner processes the (... OR true) construct 
differently on this example, then
there is probably something wrong with the way it is implementing 
disjunction.

Next, let me point out how you may have gotten the wrong 
interpretation.  The answer that you
suggested my construct would get looks as follows, expanded into clausal 
form:

{?mbox =<alice>  and (?name="alice") and (?nick="who"} or
{?mbox =<alice>  and (?name="alice") } or
{?mbox =<alice>  and (?nick="who"} or
{?mbox =<bert> } or
{?mbox =<bert>  and (?name="bert") } or
{?mbox =<bert> } or
{?mbox =<eve>  and (?nick="duck")} or
{?mbox =<bert> }

In fact, this last answer is logically identical to the one just above 
it.  However, some
of the disjuncts are subsumed by others, so a simplifier would eliminate 
them.  In
other words, from a logic standpoint, these two different solutions are 
in fact identical.

What does that tell us?  One thing that it tells us is that the 3 row 
table and the 8 row
table that you listed above contain exactly the same information, when 
using a standard
logical interpretation. 

However, suppose we decide to treat "UNBOUND" as a value, rather as the 
absence
of a binding.  Then our mapping into logic breaks down.  This appears to 
be a mistake
that several of the committee members and/or outside participants are 
making.
Repeatedly, I am observing semantics being argued by reference to how a 
processor
might generate an answer.  That's not how a logical semantics get 
defined.  Semantics
should be argued by an appeal to or mapping to standard logic, as I've 
just demonstrated.

After you have a logical interpretation worked out, then you should be 
able to safely introduce
a value for UNBOUND, since its obvious that something (not a bnode) 
needs to be
returned to occupy the corresponding position in a SELECT clause.

I have yet to see any of the discussants make reference to "recursive 
fixed-point"
semantics.  Instead, discussions of "order of evaluation" keep popping 
up (including
in the SPARQL document).   Recursive fixed-point semantics is, I 
believe, only
one approach to defining a declarative language, but its one that works, 
whereas
appeals to ordering don't have any place in a declarative language 
framework.
You do have a prominent committee member who can provide you with
better guidance than I can on how to apply fixed-point semantics to SPARQL.

Cheers, Bob
Received on Thursday, 24 March 2005 08:09:54 UTC