Re: Disjunction vs. Optional ... and UNION

Bob MacGregor wrote:
> The committee has shelved disjunction and retained optional.
> The reasons for this include the following statement:
> 
> 
>>Disjuntion provides a lot of the same capability as optional matches, but
>>as a developer I've only seen feature reqests expressed in terms of
>>optional match, no disjunction. Many disjunctive queries can be expressed
>>in terms of optionals and value disjunctions, but I have not attempted to
>>show wether all can be or not.
> 
> 
> This statement reveals a PROFOUND ignorance about the meaning and uses of
> disjunction and optional.
> 
> Backing up a bit first, the right way to design a logic or a language (e.g., SQARQL)
> is to invent a small set of core operators, get their meaning right, and then
> build up the rest of the language in terms of them.  That way, the semantic
> spec for the language is relatively small, even as the syntax grows.  The
> base operators for SPARQL ought to be AND and OR (to begin with).
> 
> The SQL language provides a few million or billion use cases for disjunction, but
> for those who still don't get it, here are a couple of simple examples.
> 
> SELECT ?c
> WHERE (?c rdf:type my:Car) AND
>       { (?c color "red") OR (?c color "blue") }
> 
> 
> SELECT ?p
> WHERE (?p rdf:type my:Person) AND
>       (?p ?age ?a) AND
>       { (FILTER ?a < 21) OR (FILTER ?a > 65) }
> 
> 
> The correct semantics for OPTIONAL can be illustrated by defining it in terms of
> disjunction.  For example:
> 
>          WHERE (?a my:foo ?b) (OPTIONAL (?a my:bar ?c))
> 
> is equivalent to 
> 
>          WHERE (?a my:foo ?b) {(?a my:bar ?c) OR TRUE}
> 
> where 'TRUE' is a statement that always evaluates to true.  SPARQL may or may not wish
> to include the constant 'TRUE' in its syntax, but 'TRUE' is quite handy for defining
> semantics.

RDF is semi-structured. The use task that motivates OPTIONAL is accessing an RDF 
graph where some of the data is missing.  Experience has shown that this is a 
real issue for application writers.

Example:

"Get the FOAF name and nick" from a database of people which uses FOAF. 
foaf:name and foaf:nick do not necessarily exist for each person.  Assuming the 
well-formedness of the FOAF RDF there is an mbox (which may be unknown - we'll 
assume it is in the graph for now).

--------
@prefix foaf:       <http://xmlns.com/foaf/0.1/> .

_:a foaf:mbox   <mailto:alice@example.net> .
_:a foaf:name   "Alice" .
_:a foaf:nick   "WhoMe?" .

_:b foaf:mbox   <mailto:bert@example.net> .
_:b foaf:name   "Bert" .

_:e foaf:mbox   <mailto:eve@example.net> .
_:e foaf:nick   "DuckSoup" .
--------
PREFIX  foaf:   <http://xmlns.com/foaf/0.1/>

SELECT ?mbox ?name ?nick
   {
     ?x foaf:mbox ?mbox .
     OPTIONAL { ?x foaf:name  ?name } .
     OPTIONAL { ?x foaf:nick  ?nick } .
   }
--------

giving something like:

-----------------------------------------------------
| mbox                       | name    | nick       |
=====================================================
| <mailto:alice@example.net> | "Alice" | "WhoMe?"   |
| <mailto:bert@example.net>  | "Bert"  |            |
| <mailto:eve@example.net>   |         | "DuckSoup" |
-----------------------------------------------------

Defining OPTIONAL as "{(?a my:bar ?c) OR TRUE}"
--------
PREFIX  foaf:   <http://xmlns.com/foaf/0.1/>

SELECT ?mbox ?name ?nick
   {
     ?x foaf:mbox ?mbox .
     { ?x foaf:name  ?name } OR {} .
     { ?x foaf:nick  ?nick } OR {} .
   }
--------

[[
{} is "TRUE "- it is a graph pattern which imposes no limitation on the query 
solutions.  SPARQL patterns are limitations on the space of all possible solutions.
]]

Would I be right in saying that this gives something like:

-----------------------------------------------------
| mbox                       | name    | nick       |
=====================================================
| <mailto:alice@example.net> | "Alice" | "WhoMe?"   |
| <mailto:alice@example.net> | "Alice" |            |
| <mailto:alice@example.net> |         | "WhoMe?"   |
| <mailto:alice@example.net> |         |            |
| <mailto:bert@example.net>  | "Bert"  |            |
| <mailto:bert@example.net>  |         |            |
| <mailto:eve@example.net>   |         | "DuckSoup" |
| <mailto:eve@example.net>   |         |            |
-----------------------------------------------------

that is, for example, 4 rows for alice because there are 4 ways to match the 
pattern against the graph.

If "OR" only evaluates the RHS is the LHS is false (programming language style) 
we are back in order dependent territory.  Even if the two subpatterns are 
variable compatible:  {:x :p ?v } OR { :x :q ?v } it seems that pattern matching 
should return all possibilities.

Defining "OPTIONAL P" as "P OR true" may evaluate as a boolean expression to be 
the same but the solutions are different.  It is the a trade-off of application 
task and computational model, including the support cost of explaining to people 
how to achieve their application task.

It seems rather hard in to turn the multiple rows form back into the first form 
by adding rules about variable usage and doing a sort.  That would seem to be 
little different to where we are with OPTIONAL.

What am I missing?

Could you give me an example of where your proposal differs?


	Andy

> 
> So, OPTIONAL is conveniently defined as a disjunction of its argument and the constant TRUE.  
> Once you understand this, it is easy to observe that an OPTIONAL clause has no
> ability to restrict the bindings in a WHERE clause.
> 
> The two use case examples for disjunction that I included above both restrict the
> rest of the WHERE clause.  Therefore, they can't possibly be rewritten in terms
> of OPTIONAL.
> 
> 
> Popping up.  The SPARQL query appears to be deathly afraid of disjunction, while
> including OPTIONAL.  The semantics of OPTIONAL, while different, and weaker, than
> disjunction, are rather similar with regards to getting a correct, declarative
> semantics.  In other words committeed should be as leery of OPTIONAL as they are 
> of disjunction  -- I seriously doubt that the committee could get the semantics for OPTIONAL
> correct without knowing how to work out a semantics for disjunction.
> 
> Therefore, while I have been a vocal supporter of OPTIONAL up to now, I would now
> like to reverse my position, and recommend witholding OPTIONAL from the spec until
> such time as the committee can successfully revisit the question of disjunction.
> 
> 
> That still leaves the unfortunate UNION operator.  SQL has UNION, along with INTERSECTION
> and DIFFERENCE (if I recall correctly), and isolates them by placing them outside
> of the SELECT-FROM-WHERE block.  Doing so makes a clear syntactic distinction between
> UNION and OR.  The SPARQL syntax makes UNION appear to be synonym for the disjunctive
> OR operator.  The spec states that UNION works on graphs rather than statements, but its
> not at all clear syntactically how the statements in a where clause magically turn into
> graphs.  Users will be hopelessly confused by UNION syntax as it now exists.
> 
> Therefore, I recommend either eliminating UNION for the present, or redefining it as SQL
> has done, placing it outside of the SELECT WHERE block.
> 
> Regards, Bob
> 

Received on Wednesday, 23 March 2005 12:07:44 UTC