W3C home > Mailing lists > Public > public-rdf-dawg-comments@w3.org > March 2005

Re: Disjunction vs. Optional ... and UNION

From: Bob MacGregor <bmacgregor@siderean.com>
Date: Fri, 25 Mar 2005 09:29:39 -0800
Message-ID: <42444A83.6050807@siderean.com>
To: Geoff Chappell <gchappell@intellidimension.com>, andy.seaborne@hp.com
CC: public-rdf-dawg-comments@w3.org
Geoff and Andy,

I have to recant somewhat. Now that its daylight, I can see that in
a set of the bindings like

{?mbox =<alice>  and (?name="alice" or true) and (?nick="who" or true)} or
{?mbox =<bert>  and (?name="bert" or true) and (false or true)} or
{?mbox =<eve>  and (false or true) and (?nick="duck" or true)}

the 'and' and 'or' can't be treated like
ordinary logical AND and OR to get the result that we want.

I've started a side-discussion to see how to get a logical
formalism that delivers the ideal set of bindings.  Pat Hayes
was kind enough to give me this pointer
    http://www.w3.org/2001/sw/DataAccess/ftf5-bos.html#item02
which I wasn't able to dig up myself, and that sheds some
additional light on the committee's stance.  But more light
is needed ...

Here is the base I'm starting from.  I assume a single
logical AND operator (unlike the two ("." and "&&") in
SPARQL), so that triple patterns and constraints work
in all combinations.  I know that I can do this safely, because
KIF does it.  I use a conventional disjunction operator,
OR (unlike the strange SPARQL UNION operator).  Again, its
a subset of KIF, so there are no semantic problems.  And
then we want to tackle OPTIONAL.

There are no issues with ordering at this point.  The
SPARQL document mutters about ordering here and there,
and its unclear to me (1) why SPARQL has two AND operators;
(2) why it has a non-standard disjunction, (3) whether this
bizarreness relates to OPTIONAL somehow, or if it has its
own origins, and (4) whether ordering issues arise with
the ANDs and UNION.  The SPARQL document pretty much takes
these things for granted, and drilling down by following
URLs doesn't help much.  I'm sure there are discussions
on each somewhere on the Web, but they aren't easy to locate
(for me anyway).

So, my staring point is much cleaner, but also different,
so I can't tell if that makes our approach to OPTIONAL
less relevant.

Our query processor is a backchainer designed to handle
a KIF-like language.  The implementation of OPTIONAL
shares the same machinery as the implementation of the OR
operator.  Our implementation of OPTIONAL takes nine (9)
lines of Java code.  It might have been six lines, but
one can implement either a "smart" or a "dumb" version
of OPTIONAL.

The dumb version of OPTIONAL evaluates its argument,
calls the continuation method to evaluate remaining
clauses, and when the call returns, calls the
continuation code again without evaluating the argument
to OPTIONAL, thereby simulating what I refer to
as the 'true' argument in the notionally disjunctive version
of optional.  This version generates the excess
rows containing too many occurrences of variables
bound to "unbound".

The smart version of OPTIONAL evaluates its argument
and calls the continuation code as before, but then
it checks to see if any output rows were generated.  If so,
it knows that evaluation of 'true' is unnecessary.
Otherwise, it does the extra evaluation.  This version
produces the more succinct output that has generally been deemed
"correct" or "desirable" in the various e-mail discussions.

I have to admit that, because implementing OPTIONAL proved to be
almost trivial, I may have incorrectly assumed that OPTIONAL
can be safely embedded into a KIF-like logic.

OPTIONAL appears to combine several distinct issues.  Ideally,
we could split them out, and address them individually.
Here is an attempt at enumerating them:

(1) Is either the "dumb" version or the "smart" one semantically-valid,
or is the whole idea untenable?

(2) If the answer to (1) is "yes", can we produce a semantics
that conforms to the smart version?

(3) Do there need to be any restrictions on scoping of
variables to make things work (i.e., as a by-product of
finding a solution to issue (1))?

(4) Does the semantics for OPTIONAL (assuming there is one)
require that we redefine AND and OR (as SPARQL does now),
or are those issues decouplable?

(5) Do we need a special value representing 'unbound'?
If so, does one value suffice, or do we need many
(possibly, the bnode-as-unbound question derives from this one)?

Observation: The questions I was recently fielding mainly concerned
issue (2).  If we don't have a definitive answer to (1),
then (2) is irrelevant, so for the moment, we should really
not be arguing that question.

A good starting point would be to investigate whether a
simple version of OPTIONAL, say one with no nesting,
no constraints, and only conjunction, has a semantic basis.
 
Cheers, Bob



Geoff Chappell wrote:

>Hi, 
>
>I was just reading your post on the dawg comments list and was wondering if
>you could clarify something for me.
>
>You say:
>
>{?mbox =<alice>  and (?name="alice" or true) and (?nick="who" or true)} or
>{?mbox =<bert>  and (?name="bert" or true) and (false or true)} or
>{?mbox =<eve>  and (false or true) and (?nick="duck" or true)}
>
>simplifies to:
>
>{?mbox =<alice>  and (?name="alice") and (?nick="who"} or
>{?mbox =<bert>  and (?name="bert") } or
>{?mbox =<eve>  and (?nick="duck")}
>
>
>But what is your process of simplification? Logically, x or true = true (not
>x which seems to be what you're saying). Seems like the top should simplify
>to:
>
>{?mbox =<alice>} or
>{?mbox =<bert>} or
>{?mbox =<eve>}
>
>which is true when combined (ANDed) with the original query, but doesn't
>carry all of the information.
>
>Seems to me the only way the simplification works the way you imply is if
>you treat the or procedurally (i.e. x + y = x if x; x + y = y if !x) which
>is exactly what you're trying to avoid.
>
>Thanks for any clarification you can provide.
>
>Regards,
>
>Geoff
>
>
>
>  
>

-- 

Bob MacGregor
Chief Scientist

	
	Siderean Software Inc
390 North Sepulveda Blvd., Suite 2070
<http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=5155+Rosecrans+Ave&csz=Hawthorne%2C+Ca+90250&country=us> 
El Segundo, CA 90245
bmacgregor@siderean.com <mailto:bmacgregor@siderean.com> 	
tel: 	+1-310 647-4266
fax: 	+1-310-647-3470

 

 

 

 
Received on Friday, 25 March 2005 17:30:33 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 8 January 2008 14:14:48 GMT