- From: Jeen Broekstra <jeen@aduna.biz>
- Date: Wed, 07 Sep 2005 17:13:41 +0200
- To: DAWG Mailing List <public-rdf-dawg@w3.org>
This is an analysis of the behavior of boolean operators (disjunction in
particular) with respect to type errors in SPARQL, related to TimBL's
comments
(http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0030.html)
and EricP's suggestion
(http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0360.html),
and to several online comments by various DAWG member on IRC. I've tried
to be make this an overview of the current LC approach vs. other
approaches (particularly SQL), so forgive me for being a bit lengthy.
In the current LC working draft, type errors in disjunctions are always
masked. There are a number of difficulties with this.
1. It breaks De Morgan's laws:
A | B | A && B | A || B | !A | ! (A && B) | (!A) || (!B)
---+---+--------+--------+----+------------+--------------
T | E | E | T | F | E | F
F | E | E | F | T | E | T
E | T | E | T | E | E | F
E | F | E | F | E | E | T
E | E | E | E | E | E | E
Clarification (sorry if I'm being tiresome): de Morgan's Laws of duality
state that through use of negation, disjunction and conjunction can be
expressed as each other:
!(A && B) = (!A) || (!B)
!(A || B) = (!A) && (!B)
I've illustrated in the above table that the first duality does not
hold. The other is left as an excercise for the reader.
2. It breaks a 'classical' operator short-circuit:
F && x -> E or F (classically, F)
3. It breaks the intuitive understanding of disjunction (cf. TBL's
comment):
F || x -> T or F (classically, x)
===
In SQL, the typing errors are evaluated differently (cf.
http://www.oreilly.com/catalog/orsqlter/chapter/ch01.html#42643):
A | B | A && B | A || B | !A | ! (A && B) | (!A) || (!B)
---+---+--------+--------+----+------------+-----------------
T | E | E | T | F | E | E*
F | E | F* | E* | T | T* | T
E | T | E | T | E | E | E*
E | F | F* | E* | E | T* | T
E | E | E | E | E | E | E
I've marked the differences with the current LC with an asterisk.
As you can see, this system is more consistent: it conforms to De
Morgan's Laws and does not break any classical short-circuits, nor does
it violate the intuitive understanding of disjunction.
So how does this affect querying?
Let's look again at Tim's example scenario. There was no smoke, and the
temparature is uncomparable to 40. The question he wants to ask is if
there is any smoke or if the temperature is above 40.
In RDF:
data:
:detector :temperature "garbled"^^my:UnknownDT .
:detector :smoke 0.
query:
SELECT ?smoke ?temp
WHERE { :foo :smoke ?smoke ;
:temperature ?temp
FILTER (?smoke = 1 || ?temp > 40).
}
result using LC truth table: ( F || E -> F ):
no results.
result using SQL truth table: ( F || E -> E):
no results.
At first, no difference. (arguably in the SQL scenario one should give
back an error of some kind, I'll get back to that later).
However, the difference becomes apparent if we slightly alter the query
by using a NOT:
SELECT ?smoke ?temp
WHERE { :foo :smoke ?smoke ;
:temperature ?temp
FILTER (not(?smoke = 1 || ?temp <= 40)).
}
result using LC truth table: (not(F || E) -> not(F) -> T )
?smoke ?temp
0 "garbled"^^my:UnknownDT
result using SQL truth table: (not(F || E) -> not(E) -> E )
no results.
Now, clearly what we see here is that because de Morgan is violated, we
get very counter-intuitive results: a query that seems to be the logical
equivalent of the orginal gives different results. I think this is
highly undesirable.
In the above I have tacitly assumed that the eventual result of a FILTER
condition, if it is a typing error, is equivalent to FALSE: we simply
give no result. Of course, it would be possible to design a query engine
such that if this occurred, it raises an error, either in the result or
in some other fashion. However, I would expect a SPARQL engine to
continue evaluating; it seems to me to be unnecessarily restrictive to
require that on such errors, evaluation halts completely. In this
respect RDF is significantly different from SQL in that I expect it to
be much more common that a set of values is heterogeneous, and that
datatyped comparisons will relatively often produce type errors.
Skipping such type errors and continuing with possible valid comparisons
seems a much more graceful solution to me.
Jeen
Received on Wednesday, 7 September 2005 15:14:25 UTC