boolean operators and type errors from Jeen Broekstra on 2005-09-07 (public-rdf-dawg@w3.org from July to September 2005)

From: Jeen Broekstra <jeen@aduna.biz>
Date: Wed, 07 Sep 2005 17:13:41 +0200
To: DAWG Mailing List <public-rdf-dawg@w3.org>
Message-ID: <431F03A5.9040101@aduna.biz>
This is an analysis of the behavior of boolean operators (disjunction in
particular) with respect to type errors in SPARQL, related to TimBL's
comments
(http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0030.html)
and EricP's suggestion
(http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0360.html),
and to several online comments by various DAWG member on IRC. I've tried
to be make this an overview of the current LC approach vs. other
approaches (particularly SQL), so forgive me for being a bit lengthy.

In the current LC working draft, type errors in disjunctions are always
masked. There are a number of difficulties with this.

1. It breaks De Morgan's laws:

  A | B | A && B | A || B | !A | ! (A && B) | (!A) || (!B)
---+---+--------+--------+----+------------+--------------
  T | E |   E    |   T    |  F |     E      |  F
  F | E |   E    |   F    |  T |     E      |  T
  E | T |   E    |   T    |  E |     E      |  F
  E | F |   E    |   F    |  E |     E      |  T
  E | E |   E    |   E    |  E |     E      |  E

Clarification (sorry if I'm being tiresome): de Morgan's Laws of duality
state that through use of negation, disjunction and conjunction can be
expressed as each other:

  !(A && B) = (!A) || (!B)
  !(A || B) = (!A) && (!B)

I've illustrated in the above table that the first duality does not
hold. The other is left as an excercise for the reader.

2. It breaks a 'classical' operator short-circuit:

    F && x -> E or F (classically, F)

3. It breaks the intuitive understanding of disjunction (cf. TBL's
    comment):

    F || x -> T or F (classically, x)

===

In SQL, the typing errors are evaluated differently (cf.
http://www.oreilly.com/catalog/orsqlter/chapter/ch01.html#42643):

  A | B | A && B | A || B | !A | ! (A && B) | (!A) || (!B)
---+---+--------+--------+----+------------+-----------------
  T | E |   E    |   T    |  F |     E      |  E*
  F | E |   F*   |   E*   |  T |     T*     |  T
  E | T |   E    |   T    |  E |     E      |  E*
  E | F |   F*   |   E*   |  E |     T*     |  T
  E | E |   E    |   E    |  E |     E      |  E

I've marked the differences with the current LC with an asterisk.

As you can see, this system is more consistent: it conforms to De
Morgan's Laws and does not break any classical short-circuits, nor does
it violate the intuitive understanding of disjunction.

So how does this affect querying?

Let's look again at Tim's example scenario. There was no smoke, and the 
temparature is uncomparable to 40. The question he wants to ask is if 
there is any smoke or if the temperature is above 40.

In RDF:

data:
   :detector :temperature "garbled"^^my:UnknownDT .
   :detector :smoke 0.

query:
    SELECT ?smoke ?temp
    WHERE { :foo :smoke ?smoke ;
                 :temperature ?temp
	   FILTER (?smoke = 1 || ?temp > 40).
	  }

result using LC truth table: ( F || E -> F ):
   no results.

result using SQL truth table: ( F || E -> E):
   no results.

At first, no difference. (arguably in the SQL scenario one should give 
back an error of some kind, I'll get back to that later).

However, the difference becomes apparent if we slightly alter the query 
by using a NOT:

    SELECT ?smoke ?temp
    WHERE { :foo :smoke ?smoke ;
                 :temperature ?temp
	   FILTER (not(?smoke = 1 || ?temp <= 40)).
	  }

result using LC truth table:  (not(F || E) -> not(F) -> T )
  ?smoke ?temp
  0      "garbled"^^my:UnknownDT

result using SQL truth table: (not(F || E) -> not(E) -> E )
   no results.

Now, clearly what we see here is that because de Morgan is violated, we 
get very counter-intuitive results: a query that seems to be the logical 
equivalent of the orginal gives different results. I think this is 
highly undesirable.

In the above I have tacitly assumed that the eventual result of a FILTER 
condition, if it is a typing error, is equivalent to FALSE: we simply 
give no result. Of course, it would be possible to design a query engine 
such that if this occurred, it raises an error, either in the result or 
in some other fashion. However, I would expect a SPARQL engine to 
continue evaluating; it seems to me to be unnecessarily restrictive to 
require that on such errors, evaluation halts completely. In this 
respect RDF is significantly different from SQL in that I expect it to 
be much more common that a set of values is heterogeneous, and that 
datatyped comparisons will relatively often produce type errors. 
Skipping such type errors and continuing with possible valid comparisons 
seems a much more graceful solution to me.

Jeen
Received on Wednesday, 7 September 2005 15:14:25 UTC