- From: Jeen Broekstra <jeen@aduna.biz>
- Date: Wed, 07 Sep 2005 17:13:41 +0200
- To: DAWG Mailing List <public-rdf-dawg@w3.org>
This is an analysis of the behavior of boolean operators (disjunction in particular) with respect to type errors in SPARQL, related to TimBL's comments (http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0030.html) and EricP's suggestion (http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0360.html), and to several online comments by various DAWG member on IRC. I've tried to be make this an overview of the current LC approach vs. other approaches (particularly SQL), so forgive me for being a bit lengthy. In the current LC working draft, type errors in disjunctions are always masked. There are a number of difficulties with this. 1. It breaks De Morgan's laws: A | B | A && B | A || B | !A | ! (A && B) | (!A) || (!B) ---+---+--------+--------+----+------------+-------------- T | E | E | T | F | E | F F | E | E | F | T | E | T E | T | E | T | E | E | F E | F | E | F | E | E | T E | E | E | E | E | E | E Clarification (sorry if I'm being tiresome): de Morgan's Laws of duality state that through use of negation, disjunction and conjunction can be expressed as each other: !(A && B) = (!A) || (!B) !(A || B) = (!A) && (!B) I've illustrated in the above table that the first duality does not hold. The other is left as an excercise for the reader. 2. It breaks a 'classical' operator short-circuit: F && x -> E or F (classically, F) 3. It breaks the intuitive understanding of disjunction (cf. TBL's comment): F || x -> T or F (classically, x) === In SQL, the typing errors are evaluated differently (cf. http://www.oreilly.com/catalog/orsqlter/chapter/ch01.html#42643): A | B | A && B | A || B | !A | ! (A && B) | (!A) || (!B) ---+---+--------+--------+----+------------+----------------- T | E | E | T | F | E | E* F | E | F* | E* | T | T* | T E | T | E | T | E | E | E* E | F | F* | E* | E | T* | T E | E | E | E | E | E | E I've marked the differences with the current LC with an asterisk. As you can see, this system is more consistent: it conforms to De Morgan's Laws and does not break any classical short-circuits, nor does it violate the intuitive understanding of disjunction. So how does this affect querying? Let's look again at Tim's example scenario. There was no smoke, and the temparature is uncomparable to 40. The question he wants to ask is if there is any smoke or if the temperature is above 40. In RDF: data: :detector :temperature "garbled"^^my:UnknownDT . :detector :smoke 0. query: SELECT ?smoke ?temp WHERE { :foo :smoke ?smoke ; :temperature ?temp FILTER (?smoke = 1 || ?temp > 40). } result using LC truth table: ( F || E -> F ): no results. result using SQL truth table: ( F || E -> E): no results. At first, no difference. (arguably in the SQL scenario one should give back an error of some kind, I'll get back to that later). However, the difference becomes apparent if we slightly alter the query by using a NOT: SELECT ?smoke ?temp WHERE { :foo :smoke ?smoke ; :temperature ?temp FILTER (not(?smoke = 1 || ?temp <= 40)). } result using LC truth table: (not(F || E) -> not(F) -> T ) ?smoke ?temp 0 "garbled"^^my:UnknownDT result using SQL truth table: (not(F || E) -> not(E) -> E ) no results. Now, clearly what we see here is that because de Morgan is violated, we get very counter-intuitive results: a query that seems to be the logical equivalent of the orginal gives different results. I think this is highly undesirable. In the above I have tacitly assumed that the eventual result of a FILTER condition, if it is a typing error, is equivalent to FALSE: we simply give no result. Of course, it would be possible to design a query engine such that if this occurred, it raises an error, either in the result or in some other fashion. However, I would expect a SPARQL engine to continue evaluating; it seems to me to be unnecessarily restrictive to require that on such errors, evaluation halts completely. In this respect RDF is significantly different from SQL in that I expect it to be much more common that a set of values is heterogeneous, and that datatyped comparisons will relatively often produce type errors. Skipping such type errors and continuing with possible valid comparisons seems a much more graceful solution to me. Jeen
Received on Wednesday, 7 September 2005 15:14:25 UTC