!=, NOT IN and type errors

Dear WG,

The current definition of NOT IN is in terms of the != operator. 
Essentially it states that

	x NOT IN (a, b, c)

is equivalent to

	(x != a && x != b && x != c)

However, a consequence of this is that its usefulness is severely 
limited in cases where a variable may be bound to literals of various 
(known or unknown) datatypes.

Since != is defined as throwing a type error when comparing two literals 
with incompatible datatypes (e.g., when comparing a string-typed literal 
with an integer-typed literal), and FILTER is defined to evaluate to 
FALSE when a type error occurs, the following (SPARQL 1.0) query will by 
definition always return an empty result:

  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  SELECT *
  WHERE {?s ?p ?o. FILTER("foo"^^xsd:string != "10"^^xsd:integer) }

This to me seems unintuitive, but it can be worked around (and in any 
case, this is still SPARQL 1.0).

However, in SPARQL 1.1, NOT IN is built on the same definition, so it 
behaves in a similarly unintuitive way. For example, consider the 
following data:

	@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
	@prefix : <http://example.org/> .

	:book1 	a :Book ; :discount "42"^^xsd:integer .
	:book2 	a :Book ; :discount "41"^^xsd:integer .
	:book3 	a :Book ; :discount "700.0"^^xsd:float .
	:book4 	a :Book ; :discount "no discount"^^xsd:string .
	:book5 	a :Book ; :discount "70" .
	:book6 	a :Book ; :discount 50 .
	:book7 	a :Book .

and the following query:

	PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
	PREFIX  :  <http://example.org/>
	SELECT  ?book ?discount
	WHERE {
		?book a :Book ; :discount ?discount .
		FILTER(?discount NOT IN ("no discount"^^xsd:string, 50, 42))
	}

Someone unaware of the restrictive behavior with respect to datatypes 
would expect the result to be:

	?book 	?discount
	-----------------
	:book2	"41"^^xsd:integer
	:book3	"700.0"^^xsd:float
	:book5	"70"

However, the actual result under current definitions is:

	?book 	?discount
	-----------------

That is, empty: since the argument list of the NOT IN contains literals 
with mutually incompatible datatypes (two integers and a string), a type 
error occurs for _every_ literal compared against it, by definition. The 
result of a type error logically ANDed with another value is either an 
error or FALSE, never TRUE (per the table in section 17.2, Filter 
Evaluation), and FILTER drops the solution in both cases. Thus, if any 
item in the list causes a type error, the NOT IN filter can never pass.
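
To make the error propagation concrete, here is a minimal Python sketch 
of the evaluation described above. It is a simplified model, not an 
implementation of the spec: literals are (lexical form, datatype) pairs, 
the names not_equal, logical_and, not_in, filter_passes and ERROR are my 
own, only xsd:integer and xsd:float are treated as numeric, and a plain 
literal is assumed to be comparable only with other plain literals.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
INTEGER, FLOAT, STRING = XSD + "integer", XSD + "float", XSD + "string"
NUMERIC = {INTEGER, FLOAT}

ERROR = "error"  # stands in for a SPARQL type error (assumed representation)


def not_equal(a, b):
    """Simplified model of !=: returns True, False, or ERROR."""
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a in NUMERIC and dt_b in NUMERIC:
        return float(lex_a) != float(lex_b)  # numeric value comparison
    if dt_a == dt_b:
        return lex_a != lex_b  # same datatype: compare lexical forms
    return ERROR  # incompatible datatypes


def logical_and(left, right):
    """Three-valued &&: False wins over ERROR, ERROR wins over True."""
    if left is False or right is False:
        return False
    if left == ERROR or right == ERROR:
        return ERROR
    return True


def not_in(x, items):
    """x NOT IN (i1, i2, ...) modelled as (x != i1 && x != i2 && ...)."""
    result = True
    for item in items:
        result = logical_and(result, not_equal(x, item))
    return result


def filter_passes(value):
    """FILTER keeps a solution only when the expression is True."""
    return value is True


discounts = {
    "book1": ("42", INTEGER),
    "book2": ("41", INTEGER),
    "book3": ("700.0", FLOAT),
    "book4": ("no discount", STRING),
    "book5": ("70", None),  # plain literal, no datatype
    "book6": ("50", INTEGER),
}
excluded = [("no discount", STRING), ("50", INTEGER), ("42", INTEGER)]

kept = [book for book, d in discounts.items()
        if filter_passes(not_in(d, excluded))]
print(kept)  # []
```

In this model every book ends up as either FALSE or an error, so none 
passes the filter.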

What I hope to illustrate is that the current definition of != not only 
severely limits the practical usefulness of NOT IN, but also makes it 
behave in ways that will be quite hard for users of SPARQL to understand.

I realize that the definition of != is fixed in SPARQL 1.0, so I am not 
quite sure if changing that definition falls within the charter, but I 
would ask the working group to consider this as a design flaw and to 
update, if possible, the definition of NOT IN to be more lenient with 
respect to incompatible datatypes. One way to do this is perhaps to 
define NOT IN using sameTerm rather than !=.
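
As a rough illustration of that idea, the Python sketch below models a 
hypothetical NOT IN built on sameTerm and applies it to the example data. 
The representation of literals as (lexical form, datatype) pairs and the 
names same_term and not_in_same_term are assumptions made for this 
sketch, not anything taken from the draft.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
INTEGER, FLOAT, STRING = XSD + "integer", XSD + "float", XSD + "string"


def same_term(a, b):
    """Model of sameTerm: identical lexical form and datatype, never an error."""
    return a == b


def not_in_same_term(x, items):
    """Hypothetical NOT IN: (!sameTerm(x, i1) && !sameTerm(x, i2) && ...)."""
    return all(not same_term(x, item) for item in items)


discounts = {
    "book1": ("42", INTEGER),
    "book2": ("41", INTEGER),
    "book3": ("700.0", FLOAT),
    "book4": ("no discount", STRING),
    "book5": ("70", None),  # plain literal, no datatype
    "book6": ("50", INTEGER),
}
excluded = [("no discount", STRING), ("50", INTEGER), ("42", INTEGER)]

kept = [book for book, d in discounts.items()
        if not_in_same_term(d, excluded)]
print(kept)  # ['book2', 'book3', 'book5']
```

This matches the result one would intuitively expect. The trade-off is 
that sameTerm compares terms rather than values, so a discount of 
"42.0"^^xsd:float would not be excluded by the list item 42.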

In short, I would appreciate a response from the Working Group 
indicating whether they concur that the current behavior of NOT IN is 
unintuitive and of limited usefulness, and whether they think 
redefining NOT IN (and IN) using sameTerm is a workable solution.

By the way, I note that in the definition of NOT IN in the Editor's 
Draft, it says:

  "Errors in comparsions [sic] cause the NOT IN expression to raise an 
error if the RDF term being tested is not found to be in the list 
elsewhere in the list of terms."

However, this statement seems to me incompatible with the definition in 
terms of && and !=, so I am unsure how to interpret it.

Regards,

Jeen

Received on Thursday, 17 March 2011 02:53:54 UTC