- From: Jeen Broekstra <jeen.broekstra@gmail.com>
- Date: Thu, 17 Mar 2011 15:53:16 +1300
- To: public-rdf-dawg-comments@w3.org
Dear WG,
The current definition of NOT IN is in terms of the != operator.
Essentially it states that
x NOT IN (a, b, c)
is equivalent to
(x != a && x != b && x != c)
However, a consequence of this is that its usefulness is severely
limited in cases where a variable may be bound to literals of various
(known or unknown) datatypes.
Since != is defined as throwing a type error when comparing two literals
with incompatible datatypes (e.g., when comparing a string-typed literal
with an integer-typed literal), and FILTER is defined to evaluate to
FALSE when a type error occurs, the following (SPARQL 1.0) query will by
definition always return an empty result:
SELECT *
WHERE {?s ?p ?o. FILTER("foo"^^xsd:string != "10"^^xsd:integer) }
This seems unintuitive to me, but it can be worked around (and in any
case, it is established SPARQL 1.0 behavior).
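To make the mechanics concrete, here is a rough Python sketch of the
behavior described above. It is a toy model, not a SPARQL engine: the
names Lit, SparqlTypeError, not_equal and filter_passes are invented for
illustration, and only xsd:integer, xsd:float and xsd:string are
modelled.

```python
from collections import namedtuple

# Hypothetical stand-in for an RDF literal: lexical form plus datatype.
Lit = namedtuple("Lit", ["lexical", "datatype"])

NUMERIC = {"xsd:integer", "xsd:float"}


class SparqlTypeError(Exception):
    """Stands in for a SPARQL type error."""


def not_equal(a, b):
    """Rough model of '!=' applied to two literals."""
    if a.datatype in NUMERIC and b.datatype in NUMERIC:
        return float(a.lexical) != float(b.lexical)  # compare by value
    if a.datatype == b.datatype == "xsd:string":
        return a.lexical != b.lexical
    if a == b:
        return False  # same term, so certainly not unequal
    # Two different literals with no applicable comparison: type error.
    raise SparqlTypeError(f"cannot compare {a} with {b}")


def filter_passes(expr):
    """A FILTER keeps a solution only if the expression is true;
    a type error is treated as false."""
    try:
        return expr() is True
    except SparqlTypeError:
        return False


foo = Lit("foo", "xsd:string")
ten = Lit("10", "xsd:integer")
print(filter_passes(lambda: not_equal(foo, ten)))  # prints False
```

Although "foo" and 10 are plainly different, the comparison raises a
type error instead of returning true, so the filter rejects every
solution.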
However, in SPARQL 1.1, NOT IN is built on the same definition, which
makes it behave in a rather unintuitive way. For example, consider the
following data:
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <http://example.org/> .
:book1 a :Book ; :discount "42"^^xsd:integer .
:book2 a :Book ; :discount "41"^^xsd:integer .
:book3 a :Book ; :discount "700.0"^^xsd:float .
:book4 a :Book ; :discount "no discount"^^xsd:string .
:book5 a :Book ; :discount "70" .
:book6 a :Book ; :discount 50 .
:book7 a :Book .
and the following query:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX : <http://example.org/>
SELECT ?book ?discount
WHERE {
  ?book a :Book ; :discount ?discount .
  FILTER(?discount NOT IN ("no discount"^^xsd:string, 50, 42))
}
If unaware of the restrictive behavior with respect to datatypes, one
would expect the result to be:
?book    ?discount
-----------------------------
:book2   "41"^^xsd:integer
:book3   "700.0"^^xsd:float
:book5   "70"
However, the actual result under current definitions is:
?book    ?discount
-----------------------------
That is, empty: since the argument list of NOT IN contains literals
with mutually incompatible datatypes (two integers and a string), a type
error occurs for _every_ literal compared against it, by definition. The
result of a type error logically ANDed with another value is either an
error or FALSE, never TRUE (per the table in section 17.2, Filter
Evaluation), and FILTER treats an error as FALSE. Thus, if any item in
the list causes a type error, NOT IN can never evaluate to TRUE.
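The evaluation can be traced with a small Python sketch. This is a toy
model under stated assumptions, not a SPARQL engine: literals are
(lexical form, datatype) pairs, a plain literal has datatype None, and
the names ERROR, ne, and3 and not_in are invented for illustration.

```python
ERROR = object()  # sentinel standing in for a SPARQL type error
NUMERIC = {"xsd:integer", "xsd:float"}


def ne(a, b):
    """Rough model of '!=': True, False, or ERROR."""
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a in NUMERIC and dt_b in NUMERIC:
        return float(lex_a) != float(lex_b)  # numeric comparison by value
    if dt_a == dt_b and dt_a in (None, "xsd:string"):
        return lex_a != lex_b  # both plain, or both xsd:string
    if a == b:
        return False  # same term
    return ERROR  # different literals, incompatible datatypes


def and3(a, b):
    """Three-valued AND: False wins, otherwise an error propagates."""
    if a is False or b is False:
        return False
    if a is ERROR or b is ERROR:
        return ERROR
    return True


def not_in(x, items):
    """x NOT IN (a, b, c) expanded as (x != a && x != b && x != c)."""
    result = True
    for item in items:
        result = and3(result, ne(x, item))
    return result


books = {
    ":book1": ("42", "xsd:integer"),
    ":book2": ("41", "xsd:integer"),
    ":book3": ("700.0", "xsd:float"),
    ":book4": ("no discount", "xsd:string"),
    ":book5": ("70", None),
    ":book6": ("50", "xsd:integer"),
}
excluded = [("no discount", "xsd:string"), ("50", "xsd:integer"),
            ("42", "xsd:integer")]

# FILTER keeps a solution only when the expression is actually true.
kept = [book for book, d in books.items() if not_in(d, excluded) is True]
print(kept)  # prints []
```

Every book ends up with either False (it matches a list item) or ERROR
(it matches nothing but hits an incompatible datatype), so none pass.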
What I hope to illustrate is that the current definition of != not only
severely limits the practical usefulness of NOT IN, but also that it
behaves in ways that will be quite hard for users of SPARQL to understand.
I realize that the definition of != is fixed in SPARQL 1.0, so I am not
quite sure if changing that definition falls within the charter, but I
would ask the working group to consider this as a design flaw and to
update, if possible, the definition of NOT IN to be more lenient with
respect to incompatible datatypes. One way to do this is perhaps to
define NOT IN using sameTerm rather than !=.
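As a rough illustration of that suggestion, the following Python sketch
evaluates the same data with a sameTerm-style comparison. It is a toy
model only: literals are (lexical form, datatype) pairs, a plain
literal has datatype None, and same_term and not_in_same_term are
hypothetical names, not anything defined by the specification.

```python
def same_term(a, b):
    """Term identity: same lexical form and same datatype.
    It never raises a type error."""
    return a == b


def not_in_same_term(x, items):
    """x NOT IN (a, b, c) as (!sameTerm(x, a) && !sameTerm(x, b) && ...)."""
    return all(not same_term(x, item) for item in items)


books = {
    ":book1": ("42", "xsd:integer"),
    ":book2": ("41", "xsd:integer"),
    ":book3": ("700.0", "xsd:float"),
    ":book4": ("no discount", "xsd:string"),
    ":book5": ("70", None),
    ":book6": ("50", "xsd:integer"),
}
excluded = [("no discount", "xsd:string"), ("50", "xsd:integer"),
            ("42", "xsd:integer")]

kept = [book for book, d in books.items() if not_in_same_term(d, excluded)]
print(kept)  # prints [':book2', ':book3', ':book5']
```

This gives the result one would intuitively expect for the example. The
trade-off in this model is that comparison is by term rather than by
value, so a value-equal literal such as "42.0"^^xsd:float would not be
excluded by 42.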
In short, I would appreciate a response from the Working Group
indicating whether they concur that the current behavior of NOT IN is
unintuitive and of limited usefulness, and whether they think
redefining NOT IN (and IN) using sameTerm is a workable solution.
By the way, I note that in the definition of NOT IN in the Editor's
Draft, it says:
"Errors in comparsions [sic] cause the NOT IN expression to raise an
error if the RDF term being tested is not found to be in the list
elsewhere in the list of terms."
This statement seems to me incompatible with the definition in terms of
&& and !=, however, so I am unsure how to interpret this.
Regards,
Jeen
Received on Thursday, 17 March 2011 02:53:54 UTC