- From: Jeen Broekstra <jeen.broekstra@gmail.com>
- Date: Thu, 17 Mar 2011 15:53:16 +1300
- To: public-rdf-dawg-comments@w3.org
Dear WG,

The current definition of NOT IN is in terms of the != operator. Essentially it states that

    x NOT IN (a, b, c)

is equivalent to

    (x != a && x != b && x != c)

However, a consequence of this is that its usefulness is severely limited in cases where a variable may be bound to literals of various (known or unknown) datatypes.

Since != is defined as throwing a type error when comparing two literals with incompatible datatypes (e.g., when comparing a string-typed literal with an integer-typed literal), and FILTER is defined to evaluate to FALSE when a type error occurs, the following (SPARQL 1.0) query will by definition always return an empty result:

    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    SELECT *
    WHERE {
      ?s ?p ?o .
      FILTER("foo"^^xsd:string != "10"^^xsd:integer)
    }

This seems unintuitive to me, but it can be worked around (and in any case, this is still SPARQL 1.0).

However, in SPARQL 1.1, NOT IN uses the same definition, meaning that it behaves in a rather unintuitive way. For example, consider the following data:

    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix : <http://example.org/> .

    :book1 a :Book ; :discount "42"^^xsd:integer .
    :book2 a :Book ; :discount "41"^^xsd:integer .
    :book3 a :Book ; :discount "700.0"^^xsd:float .
    :book4 a :Book ; :discount "no discount"^^xsd:string .
    :book5 a :Book ; :discount "70" .
    :book6 a :Book ; :discount 50 .
    :book7 a :Book .

and the following query:

    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX : <http://example.org/>

    SELECT ?book ?discount
    WHERE {
      ?book a :Book ;
            :discount ?discount .
      FILTER(?discount NOT IN ("no discount"^^xsd:string, 50, 42))
    }

If unaware of the restrictive behavior with respect to datatypes, one would expect the result to be:

    ?book     ?discount
    ----------------------------
    :book2    "41"^^xsd:integer
    :book3    "700.0"^^xsd:float
    :book5    "70"

However, the actual result under current definitions is:

    ?book     ?discount
    ----------------------------

That is, empty: since the argument list of the NOT IN contains literals with incompatible datatypes (two ints and a string), a type error occurs for _every_ literal compared with it, by definition, and a type error logically ANDed with another value yields either an error or FALSE (per the table in section 17.2, Filter Evaluation). In both cases the FILTER rejects the solution. Thus, if any item in the list causes a type error, the NOT IN operator fails.

What I hope to illustrate is that the current definition of != not only severely limits the practical usefulness of NOT IN, but also makes it behave in ways that will be quite hard for users of SPARQL to understand.

I realize that the definition of != is fixed in SPARQL 1.0, so I am not quite sure whether changing that definition falls within the charter, but I would ask the working group to consider this a design flaw and to update, if possible, the definition of NOT IN to be more lenient with respect to incompatible datatypes. One way to do this is perhaps to define NOT IN using sameTerm rather than !=.

In short, I would appreciate a response from the Working Group indicating whether they concur that the current behavior of NOT IN is unintuitive and of limited usefulness, and whether they think redefining NOT IN (and IN) using sameTerm is a workable solution.

By the way, I note that the definition of NOT IN in the Editor's Draft says:

    "Errors in comparsions [sic] cause the NOT IN expression to raise
    an error if the RDF term being tested is not found to be in the
    list elsewhere in the list of terms."

This statement seems to me incompatible with the definition in terms of && and !=, however, so I am unsure how to interpret it.

Regards,

Jeen
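
For readers who want to trace the argument above by hand, the following is a rough Python sketch of the two candidate definitions of NOT IN, applied to the book data from the message. It is not a SPARQL engine: the `Literal` class, the `ERROR` marker, and every function name are hypothetical, and `not_equal` only models the few datatype combinations that occur in the example (numeric comparison for xsd:integer and xsd:float, string comparison for identical string types, a type error otherwise).

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
XSD_STRING = XSD + "string"
# Assumption: only the numeric datatypes that appear in the example data.
NUMERIC = {XSD + "integer", XSD + "float"}

ERROR = object()  # third truth value, next to True and False


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # None models a plain literal


def not_equal(a, b):
    """Simplified model of != on two literals: True, False, or ERROR."""
    if a.datatype in NUMERIC and b.datatype in NUMERIC:
        return float(a.lexical) != float(b.lexical)  # compare numeric values
    if a.datatype == b.datatype and a.datatype in (None, XSD_STRING):
        return a.lexical != b.lexical  # compare strings
    if a == b:
        return False  # identical terms are never unequal
    return ERROR  # incompatible datatypes: type error


def and3(x, y):
    """Three-valued logical AND: FALSE wins over an error, an error wins over TRUE."""
    if x is False or y is False:
        return False
    if x is ERROR or y is ERROR:
        return ERROR
    return True


def not_in_neq(x, items):
    """NOT IN expanded as (x != a && x != b && ...)."""
    result = True
    for item in items:
        result = and3(result, not_equal(x, item))
    return result


def not_in_same_term(x, items):
    """NOT IN expanded with term identity, which never raises a type error."""
    return all(x != item for item in items)


DISCOUNTS = {
    "book1": Literal("42", XSD + "integer"),
    "book2": Literal("41", XSD + "integer"),
    "book3": Literal("700.0", XSD + "float"),
    "book4": Literal("no discount", XSD_STRING),
    "book5": Literal("70"),
    "book6": Literal("50", XSD + "integer"),
}  # book7 has no :discount, so it never matches the graph pattern

EXCLUDED = [
    Literal("no discount", XSD_STRING),
    Literal("50", XSD + "integer"),
    Literal("42", XSD + "integer"),
]


def surviving_books(not_in):
    """A FILTER keeps a solution only if its expression is exactly True."""
    return [book for book, d in DISCOUNTS.items() if not_in(d, EXCLUDED) is True]
```

Under these assumptions, `surviving_books(not_in_neq)` is empty, while `surviving_books(not_in_same_term)` keeps book2, book3 and book5. One trade-off worth noting: term identity compares terms rather than values, so a discount of "42.0"^^xsd:float would not be excluded by the 42 in the list.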
Received on Thursday, 17 March 2011 02:53:54 UTC