Re: !=, NOT IN and type errors

Jeen,

Thank you for the report.

SPARQL expressions use a three-valued logic: true, false and unknown.
"Unknown" is the case where evaluating the expression raises an error.
To make sure a processor only returns solutions known to be true, FILTER
turns "unknown" into false.
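A minimal Python sketch of this logic follows. The `TV` enum and the `and3`, `or3`, `not3` and `filter_keeps` names are hypothetical, and `TV.ERROR` stands for "unknown". The `and3`/`or3` rules follow the logical-and/logical-or table in the Filter Evaluation section of the SPARQL spec. This is an illustration, not code from any SPARQL engine.

```python
from enum import Enum


class TV(Enum):
    """Hypothetical three-valued result; ERROR is the "unknown" case."""
    TRUE = "true"
    FALSE = "false"
    ERROR = "error"


def and3(a, b):
    # A definite false on either side settles a conjunction: F && E = F.
    if a is TV.FALSE or b is TV.FALSE:
        return TV.FALSE
    if a is TV.ERROR or b is TV.ERROR:
        return TV.ERROR
    return TV.TRUE


def or3(a, b):
    # A definite true on either side settles a disjunction: T || E = T.
    if a is TV.TRUE or b is TV.TRUE:
        return TV.TRUE
    if a is TV.ERROR or b is TV.ERROR:
        return TV.ERROR
    return TV.FALSE


def not3(a):
    # Negating "unknown" is still "unknown".
    if a is TV.ERROR:
        return TV.ERROR
    return TV.FALSE if a is TV.TRUE else TV.TRUE


def filter_keeps(value):
    # FILTER keeps a solution only when the expression is known to be true;
    # both FALSE and ERROR drop it.
    return value is TV.TRUE
```

For example, `and3(TV.TRUE, TV.ERROR)` stays `TV.ERROR`, so a FILTER over it drops the solution. By contrast, `and3(TV.FALSE, TV.ERROR)` is a definite `TV.FALSE`.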

For the expression:

"foo"^^xsd:string != "10"^^xsd:integer

a minimal SPARQL 1.0 query processor is not required to know that the
value spaces of xsd:string and xsd:integer are disjoint. For a processor
that does not know this, the result of the comparison is "unknown": an
error is raised. If the processor does know they are disjoint, it can
return "true". Such behavior amounts to adding a row for
"xsd:string != xsd:integer" to the processor's dispatch table for
functions and operators.
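The dispatch-table idea can be sketched in Python. This is a hypothetical model, loosely patterned on the operator mapping in the SPARQL spec:

- Literals are (lexical form, datatype) pairs.
- `ExpressionError` stands in for an expression error.
- `float()` approximates proper numeric type promotion.

The only difference between the two processors is the pair of extra rows.

```python
# Hypothetical model: a literal is a (lexical form, datatype) pair.
NUMERIC = {"xsd:integer", "xsd:decimal", "xsd:float", "xsd:double"}


class ExpressionError(Exception):
    """Stand-in for a SPARQL expression error, i.e. the "unknown" result."""


def category(datatype):
    """Group the datatypes used here by value space."""
    if datatype in NUMERIC:
        return "numeric"
    if datatype == "xsd:string":
        return "string"
    return datatype  # any other datatype gets no operator rows below


def make_not_equal(knows_disjoint):
    """Build a != operator backed by a dispatch table keyed on operand categories."""
    table = {
        # float() is a simplification of XPath numeric type promotion.
        ("numeric", "numeric"): lambda a, b: float(a[0]) != float(b[0]),
        ("string", "string"): lambda a, b: a[0] != b[0],
    }
    if knows_disjoint:
        # The extra rows: no string value equals any numeric value.
        table[("string", "numeric")] = lambda a, b: True
        table[("numeric", "string")] = lambda a, b: True

    def not_equal(a, b):
        key = (category(a[1]), category(b[1]))
        if key in table:
            return table[key](a, b)
        if a == b:
            return False  # identical terms are certainly equal
        raise ExpressionError(f"no != row for {key}")

    return not_equal


foo = ("foo", "xsd:string")
ten = ("10", "xsd:integer")
print(make_not_equal(knows_disjoint=True)(foo, ten))  # True
```

With `knows_disjoint=False`, the same call raises `ExpressionError`, which FILTER would then treat as false.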

NOT IN is value-based and builds on this. If the processor understands
that the xsd:string and numeric value spaces are disjoint, then it would
return:

?book                         ?discount
<http://example.org/book5>    "70"
<http://example.org/book3>    "700.0"^^xsd:float
<http://example.org/book2>    41

which is what I think you are expecting. The key here is the additional
fact that xsd:string and the numeric datatypes have disjoint value spaces.
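The following minimal Python sketch evaluates the FILTER from your query over your data twice. The first run models a processor without the disjointness fact; the second models one that has it. It is a hypothetical model, not any engine's code, and it assumes the following:

- Literals are (lexical form, datatype) pairs.
- A plain literal such as "70" compares like xsd:string.
- `float()` stands in for proper numeric promotion.
- Only the datatypes in your data are handled.

```python
# Hypothetical model: a literal is a (lexical form, datatype) pair; None marks
# a plain literal such as "70". Only the datatypes in the example are handled.
NUMERIC = {"xsd:integer", "xsd:float"}
ERROR = "error"  # stands for the "unknown" (error-raised) result


def value_space(lit):
    # Assumption: plain literals compare like xsd:string.
    return "numeric" if lit[1] in NUMERIC else "string"


def not_equal(a, b, knows_disjoint):
    """Value-based !=, returning True, False or ERROR."""
    sa, sb = value_space(a), value_space(b)
    if sa == sb == "numeric":
        return float(a[0]) != float(b[0])
    if sa == sb == "string":
        return a[0] != b[0]
    # Mixed string/numeric: only answerable if the processor knows the
    # value spaces are disjoint.
    return True if knows_disjoint else ERROR


def not_in(x, items, knows_disjoint):
    """x NOT IN (i1, i2, ...) as x != i1 && x != i2 && ..."""
    results = [not_equal(x, item, knows_disjoint) for item in items]
    if False in results:
        return False  # a definite false settles the && chain
    if ERROR in results:
        return ERROR
    return True


discounts = {
    "book1": ("42", "xsd:integer"),
    "book2": ("41", "xsd:integer"),
    "book3": ("700.0", "xsd:float"),
    "book4": ("no discount", "xsd:string"),
    "book5": ("70", None),
    "book6": ("50", "xsd:integer"),
}  # book7 has no :discount, so the pattern never matches it
excluded = [("no discount", "xsd:string"), ("50", "xsd:integer"), ("42", "xsd:integer")]


def run(knows_disjoint):
    # FILTER keeps a row only when the expression is exactly True.
    return sorted(b for b, d in discounts.items()
                  if not_in(d, excluded, knows_disjoint) is True)


print(run(knows_disjoint=False))  # []
print(run(knows_disjoint=True))   # ['book2', 'book3', 'book5']
```

Without the extra fact, books 1, 4 and 6 hit a definite false, and books 2, 3 and 5 hit an error, so the result is empty, as you observed. With the extra fact, the three rows above come back.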

Because IN is value-based:

"700"^^xsd:float IN (500, 600, 700) is true.

If IN were defined using sameTerm, this would be false, which might be
surprising.
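A small Python sketch of the difference follows. It assumes the same hypothetical (lexical form, datatype) model for literals. Every value in this example is numeric, so error handling is left out, and `numeric_equal`, `same_term` and `in_list` are illustrative names only.

```python
# Hypothetical model: a literal is a (lexical form, datatype) pair.
def numeric_equal(a, b):
    """Value comparison; assumes both operands are numeric literals."""
    return float(a[0]) == float(b[0])


def same_term(a, b):
    """Term comparison: same lexical form and same datatype."""
    return a == b


def in_list(x, items, eq):
    """x IN (items) is true if any comparison is true (no errors arise here)."""
    return any(eq(x, item) for item in items)


x = ("700", "xsd:float")
items = [("500", "xsd:integer"), ("600", "xsd:integer"), ("700", "xsd:integer")]
print(in_list(x, items, numeric_equal))  # True
print(in_list(x, items, same_term))      # False
```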

We would be grateful if you would acknowledge that your comment has been 
answered by sending a reply to this mailing list.

Andy, on behalf of the SPARQL-WG

On 17/03/11 02:53, Jeen Broekstra wrote:
>
> Dear WG,
>
> The current definition of NOT IN is in terms of the != operator.
> Essentially it states that
>
> x NOT IN (a, b, c)
>
> is equivalent to
>
> (x != a && x != b && x != c)
>
> However, a consequence of this is that its usefulness is severely
> limited in cases where a variable may be bound to literals of various
> (known or unknown) datatypes.
>
> Since != is defined as throwing a type error when comparing two literals
> with incompatible datatypes (e.g., when comparing a string-typed literal
> with an integer-typed literal), and FILTER is defined to evaluate to
> FALSE when a type error occurs, the following (SPARQL 1.0) query will by
> definition always return an empty result:
>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> SELECT *
> WHERE {?s ?p ?o. FILTER("foo"^^xsd:string != "10"^^xsd:integer) }
>
> This to me seems unintuitive, but it can be worked around (and in any
> case, this is still SPARQL 1.0).
>
> However, in SPARQL 1.1, NOT IN uses the same definition, meaning that it
> behaves in a rather un-intuitive way. For example, consider the
> following data:
>
> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
> @prefix : <http://example.org/> .
>
> :book1 a :Book ; :discount "42"^^xsd:integer .
> :book2 a :Book ; :discount "41"^^xsd:integer .
> :book3 a :Book ; :discount "700.0"^^xsd:float .
> :book4 a :Book ; :discount "no discount"^^xsd:string .
> :book5 a :Book ; :discount "70" .
> :book6 a :Book ; :discount 50 .
> :book7 a :Book .
>
> and the following query:
>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> PREFIX : <http://example.org/>
> SELECT ?book ?discount
> WHERE {
>   ?book a :Book ; :discount ?discount .
>   FILTER(?discount NOT IN ("no discount"^^xsd:string, 50, 42))
> }
>
> If unaware of the restrictive behavior wrt datatypes, one would expect
> the result to be:
>
> ?book ?discount
> -----------------
> :book2 "41"^^xsd:integer
> :book3 "700.0"^^xsd:float
> :book5 "70"
>
> However, the actual result under current definitions is:
>
> ?book ?discount
> -----------------
>
> That is, empty: since the argument list of the NOT IN contains various
> literals with different incompatible datatypes (two ints and a string),
> a type error occurs on _every_ literal compared with it, by definition,
> and the result of any type error logically-ANDed with another value is
> FALSE (per the table in section 17.2, Filter Evaluation). Thus, if any
> item in the list causes a type error, the NOT IN operator fails.
>
> What I hope to illustrate is that the current definition of != not only
> severely limits the practical usefulness of NOT IN, but also that it
> behaves in ways that will be quite hard for users of SPARQL to understand.
>
> I realize that the definition of != is fixed in SPARQL 1.0, so I am not
> quite sure if changing that definition falls within the charter, but I
> would ask the working group to consider this as a design flaw and to
> update, if possible, the definition of NOT IN to be more lenient with
> respect to incompatible datatypes. One way to do this is perhaps to
> define NOT IN using sameTerm rather than !=.
>
> In short, I would appreciate a response from the Working Group
> indicating whether they concur that the current behavior of NOT IN is
> un-intuitive and of limited usefulness, and whether they think
> redefining NOT IN (and IN) using sameTerm is a workable solution.
>
> By the way, I note that in the definition of NOT IN in the Editor's
> Draft, it says:
>
> "Errors in comparsions [sic] cause the NOT IN expression to raise an
> error if the RDF term being tested is not found to be in the list
> elsewhere in the list of terms."
>
> This statement seems to me incompatible with the definition in terms of
> && and !=, however, so I am unsure how to interpret this.
>
> Regards,
>
> Jeen

Received on Friday, 25 March 2011 13:02:59 UTC