Question concerning typed literals in SPARQL from Jeremy Carroll on 2005-11-30 (public-rdf-dawg@w3.org from October to December 2005)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Wed, 30 Nov 2005 15:58:17 +0000
To: public-rdf-dawg@w3.org
CC: SWBPD <public-swbp-wg@w3.org>
Message-ID: <438DCC19.2080607@hpl.hp.com>

This question concerns your document:
http://www.w3.org/TR/2005/WD-rdf-sparql-query-20051123/

In SWBPD WG, we have been discussing the semantics of typed literals.

In particular, we are trying to decide between the three possibilities 
outlined in:

http://www.w3.org/TR/2005/WD-swbp-xsch-datatypes-20050427/#sec-values

The third of these (True Values) has not received any support.

The second solution, based around XPath eq, is motivated to try and give 
a smoother experience to end users who may find data for which the 
choices between say xsd:double and xsd:decimal have not been consistent.

Advocates of the first solution (Primitive Equality), which treats 
xsd:decimal and xsd:double as disjoint, have argued that the same end 
user functionality can be achieved by combining the first solution with 
SPARQL.

The purpose of this e-mail is to confirm that line of argument with you.


In this first solution (Primitive Equality) equality of typed literals 
is determined by comparing literals using their primitive base type, and 
treating all primitive base types as different.
Under this solution, the literals
"1.3"^^xsd:float
"1.3"^^xsd:double
"1.3"^^xsd:decimal
"1"^^xsd:float
"1"^^xsd:double
"1"^^xsd:decimal
all have different values.
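
As an illustrative sketch only (not taken from any specification), Primitive Equality can be modelled by pairing each value with its primitive type, so that values of different primitive types never compare equal. The helper names `to_float32`, `PARSERS` and `value` are hypothetical, and rounding to single precision via `struct` is an assumption about how an implementation might represent xsd:float.

```python
import struct
from decimal import Decimal

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Hypothetical parsers from lexical form to a value in each primitive type.
PARSERS = {
    "xsd:decimal": Decimal,
    "xsd:double": float,
    "xsd:float": lambda s: to_float32(float(s)),
}

def value(lexical, datatype):
    """Model a typed literal's value as (primitive type, parsed value)."""
    return (datatype, PARSERS[datatype](lexical))

literals = [(lex, dt)
            for lex in ("1.3", "1")
            for dt in ("xsd:float", "xsd:double", "xsd:decimal")]
values = {value(lex, dt) for lex, dt in literals}

print(len(values))  # 6: all six literals denote different values
# Within one primitive type, different lexical forms can share a value.
print(value("1.3", "xsd:decimal") == value("1.300", "xsd:decimal"))  # True
```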

My understanding is that SPARQL does not specify whether the store being 
queried must treat two literals that have the same value but different 
syntactic forms as the same or as different.
If we have two stores A and B, where A compares literals syntactically 
but B compares literals by value, and the value comparisons are done with 
the Primitive Equality semantics described above, then my 
understanding is that the following results would hold.

If the following triples are loaded into both A and B

<eg:decimal> <eg:p> "1.3"^^xsd:decimal .
<eg:float> <eg:p> "1.3"^^xsd:float .
<eg:double> <eg:p> "1.3"^^xsd:double .
<eg:decimal2> <eg:p> "1.300"^^xsd:decimal .

Then:

SELECT  ?s ?p
WHERE   { ?s ?p 1.3 }

would match

<eg:decimal> <eg:p> "1.3"^^xsd:decimal .
in A

and


<eg:decimal> <eg:p> "1.3"^^xsd:decimal .
<eg:decimal2> <eg:p> "1.300"^^xsd:decimal .
in B
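
The difference between the two stores can be sketched as follows. This is a toy model, not a SPARQL engine: it assumes the query literal 1.3 is read as "1.3"^^xsd:decimal, and the functions `match_syntactic` and `match_by_value` are hypothetical names for the two matching strategies.

```python
import struct
from decimal import Decimal

TRIPLES = [
    ("eg:decimal", "eg:p", ("1.3", "xsd:decimal")),
    ("eg:float", "eg:p", ("1.3", "xsd:float")),
    ("eg:double", "eg:p", ("1.3", "xsd:double")),
    ("eg:decimal2", "eg:p", ("1.300", "xsd:decimal")),
]

# Assumption: the bare literal 1.3 in the query is an xsd:decimal.
QUERY_OBJECT = ("1.3", "xsd:decimal")

def parse(lexical, datatype):
    """Parse a lexical form into a value of its own primitive type."""
    if datatype == "xsd:decimal":
        return Decimal(lexical)
    if datatype == "xsd:float":
        return struct.unpack("<f", struct.pack("<f", float(lexical)))[0]
    return float(lexical)

def match_syntactic(obj, pattern):
    """Store A: same lexical form and same datatype."""
    return obj == pattern

def match_by_value(obj, pattern):
    """Store B: Primitive Equality, same primitive type and same value."""
    return obj[1] == pattern[1] and parse(*obj) == parse(*pattern)

store_a = [s for s, _, o in TRIPLES if match_syntactic(o, QUERY_OBJECT)]
store_b = [s for s, _, o in TRIPLES if match_by_value(o, QUERY_OBJECT)]

print(store_a)  # ['eg:decimal']
print(store_b)  # ['eg:decimal', 'eg:decimal2']
```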

Whereas:

SELECT  ?s ?p
WHERE   { ?s ?p ?o .
            FILTER (?o = 1.3) }

would match all four triples for both A and B, since = is interpreted as 
op:numeric-equal and numeric type promotion applies, giving equality in 
all cases.
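
A rough model of that promotion, assuming the ordering decimal, then float, then double, and assuming 1.3 in the filter is an xsd:decimal. The names `RANK`, `native`, `promote` and `numeric_equal` are hypothetical; this is a sketch of the idea rather than an implementation of the XPath operator.

```python
import struct
from decimal import Decimal

# Assumed promotion order: decimal -> float -> double.
RANK = {"xsd:decimal": 0, "xsd:float": 1, "xsd:double": 2}

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def native(lexical, datatype):
    """Parse a literal into a value of its own type."""
    if datatype == "xsd:decimal":
        return Decimal(lexical)
    if datatype == "xsd:float":
        return to_float32(float(lexical))
    return float(lexical)

def promote(val, target):
    """Convert an already-parsed value to the wider target type."""
    if target == "xsd:decimal":
        return val
    if target == "xsd:float":
        return to_float32(float(val))
    return float(val)

def numeric_equal(a, b):
    """Compare two (lexical, datatype) literals after promoting both."""
    target = max(a[1], b[1], key=RANK.get)
    return promote(native(*a), target) == promote(native(*b), target)

OBJECTS = [("1.3", "xsd:decimal"), ("1.3", "xsd:float"),
           ("1.3", "xsd:double"), ("1.300", "xsd:decimal")]
matches = [o for o in OBJECTS if numeric_equal(o, ("1.3", "xsd:decimal"))]
print(len(matches))  # 4
```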

However,

SELECT  ?s ?p
WHERE   { ?s ?p ?o .
            FILTER (?o = 1.3e0) }

would match the following triples

<eg:decimal> <eg:p> "1.3"^^xsd:decimal .
<eg:double> <eg:p> "1.3"^^xsd:double .
<eg:decimal2> <eg:p> "1.300"^^xsd:decimal .

because the numeric promotion rules would convert "1.3"^^xsd:float to a 
double with the same value, which is not the value of "1.3"^^xsd:double.
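
The effect can be checked numerically. The snippet below is a small sketch that uses `struct` to obtain the single-precision value nearest to 1.3 and then compares it, as a double, with the double nearest to 1.3; the variable names are illustrative.

```python
import struct

# The single-precision value nearest to 1.3, held in a Python float (a double).
# Widening a float to a double is exact, so this is the promoted value.
float_1_3_as_double = struct.unpack("<f", struct.pack("<f", 1.3))[0]

double_1_3 = 1.3  # the double nearest to 1.3

print(float_1_3_as_double == double_1_3)  # False
print(abs(float_1_3_as_double - double_1_3) < 1e-6)  # True: they differ only slightly
```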

If an application wanted to explicitly do the equality with floating 
point precision (rather than double precision), I understand the 
following query could be used:

SELECT  ?s ?p
WHERE   { ?s ?p ?o .
            FILTER (xsd:float(?o) = xsd:float(1.3) ) }

using explicit casts.
This would return all four triples.
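
A sketch of why the casts bring all four values together, with `xsd_float` as a hypothetical stand-in for the xsd:float() cast (modelled here as rounding to single precision). It assumes the four objects have already been parsed into values of their own types.

```python
import struct
from decimal import Decimal

def xsd_float(value):
    """Rough analogue of the xsd:float() cast: round to single precision."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]

# The four objects from the example data, parsed in their own types.
objects = {
    "eg:decimal": Decimal("1.3"),
    "eg:float": xsd_float("1.3"),
    "eg:double": float("1.3"),
    "eg:decimal2": Decimal("1.300"),
}

target = xsd_float(Decimal("1.3"))  # xsd:float(1.3)
matched = [s for s, o in objects.items() if xsd_float(o) == target]
print(len(matched))  # 4
```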

Please indicate whether these examples are correct.


thanks

Jeremy Carroll

PS I am arguing in the SWBPD WG that, since SPARQL adequately addresses 
the need for looser comparisons of the sort shown above, in which floats, 
decimals and doubles are treated equivalently, the next version of

http://www.w3.org/TR/2005/WD-swbp-xsch-datatypes-20050427/

should present primitive equality as the preferred semantics, and should 
leave any further equivalences required by an application for the 
application to determine, for example by use of queries such as those 
given here.

PPS Note I am pleased to see the greater clarity in your latest WD 
concerning the type of '1.3' in SPARQL. I found it hard to tell in the 
earlier draft which datatype was intended. Personally I have no opinion 
as to which datatype is better, but I support the "in progress" change 
highlighted at the beginning of section 3 from an editorial point of view.
Received on Wednesday, 30 November 2005 16:01:16 UTC