Re: Question concerning typed literals in SPARQL

On Mon, Dec 05, 2005 at 05:48:31AM -0500, Eric Prud'hommeaux wrote:
> On Wed, Nov 30, 2005 at 03:58:17PM +0000, Jeremy Carroll wrote:
> > 
> > 
> > This question concerns your document:
> > http://www.w3.org/TR/2005/WD-rdf-sparql-query-20051123/
> > 
> > In SWBPD WG, we have been discussing the semantics of typed literals.
> > 
> > In particular, we are trying to decide between the three possibilities 
> > outlined in:
> > 
> > http://www.w3.org/TR/2005/WD-swbp-xsch-datatypes-20050427/#sec-values
> > 
> > The third of these (True Values) has not received any support.
> > 
> > The second solution, based on XPath eq, aims to give a smoother 
> > experience to end users who may find data in which the choice between, 
> > say, xsd:double and xsd:decimal has not been made consistently.
> > 
> > Advocates of the first solution (Primitive Equality), which treats 
> > xsd:decimal and xsd:double as disjoint, have argued that the same end 
> > user functionality can be achieved by combining the first solution with 
> > SPARQL.
> > 
> > The purpose of this e-mail is to confirm that line of argument with you.
> > 
> > 
> > In this first solution (Primitive Equality) equality of typed literals 
> > is determined by comparing literals using their primitive base type, and 
> > treating all primitive base types as different.
> > In this solution,
> > "1.3"^^xsd:float
> > "1.3"^^xsd:double
> > "1.3"^^xsd:decimal
> > "1"^^xsd:float
> > "1"^^xsd:double
> > "1"^^xsd:decimal
> > all have different values.
> > 
> > My understanding is that SPARQL does not specify whether the store being 
> > queried must treat two literals with the same value but different 
> > syntactic form as the same or as different.
> > If we have two stores A and B where A compares literals syntactically, 
> > but B compares literals by value, and the value comparisons are done with 
> > the Primitive Equality semantics described above, then my 
> > understanding is that the following results would hold.
> 
> For graph matching, B compares literals by value; therefore, B's
> database contains all of the extra triples stemming from the known
> equivalences. SPARQL's graph matching is performed against this larger
> set of triples.
> 
> For FILTERS, the semantics are entirely defined in SPARQL, so no extra
> clever database will give you extra clever FILTERS.
> 
> > If the following triples are loaded into both A and B
> > 
> > <eg:decimal> <eg:p> "1.3"^^xsd:decimal .
> > <eg:float> <eg:p> "1.3"^^xsd:float .
> > <eg:double> <eg:p> "1.3"^^xsd:double .
> > <eg:decimal2> <eg:p> "1.300"^^xsd:decimal .
> > 
> > Then:
> > 
> > SELECT  ?s, ?p
> > WHERE   { ?s, ?p, 1.3 } .
> > 
> > would match
> > 
> > <eg:decimal> <eg:p> "1.3"^^xsd:decimal .
> > in A
> > 
> > and
> > 
> > 
> > <eg:decimal> <eg:p> "1.3"^^xsd:decimal .
> > <eg:decimal2> <eg:p> "1.300"^^xsd:decimal .
> > in B
> 
> Agreed
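
To make the A/B difference concrete, here is a rough Python sketch of the two matching strategies. It is only a model under stated assumptions: the names (TRIPLES, syntactic_key, value_key, match) are invented for illustration, the bare 1.3 in the query is taken to be an xsd:decimal, and xsd:float values are modelled by rounding to IEEE single precision.

```python
import struct
from decimal import Decimal

# The four triples from the example, with literals as (lexical form, datatype).
TRIPLES = [
    ("eg:decimal", "eg:p", ("1.3", "xsd:decimal")),
    ("eg:float", "eg:p", ("1.3", "xsd:float")),
    ("eg:double", "eg:p", ("1.3", "xsd:double")),
    ("eg:decimal2", "eg:p", ("1.300", "xsd:decimal")),
]

def syntactic_key(literal):
    """Store A (hypothetical): a literal is identified by lexical form and datatype."""
    return literal

def value_key(literal):
    """Store B (hypothetical): Primitive Equality, i.e. compare values but keep
    each primitive type disjoint from the others."""
    lexical, datatype = literal
    if datatype == "xsd:decimal":
        return (datatype, Decimal(lexical))
    if datatype == "xsd:float":
        # Round to IEEE single precision to model the xsd:float value space.
        single = struct.unpack("<f", struct.pack("<f", float(lexical)))[0]
        return (datatype, single)
    if datatype == "xsd:double":
        return (datatype, float(lexical))
    raise ValueError(f"unsupported datatype: {datatype}")

def match(key, query_literal):
    """Return the subjects whose object literal equals the query literal under key."""
    wanted = key(query_literal)
    return [s for s, _p, o in TRIPLES if key(o) == wanted]

QUERY = ("1.3", "xsd:decimal")  # assumption: bare 1.3 is an xsd:decimal

store_a = match(syntactic_key, QUERY)
store_b = match(value_key, QUERY)
print(store_a)  # ['eg:decimal']
print(store_b)  # ['eg:decimal', 'eg:decimal2']
```

The datatype tag in each key is what keeps xsd:float, xsd:double and xsd:decimal apart in B, while "1.3" and "1.300" collapse to the same decimal value.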
> 
> > Whereas:
> > 
> > SELECT  ?s, ?p
> > WHERE   { ?s, ?p, ?o .
> >            FILTER (?o = 1.3) . } .
> > 
> > would match all four triples for both A and B, since = is interpreted as 
> > in fn:numeric-equals() and type promotions apply to give equality in all 
> > cases.
> > 
> > However,
> > 
> > SELECT  ?s, ?p
> > WHERE   { ?s, ?p, ?o .
> >            FILTER (?o = 1.3e0) . } .
> > 
> > would match the following triples
> > 
> > <eg:decimal> <eg:p> "1.3"^^xsd:decimal .
> > <eg:double> <eg:p> "1.3"^^xsd:double .
> > <eg:decimal2> <eg:p> "1.300"^^xsd:decimal .
> 
> @FLOAT@
> 
> > 
> > because the numeric rules would cast "1.3"^^xsd:float to the nearest 
> > double, which is not "1.3"^^xsd:double.
> 
> I've been scratching my head about this for a while.
> Is this a last-bit-might-be-wrong problem, or something more related
> to type promotion voodoo?

I've confirmed that Jeremy is describing a precision problem. My
implementation does not reveal whether or how often it occurs, but the
explanation makes sense to me.
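
The precision problem can be reproduced outside any SPARQL engine. The Python sketch below is only an illustration, not how any implementation does it: the helper names (to_float32, as_double, DATA) are made up, Python floats stand in for xsd:double, and xsd:float is modelled by rounding through a 4-byte IEEE single.

```python
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE single, returned widened to a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def as_double(lexical: str, datatype: str) -> float:
    """Model numeric promotion to xsd:double for the comparison ?o = 1.3e0."""
    if datatype == "xsd:float":
        # The float value is fixed at single precision before it is widened.
        return to_float32(float(lexical))
    # xsd:decimal and xsd:double both end up as the double nearest the lexical form.
    return float(lexical)

# Subject -> (lexical form, datatype), as in Jeremy's four triples.
DATA = {
    "eg:decimal": ("1.3", "xsd:decimal"),
    "eg:float": ("1.3", "xsd:float"),
    "eg:double": ("1.3", "xsd:double"),
    "eg:decimal2": ("1.300", "xsd:decimal"),
}

matches = [s for s, lit in DATA.items() if as_double(*lit) == 1.3e0]
print(matches)  # ['eg:decimal', 'eg:double', 'eg:decimal2']
print(repr(to_float32(1.3)))  # 1.2999999523162842
```

So it is more than a last-bit rounding quirk in one engine: the single nearest to 1.3, once widened, is simply a different double from the double nearest to 1.3.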

> > If an application wanted to explicitly do the equality with floating 
> > point precision (rather than double precision), I understand the 
> > following query could be used:
> > 
> > SELECT  ?s, ?p
> > WHERE   { ?s, ?p, ?o .
> >            FILTER (xsd:float(?o) = xsd:float(1.3) ) . } .
> > 
> > using explicit casts.
> > This would return all four triples.
> > 
> > Please indicate whether these examples are correct.

Correct.
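
As a sanity check on the cast version, here is a small Python model of comparing at float precision. It is a sketch under assumptions: as_float32 and DATA are hypothetical names, and the xsd:float() cast is approximated by parsing the lexical form as a double and rounding it to an IEEE single.

```python
import struct

def as_float32(lexical: str) -> float:
    """Model xsd:float(?o): parse the lexical form, then round to IEEE single."""
    return struct.unpack("<f", struct.pack("<f", float(lexical)))[0]

# Subject -> lexical form; the datatype no longer matters once both sides are cast.
DATA = {
    "eg:decimal": "1.3",
    "eg:float": "1.3",
    "eg:double": "1.3",
    "eg:decimal2": "1.300",
}

target = as_float32("1.3")  # models xsd:float(1.3)
matches = [s for s, lexical in DATA.items() if as_float32(lexical) == target]
print(matches)  # ['eg:decimal', 'eg:float', 'eg:double', 'eg:decimal2']
```

Casting both operands down to the coarser type makes the comparison insensitive to which numeric datatype the data happened to use, which is why all four triples come back.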

> I don't have a rigorous enforcement of numeric types in my
> implementation (I rely on Perl to do all the numeric comparisons).
> However, all this seems consistent except for my not understanding
> the lack of "1.3"^^xsd:float at the @FLOAT@ marker (above).
> 
> > thanks
> > 
> > Jeremy Carroll
> > 
> > PS I am arguing in the SWBPD WG that, since SPARQL adequately addresses 
> > the need to make looser comparisons of the sort above, where floats, 
> > decimals and doubles are treated equivalently, the next version of
> > 
> > 
> > http://www.w3.org/TR/2005/WD-swbp-xsch-datatypes-20050427/
> > 
> > should be presenting primitive equality as the preferred semantics, and 
> > any further equivalences required by an application to be ones for the 
> > application to determine, for example, by use of queries such as those 
> > given here.
> > 
> > PPS Note I am pleased to see the greater clarity in your latest WD 
> > concerning the type of '1.3' in SPARQL. I found it hard to tell in the 
> > earlier draft which datatype was intended. Personally I have no opinion 
> > as to which datatype is better, but I support the "in progress" change 
> > highlighted at the beginning of section 3 from an editorial point of view.
> 
> Andy, take a bow.



-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Friday, 9 December 2005 00:31:08 UTC