Re: Mulgara and sameTerm from Paul Gearon on 2008-07-29 (public-sparql-dev@w3.org from July to September 2008)

From: Paul Gearon <gearon@ieee.org>
Date: Tue, 29 Jul 2008 12:05:10 -0500
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: "public-sparql-dev@w3.org" <public-sparql-dev@w3.org>, "Arjohn Kampman" <arjohn@aduna-software.com>, "Andrae Muys" <andrae@netymon.com>, "James Leigh" <james-nospam@leighnet.ca>
Message-ID: <a25ac1f0807291005n61247df1j1ae5e749e18f95f5@mail.gmail.com>

Thanks Andy, this does clear up a number of things for me.

On Tue, Jul 29, 2008 at 11:33 AM, Seaborne, Andy <andy.seaborne@hp.com>
wrote:
<snip/>

Most of the SPARQL filters require value space comparison.  The definition
of "=" allows extensibility by causing a type error if two terms might be
the same value but the processor does not know.  (Aside: two literals are
definitely equal if they have the same lexical form and the same datatype, for
any datatype, whether or not anything else is known to the processor about it,
because the lexical-to-value-space mapping of the datatype is functional.)
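
To make the rule above concrete, here is a minimal Python sketch of "=" on
typed literals. It is an illustration only: the Literal class, the
SparqlTypeError exception, the KNOWN_DATATYPES table and literal_equals are
hypothetical names, not part of any store's API, and the sketch ignores
range checks, language tags and plain literals.

```python
from dataclasses import dataclass
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"


class SparqlTypeError(Exception):
    """Hypothetical marker for a SPARQL type error (not a Python TypeError)."""


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


# Assumption: the datatypes this imaginary processor understands, each mapped
# to a function from lexical form to value.
KNOWN_DATATYPES = {
    XSD + "integer": int,
    XSD + "int": int,
    XSD + "byte": int,
    XSD + "decimal": Decimal,
}


def literal_equals(a: Literal, b: Literal) -> bool:
    # Same lexical form and same datatype: definitely equal, even for a
    # datatype the processor knows nothing about.
    if a == b:
        return True
    parse_a = KNOWN_DATATYPES.get(a.datatype)
    parse_b = KNOWN_DATATYPES.get(b.datatype)
    if parse_a is not None and parse_b is not None:
        # Both datatypes are understood: compare in the value space.
        return parse_a(a.lexical) == parse_b(b.lexical)
    # The terms differ and the processor cannot tell whether the values
    # differ, so the answer is a type error rather than "false".
    raise SparqlTypeError(f"cannot compare {a} and {b}")
```

With this table, "+1"^^xsd:integer and "01"^^xsd:byte compare equal, while
two different lexical forms of an unknown datatype raise the error.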


This reminds me... exactly what is meant by "type error" here? The first
time I worked on this I threw an exception, but obviously that wasn't a good
idea and I fixed it.  :-)  At the moment, a "type error" is effectively the
same as not equals, which works, but has me uncomfortable since I'm ignoring
the distinction. (Actually, I'm still using the exception internally, but I
catch it and continue as if there was no match.)
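
One place where the distinction shows up is negation. Under the SPARQL
filter rules an error inside FILTER drops the solution, but "!" applied to
an error is still an error, whereas "!" applied to false is true. The sketch
below models that three-valued logic in Python. ERROR, logical_not,
logical_or, logical_and and filter_keeps are hypothetical names chosen for
illustration; this is not how Mulgara implements it.

```python
# Sentinel standing in for "evaluation raised a type error".
ERROR = object()


def logical_not(x):
    # Negating an error yields an error, not True.
    return ERROR if x is ERROR else (not x)


def logical_or(a, b):
    # A true operand wins even if the other operand is an error.
    if a is True or b is True:
        return True
    if a is ERROR or b is ERROR:
        return ERROR
    return False


def logical_and(a, b):
    # A false operand wins even if the other operand is an error.
    if a is False or b is False:
        return False
    if a is ERROR or b is ERROR:
        return ERROR
    return True


def filter_keeps(result):
    # FILTER keeps a solution only when the expression is true;
    # both False and ERROR eliminate it.
    return result is True
```

If a type error from "=" is collapsed to false too early, a filter such as
!(?a = ?b) would keep the solution (not false is true), while the
three-valued version drops it, because filter_keeps(logical_not(ERROR)) is
False.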

sameTerm works on the definition of equality from RDF Concepts so no
D-entailment. [B]  But SPARQL does not prescribe what is "in" the store -
there is a dataset that is queried.  Especially in the case where the dataset
comes from execution context (no FROM etc, no protocol parameter), SPARQL
says nothing about how that dataset came to be.  It just is.  So if you load
RDF that has "+1"^^xsd:int, whether the store preserves the exact lexical
form, or its datatype, is a feature of the store.  SPARQL does not cover
this step.  If you load "+1"^^xsd:integer and "01"^^xsd:byte, it's a store
decision whether there are two terms or one, or whether what is stored and
returned is "1"^^xsd:integer which wasn't directly mentioned (or even
"1"^^xsd:decimal as the primitive XSD type that they are all derived from).
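
The effect of that store decision on sameTerm can be sketched in a few lines
of Python. Everything here is hypothetical: Literal, same_term,
load_preserving and load_canonical are illustrative names, and collapsing
integer-derived types to a canonical xsd:integer is just one of the choices
described above, assumed for the example.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumption: the integer-derived datatypes this imaginary store canonicalizes.
INTEGER_TYPES = {XSD + "integer", XSD + "int", XSD + "byte"}


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


def same_term(a: Literal, b: Literal) -> bool:
    # Term equality: same lexical form and same datatype, no value-space
    # reasoning.
    return a == b


def load_preserving(lit: Literal) -> Literal:
    # A store that keeps the exact lexical form and datatype it was given.
    return lit


def load_canonical(lit: Literal) -> Literal:
    # A store that keeps values for known types: integer-derived literals
    # come back as a canonical xsd:integer.
    if lit.datatype in INTEGER_TYPES:
        return Literal(str(int(lit.lexical)), XSD + "integer")
    return lit


a = Literal("+1", XSD + "integer")
b = Literal("01", XSD + "byte")

# Two distinct terms in the lexical-preserving store ...
preserved_same = same_term(load_preserving(a), load_preserving(b))
# ... but a single term in the value-storing store.
canonical_same = same_term(load_canonical(a), load_canonical(b))
```

Here preserved_same is False and canonical_same is True, so the same query
over the "same" loaded data can legitimately give different sameTerm
answers depending on the store.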


This was my understanding of how things work, though this implementation
decision for Mulgara was made by others. I'm glad to see that the decision
wasn't based on a misunderstanding. However, it *is* causing problems for
the test suite... as you get to below.

<snip/>

The test suite is a slightly different case: it is providing tests for a
specific set of choices.  The tests do label what the assumptions are.  Some
tests are labelled as making more than just basic assumptions (e.g. language
tags).


This is where we are coming unstuck. The tests are being treated as an
absolute, meaning that if we don't get exact correspondence in the results
we fail. Even if Mulgara is prepared to accept this, many potential users
are not. In our current scenario, Sesame is expecting exact compliance with
the tests as they are, and our current architecture (storing values for
known types, rather than lexical representations) does not work here.

I guess our problem comes down to the test suite being treated as a de facto
part of the standard.

Paul

Received on Tuesday, 29 July 2008 17:06:00 UTC