RE: Mulgara and sameTerm from Seaborne, Andy on 2008-07-29 (public-sparql-dev@w3.org from July to September 2008)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 29 Jul 2008 16:33:43 +0000
To: "public-sparql-dev@w3.org" <public-sparql-dev@w3.org>
CC: Arjohn Kampman <arjohn@aduna-software.com>, Andrae Muys <andrae@netymon.com>, Paul Gearon <gearon@ieee.org>, James Leigh <james-nospam@leighnet.ca>
Message-ID: <B6CF1054FDC8B845BF93A6645D19BEA345D535088E@GVW1118EXC.americas.hpqcorp.net>
Hi all,

I hope you don't mind but this is the long-winded answer to put everything in context.

SPARQL is defined for simple entailment, and there is an extension mechanism for other entailment regimes [A].  An entailment regime can add some level of D-entailment [C], which allows graph matching to be performed with respect to the value space, not just the lexical space.  The XSD datatype hierarchy [E] is a very important case for D-entailment and is specially treated in FILTERs.

Most of the SPARQL filters require value space comparison.  The definition of "=" allows extensibility by causing a type error if two terms might be the same value but the processor does not know.  (Aside: two literals are definitely equal if they have the same lexical form and the same datatype, for any datatype, whether or not anything else is known to the processor about it, because the lexical-to-value-space mapping of a datatype is functional.)
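A minimal sketch of that behaviour of "=", in Python.  Typed literals are modelled as (lexical form, datatype IRI) pairs, and the KNOWN table and the function name are assumptions made for illustration, not part of any SPARQL implementation.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical table of the datatypes this toy processor understands,
# mapping each datatype IRI to a lexical-to-value function.
KNOWN = {
    XSD + "integer": int,
    XSD + "decimal": Decimal,
}

def rdf_term_equal(a, b):
    """Sketch of "=" on two typed literals given as (lexical, datatype) pairs."""
    if a == b:
        # Same lexical form and same datatype: definitely the same value,
        # even when the processor knows nothing else about the datatype.
        return True
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a in KNOWN and dt_b in KNOWN:
        # Both datatypes are understood: compare in the value space.
        return KNOWN[dt_a](lex_a) == KNOWN[dt_b](lex_b)
    # The terms might denote the same value, but this processor cannot tell.
    raise TypeError("cannot compare literals of an unknown datatype")
```

With this sketch, "+1"^^xsd:integer and "01"^^xsd:integer compare equal, two identical literals of an unknown datatype compare equal, and two different lexical forms of an unknown datatype raise an error rather than returning false.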

sameTerm works on the definition of equality from RDF Concepts, so no D-entailment. [B]  But SPARQL does not prescribe what is "in" the store - there is a dataset that is queried.  Especially in the case where the dataset comes from the execution context (no FROM etc, no protocol parameter), SPARQL says nothing about how that dataset came to be.  It just is.  So if you load RDF that has "+1"^^xsd:int, whether the store preserves the exact lexical form, or its datatype, is a feature of the store.  SPARQL does not cover this step.  If you load "+1"^^xsd:integer and "01"^^xsd:byte, it's a store decision whether there are two terms or one, or whether what is stored and returned is "1"^^xsd:integer which wasn't directly mentioned (or even "1"^^xsd:decimal as the primitive XSD type that they are all derived from).
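To make the two store choices concrete, here is a small Python sketch.  The two loader functions are hypothetical policies, not the behaviour of any particular store, and the canonicalizing one assumes every literal it is given is of an integer-derived type.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

def same_term(a, b):
    """Term equality for typed literals given as (lexical, datatype) pairs:
    lexical form and datatype IRI must match exactly, with no value-space
    reasoning."""
    return a == b

def load_preserving(literals):
    """Hypothetical store policy 1: keep every term exactly as written."""
    return set(literals)

def load_canonicalizing(literals):
    """Hypothetical store policy 2: collapse integer-valued literals to one
    canonical xsd:integer term.  Assumes all inputs are integer-derived."""
    return {(str(int(lex)), XSD + "integer") for lex, _datatype in literals}

data = [("+1", XSD + "integer"), ("01", XSD + "byte")]

preserved = load_preserving(data)      # two distinct terms remain
canonical = load_canonicalizing(data)  # one term: "1"^^xsd:integer
```

In both cases same_term is applied faithfully to whatever terms the dataset holds; the difference is only in what the load step put there.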

Whether this all happens when the data is actually loaded, some intermediate time or even under the covers at query time is merely implementation detail.  It just might not be an easy implementation detail for store builders :-)

Different user classes want different things.  If you're editing an ontology, the expectation is that what is stored is exactly the terms specified.  But plain literals with no language tag and xsd:strings are same-value (RDF MT rules XSD 1a and XSD 1b).  We see both expectations, preserving the exact form and equating such plain literals and xsd:strings, even among ontology editing users.

And doubles and floats aren't even related by type derivation, but they are datatypes the FILTER system requires to be understood.  Their equality comes from XSD F&O [D] op:numeric-equal.  So whether basic graph pattern matching (generative, joining) and FILTER value testing (restrictive) exactly line up depends on the (D-)entailment provided.
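A rough Python illustration of the float/double point.  It assumes that an xsd:float can be modelled as an IEEE 754 single obtained by rounding through struct, and that comparing a float with a double promotes the float to double first; the helper names are made up for this sketch.

```python
import struct

def parse_float32(lexical):
    """Parse a lexical form as a 32-bit float, returned widened to a Python
    float (a double), which models promotion of xsd:float to xsd:double."""
    return struct.unpack("<f", struct.pack("<f", float(lexical)))[0]

def numeric_equal(a, b):
    """Value comparison of two numbers already promoted to double."""
    return a == b

f_one = parse_float32("1")    # "1"^^xsd:float
d_one = float("1.0e0")        # "1.0e0"^^xsd:double

f_tenth = parse_float32("0.1")  # "0.1"^^xsd:float
d_tenth = float("0.1")          # "0.1"^^xsd:double
```

Here numeric_equal(f_one, d_one) holds even though the two literals are different terms with unrelated datatypes, while numeric_equal(f_tenth, d_tenth) does not, because 0.1 rounds differently at the two precisions.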


The test suite is a slightly different case: it is providing tests for a specific set of choices.  The tests do label what the assumptions are.  Some tests are labelled as making more than just basic assumptions (e.g. language tags).


And the short answer: SPARQL does not spec out the whole lifecycle and it seems OK to me - which is good, because that's what we do for TDB.

        Andy

[A] http://www.w3.org/TR/rdf-sparql-query/#sparqlBGPExtend

[B] http://www.w3.org/TR/rdf-sparql-query/#func-RDFterm-equal

[C] http://www.w3.org/TR/rdf-mt/#D_entailment

[D] http://www.w3.org/TR/xpath-functions/#func-numeric-equal

[E] http://www.w3.org/TR/xmlschema-2/#built-in-datatypes


> -----Original Message-----
> From: James Leigh [mailto:james-nospam@leighnet.ca]
> Sent: 29 July 2008 16:50
> To: public-sparql-dev@w3.org
> Cc: Seaborne, Andy; Arjohn Kampman; Andrae Muys; Paul Gearon
> Subject: Re: Mulgara and sameTerm
>
> On Tue, 2008-07-29 at 10:44 -0500, Paul Gearon wrote:
> > Because I was being asked to make this "work" with the SPARQL test
> > suite, I presumed that duplication was required. I also presumed that
> > most applications inserting a non-canonical form of data would stick
> > to the same lexical form each time, which would minimize the issue for
> > that application.
> >
> > Of course, it is always possible to take the easy road and rely on
> > RDF-equals. So instead of using:
> >   ns:foo ns:bar ?x . ?x ns:baz ns:boo
> >
> > You'd instead use:
> >   ns:foo ns:bar ?x . ?y ns:baz ns:boo FILTER (?x = ?y)
> >
> > However, this is never going to perform as well, and can potentially
> > take up significantly more storage, so I'm not for it at all.
> >
> If this breaks SPARQL compatibility, would you be against full SPARQL
> compatibility in Mulgara?
>
> > I'm OK to move this thread onto the SPARQL list.
> >
> > Paul
> >
> > On Tue, Jul 29, 2008 at 10:28 AM, Seaborne, Andy <andy.seaborne@hp.com>
> wrote:
> > > Does anyone mind if this discussion happens on public-sparql-
> dev@w3.org?
> > >
> > >        Andy
> > >
> > >> -----Original Message-----
> > >> From: James Leigh [mailto:james@leighnet.ca]
> > >> Sent: 29 July 2008 13:21
> > >> To: Arjohn Kampman; Seaborne@domain.invalid; Seaborne, Andy
> > >> Cc: Paul Gearon; Andrae Muys
> > >> Subject: Re: Mulgara and sameTerm
> > >>
> > >> Hi all,
> > >>
> > >> Including Andy to get his interpretation (read on down the page for
> more
> > >> information).
> > >>
> > >> I spoke with Andrae (he is having email troubles). He thought this
> was a
> > >> very serious problem and wanted to take this up with Andy Seaborne.
> > >>
> > >> His concerns were:
> > >> The problem is that this would prevent us from ever storing nodes
> > >> inline; forcing a string-pool lookup on *every* resolution.
> > >> What should be the result of joining "1"^^xsd:int and "+1"^^xsd:int ?
> > >> Will this mean that they will have different localnodes?
> > >>
> > >> Paul what is your take on these concerns/questions?
> > >>
> > >> I think "1"^^xsd:int should be a different term than "+1"^^xsd:int
> and
> > >> have different localnodes.
> > >>
> > >> Maybe we could introduce new internal types, instead of just integer,
> we
> > >> could have integer and integer-with-plus-prefix and others to handle
> all
> > >> possible numeric formats?


There are more cases than this: anything that is a derived type is in the same value space as its primitive type.

All these are the same value by XSD (Schema Part 2: Datatypes)
(The XSD decimal derived types are the most extensive)

Variations on lexical form:

"1"^^xsd:integer
"01"^^xsd:integer
"+1"^^xsd:integer

Derived types:

"1"^^xsd:nonNegativeInteger
"1"^^xsd:positiveInteger
"1"^^xsd:unsignedLong

"1"^^xsd:long
"1"^^xsd:int
"1"^^xsd:short
"1"^^xsd:byte

There would be quite a lot of different internal types.
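As a quick check of the list above, a Python sketch that counts distinct terms against distinct values.  It uses prefixed names as plain strings and Decimal as a stand-in for the xsd:decimal value space; both are simplifications for illustration.

```python
from decimal import Decimal

# The literals listed above, as (lexical form, datatype) pairs.
literals = [
    ("1", "xsd:integer"),
    ("01", "xsd:integer"),
    ("+1", "xsd:integer"),
    ("1", "xsd:nonNegativeInteger"),
    ("1", "xsd:positiveInteger"),
    ("1", "xsd:unsignedLong"),
    ("1", "xsd:long"),
    ("1", "xsd:int"),
    ("1", "xsd:short"),
    ("1", "xsd:byte"),
]

# Distinct RDF terms: lexical form and datatype both count.
distinct_terms = set(literals)

# Distinct values: every lexical form maps into the decimal value space.
distinct_values = {Decimal(lexical) for lexical, _datatype in literals}
```

The ten entries are ten different terms but a single value, which is why one internal type per lexical variation does not scale.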


> > >>
> > >> James
> > >>
> > >> On Mon, 2008-07-28 at 13:09 -0400, James Leigh wrote:
> > >> > Hi Arjohn, Paul and Andrae,
> > >> >
> > >> > Mulgara 2.0 was released last week. It includes some of the bugs
> that
> > >> > were discovered through the Sesame SPARQL test-suite. However,
> there are
> > >> > a few core issues that will prevent us from releasing a stable
> SPARQL
> > >> > compliant RDF store using Mulgara.
> > >> >
> > >> > The biggest problem is that Mulgara stores only the literal _value_
> for
> > >> > known datatypes. That means that "+1"^^xsd:int is stored identical
> to
> > >> > "1"^^xsd:int. This has significant consequences with how we
> implement
> > >> > sameTerm as these literals originally have different labels, but
> are
> > >> > collapsed into the same label.
> > >> >
> > >> > RDF Concepts states that for two literals to be the same "The
> strings
> > >> > of the two lexical forms compare equal, character by character."
> (see
> > >> > below for more context). Mulgara will have to begin storing the
> original
> > >> > label with all literals (at least for unreproducible labels) before
> we
> > >> > can release a stable SPARQL compliant RDF store.
> > >> >
> > >> >  ** Paul/Andrae, can this change be put into the Mulgara road-map?
> **
> > >> >
> > >> > Thanks,
> > >> > James
> > >> >
> > >> > ---%<---
> > >> > The SPARQL sameTerm states that[1]:
> > >> >         Returns TRUE if term1 and term2 are the same RDF term as
> defined
> > >> >         in Resource Description Framework (RDF): Concepts and
> Abstract
> > >> >         Syntax [CONCEPTS]; returns FALSE otherwise.
> > >> >
> > >> > Here is an excerpt from RDF Concepts[2]:
> > >> >         6.5.1 Literal Equality
> > >> >         Two literals are equal if and only if all of the following
> hold:
> > >> >
> > >> >               * The strings of the two lexical forms compare equal,
> > >> >                 character by character.
> > >> >               * Either both or neither have language tags.
> > >> >               * The language tags, if any, compare equal.
> > >> >               * Either both or neither have datatype URIs.
> > >> >               * The two datatype URIs, if any, compare equal,
> character
> > >> >                 by character.
> > >> >
> > >> > [1] http://www.w3.org/TR/rdf-sparql-query/#func-sameTerm

> > >> > [2] http://www.w3.org/TR/rdf-concepts/#section-Graph-Literal

> > >> >
> > >
> > >
Received on Tuesday, 29 July 2008 16:35:18 UTC