
Re: D-entailment and canonicalization

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 05 Mar 2010 09:23:53 -0500
To: Andy Seaborne <andy.seaborne@talis.com>
cc: "Polleres\, Axel" <axel.polleres@deri.org>, ivan@w3.org, public-rdf-dawg@w3.org
Message-ID: <10504.1267799033@waldron>

> The SPARQL query really starts where the data is already loaded (FROM 
> etc. notwithstanding), so the data as it is loaded may be prepared in 
> some fashion outside the SPARQL spec.

But that's no longer true when we have update, is it?

My (woefully under-researched, I'm sorry) sense of SPARQL has always
been that it forces systems to keep a lot of datatype information that
they might not really want to keep.

For example, I thought SPARQL made me keep xs:strings distinct from RDF
plain literals (without language tags), even though the value spaces are
the same.  This is a huge implementation burden, which I'm trying to
sort through in RIF right now.

I would love to hear that SPARQL does NOT mind if I just store strings
internally, and somehow hide from users whether they came in as
xs:strings or as plain literals.  I expect the same applies even more
pointedly to "1"^^xs:int vs "1"^^xs:integer.  Clearly the same value, but
a different graph node.  IMHO SPARQL should make it clear that when you
put one in, you might get the other out.
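
To make the "same value, different graph node" distinction concrete, here
is a minimal Python sketch. It is not taken from any SPARQL implementation:
Literal, INTEGER_TYPES and value_of are hypothetical names, only a handful
of integer datatypes are listed, and range checks for the derived types are
omitted.

```python
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"


class Literal(NamedTuple):
    """An RDF typed literal as a graph term: lexical form plus datatype IRI."""
    lexical: str
    datatype: str


# Hypothetical, incomplete set of datatypes whose values are integers.
INTEGER_TYPES = {XSD + name for name in ("integer", "long", "int", "short", "byte")}


def value_of(lit: Literal) -> int:
    """Map a literal to its value (range checks for derived types omitted)."""
    if lit.datatype in INTEGER_TYPES:
        return int(lit.lexical)
    raise ValueError(f"unsupported datatype: {lit.datatype}")


a = Literal("1", XSD + "int")
b = Literal("1", XSD + "integer")

same_term = a == b                       # term identity: the datatypes differ
same_value = value_of(a) == value_of(b)  # value equality: both denote 1
```

A store that keeps only the value can answer same_value but can no longer
tell which of the two terms was originally loaded.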

     -- Sandro

> When we discussed this last time, we recognized that systems already did 
> work on loading RDF and did not introduce any text to obstruct them.
> 
> As to whether it's an "entailment regime", if it is then it's finite and 
> different for each system.  It is done when data is loaded not queried 
> (think running rules over the data).
> 
> 
> For example, TDB canonicalizes integers between -2^55 and +2^55-1 but 
> not outside that range (they have their original lexical form stored). 
> Decimals have 48 bits of precision and 8 bits of scale and again if 
> outside that range, the normal node storage is used and the lexical 
> form is not canonicalised.
> 
> Derived integer types are promoted to integer.
> 
> (All of this is TDB's "current" behaviour and is planned to change a little).
> 
> 	Andy
> 
> On 05/03/2010 9:29 AM, Polleres, Axel wrote:
> > Thanks Andy, my (maybe naïve) question would then be: is behavior 2
> > warranted "as is" by the current spec, or is "canonical datatype
> > representation" actually another (commonly used already) "entailment
> > regime" that should be defined as such?
> >
> > Best,
> > Axel
> >
> > ----- Original Message -----
> > From: Andy Seaborne<afs@talisplatform.com>
> > To: Polleres, Axel
> > Cc: ivan@w3.org<ivan@w3.org>; public-rdf-dawg@w3.org<public-rdf-dawg@w3.org>
> > Sent: Fri Mar 05 09:06:09 2010
> > Subject: D-entailment and canonicalization
> >
> >
> >
> > On 05/03/2010 8:45 AM, Polleres, Axel wrote:
> >> In my opinion this is a question concerning all entailments from
> >> D-entailment "upwards".
> >>
> >> ----- Original Message -----
> >> From: Ivan Herman<ivan@w3.org>
> >> To: Polleres, Axel
> >> Cc: Birte Glimm<birte.glimm@comlab.ox.ac.uk>; SPARQL Working Group<public-rdf-dawg@w3.org>
> >> Sent: Fri Mar 05 08:08:10 2010
> >> Subject: Re: [TF-ENT] Condition C2 modifications
> >>
> >>
> >>
> >> On 2010-3-5 24:36 , Axel Polleres wrote:
> >>>
> >>> No objections, but one additional side question:
> >>>
> >>> Do we have an issue with systems that use canonical forms of datatype
> >>> literals internally?
> >>>
> >>> Say you have:
> >>>
> >>>    :s :p "1.000"^^xsd:decimal
> >>>
> >>> is a Datatype-aware system really supposed to return
> >>>
> >>>    "1.000"^^xsd:decimal
> >>>
> >>> on { :s :p ?O}
> >>>
> >>> but not its internal representation?
> >>>
> >>>
> >>
> >> This is a good question, I do not know the answer:-(, but is this an
> >> entailment specific question? I would expect that to be a question for
> >> SPARQL as a whole...
> >>
> >> Cheers
> >>
> >> Ivan
> >
> > There are 2 cases for value aware systems and there are examples of
> > systems in each case:
> >
> > 1/ Data "1.00"^^xsd:decimal,
> >      stores "1.00"^^xsd:decimal,
> >      matches "1.0"^^xsd:decimal,
> >      matches "1.00"^^xsd:decimal,
> >      returns "1.00"^^xsd:decimal
> >
> > i.e. the original term is stored and returned
> >
> > 2/ Data "1.00"^^xsd:decimal,
> >      stores "1.0"^^xsd:decimal,
> >      matches "1.0"^^xsd:decimal,
> >      matches "1.00"^^xsd:decimal (canonicalization applied),
> >      returns "1.0"^^xsd:decimal
> >
> > i.e. the canonicalized term is stored and returned
> >
> >
> > See also "1"^^xsd:byte and "1"^^xsd:integer
> >
> > I avoided describing them as D-entailment because that really is a set
> > of possibilities depending on the datatypes supported and ranges of
> > values within the datatypes.  They don't necessarily force D-consistency.
> >
> > 	Andy
> >
> > Examples:
> > 1 - Jena memory model
> > 2 - Jena TDB
> >
> 
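
The two behaviours Andy describes in the quoted message (store and return
the original term, or store and return the canonicalized term), together
with the range-limited integer canonicalization he attributes to TDB, can
be sketched roughly as follows. This is an illustrative Python sketch only,
not Jena or TDB code: every class and function name is hypothetical, inputs
are assumed to be valid lexical forms, and the only figure taken from the
message is the integer range -2^55 to 2^55-1.

```python
from decimal import Decimal


def canonical_decimal(lexical: str) -> str:
    """Return a canonical xsd:decimal form, e.g. "1.0" for "1.00"."""
    value = Decimal(lexical)
    if value == 0:
        return "0.0"  # also folds "-0.0" into "0.0"
    whole, _, fraction = format(value, "f").partition(".")
    fraction = fraction.rstrip("0") or "0"
    return f"{whole}.{fraction}"


class TermPreservingStore:
    """Case 1: keep the original lexical form, compare by value."""

    def __init__(self):
        self._objects = []

    def add(self, lexical: str) -> None:
        self._objects.append(lexical)

    def match(self, lexical: str) -> list:
        wanted = Decimal(lexical)
        return [o for o in self._objects if Decimal(o) == wanted]


class CanonicalizingStore:
    """Case 2: canonicalize on load, canonicalize the pattern, compare terms."""

    def __init__(self):
        self._objects = []

    def add(self, lexical: str) -> None:
        self._objects.append(canonical_decimal(lexical))

    def match(self, lexical: str) -> list:
        wanted = canonical_decimal(lexical)
        return [o for o in self._objects if o == wanted]


# Range quoted in the message for TDB integers.
INLINE_MIN, INLINE_MAX = -(2 ** 55), 2 ** 55 - 1


def stored_integer_form(lexical: str) -> str:
    """Canonicalize in-range integers; keep the original form otherwise."""
    value = int(lexical)
    if INLINE_MIN <= value <= INLINE_MAX:
        return str(value)
    return lexical


case1 = TermPreservingStore()
case1.add("1.00")
case2 = CanonicalizingStore()
case2.add("1.00")
```

Here case1.match("1.0") gives back the loaded term "1.00", whereas
case2.match("1.00") gives back "1.0". Both stores match either spelling;
they differ only in which term comes out.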
Received on Friday, 5 March 2010 14:23:56 GMT
