Re: D-entailment and canonicalization

From: Andy Seaborne <andy.seaborne@talis.com>
Date: Fri, 05 Mar 2010 17:05:05 +0000
Message-ID: <4B9139C1.5080605@talis.com>
To: Sandro Hawke <sandro@w3.org>
CC: "Polleres, Axel" <axel.polleres@deri.org>, ivan@w3.org, public-rdf-dawg@w3.org


On 05/03/2010 2:23 PM, Sandro Hawke wrote:
>
>> The SPARQL query really starts where the data is already loaded (FROM
>> etc. notwithstanding), so the data as it is loaded may be prepared in
>> some fashion outside the SPARQL spec.
>
> But that's no longer true when we have update, is it?

That's why I said "SPARQL query" - the Q was about the current spec.

> My (woefully under-researched, I'm sorry) sense of SPARQL has always
> been that it forces systems to keep a lot of datatype information that
> they might not really want to keep.

?? Our systems don't keep anything more than a single datatype per RDF
term.  The Jena memory model also keeps a value, so it has both the input
lexical form and the value; it's a good thing literals can only be in the
object position.

> For example, I thought SPARQL made me keep xs:strings distinct from RDF
> plain literals (without language tags), even though the value spaces are
> the same.  This is a huge implementation burden, which I'm trying to
> sort through in RIF right now.

No.  That's a pseudo D-entailment.

It gets a bit funny around sameTerm, but we work around that; the input
munging and update change that, as you note.

From experience, though, there are different user expectations.  Some
editor tools, usually ontology ones, take a string input and write
xsd:string, and then we get asked why the RDF/XML has xsd:string in it.
When writing RDF/XML by hand, people expect the simple literal /
xsd:string distinction to be maintained.

But numbers are OK: I have not had any questions about value-based
canonicalization in TDB, but we have had a lot of questions about
xsd:string vs simple literals (a SPARQL term Eric came up with because
we needed it: plain literals without language tags).

> I would love to hear that SPARQL does NOT mind if I just store strings
> internally, and somehow hide from users whether they came in as
> xs:strings or as plain literals.

It's legal by BGP extension if nothing else.

See rules xsd 1a and xsd 1b in the RDF MT, although here we're not
inferring one term from the other (and hence there are two terms) but
working in value space, not term space.
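
To make the value-space vs term-space distinction concrete, here is a
minimal Python sketch. The `Literal` class and the helper names are
hypothetical (not Jena or TDB API), and language tags are ignored for
brevity: a simple literal and an xsd:string literal with the same lexical
form are two distinct terms, yet denote the same string value.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    """Hypothetical RDF literal: lexical form plus optional datatype IRI."""
    lexical: str
    datatype: Optional[str] = None  # None means a simple literal (no lang tag)


def same_term(a: Literal, b: Literal) -> bool:
    # Term space: lexical form and datatype must both be identical.
    return a == b


def string_value(lit: Literal) -> Optional[str]:
    # Value space: simple literals and xsd:string literals both denote a string.
    if lit.datatype is None or lit.datatype == XSD + "string":
        return lit.lexical
    return None


def same_string_value(a: Literal, b: Literal) -> bool:
    va, vb = string_value(a), string_value(b)
    return va is not None and va == vb


plain = Literal("abc")
typed = Literal("abc", XSD + "string")
print(same_term(plain, typed))          # False: two different terms
print(same_string_value(plain, typed))  # True: one string value
```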

> I expect the same applies even more
> pointedly to "1"^^xs:int vs "1"^^xs:integer.  Clearly the same value, but
> a different graph node.
> IMHO SPARQL should make it clear that when you
> put one in, you might get the other out.

I very strongly support that position.
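
A minimal sketch of what "put one in, you might get the other out" can
look like under value-based matching, in Python. Terms are modelled as
hypothetical (lexical form, datatype IRI) pairs, the integer-family set is
an assumed subset, and per-type range checks are skipped: a pattern
written with xsd:integer matches a stored xsd:int term, and the stored
term is what comes back.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed subset of the xsd:integer family (no per-type range checks here).
INTEGER_FAMILY = {XSD + t for t in ("integer", "long", "int", "short", "byte")}


def value_key(term):
    """Matching key: the integer value for integer-family literals,
    otherwise the term itself (plain term-space matching)."""
    lexical, datatype = term
    if datatype in INTEGER_FAMILY:
        return ("integer-value", int(lexical.strip()))
    return ("term", term)


def match(stored_terms, pattern_term):
    # Value-based matching: any stored term with the same value matches,
    # and the stored term (not the pattern's term) is what gets returned.
    key = value_key(pattern_term)
    return [t for t in stored_terms if value_key(t) == key]


stored = [("1", XSD + "int")]
print(match(stored, ("1", XSD + "integer")))
# [('1', 'http://www.w3.org/2001/XMLSchema#int')]
```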

	Andy

>
>       -- Sandro
>
>> When we discussed this last time, we recognized that systems already did
>> work on loading RDF and did not introduce any text to obstruct them.
>>
>> As to whether it's an "entailment regime": if it is, then it's finite and
>> different for each system.  It is done when data is loaded, not when it is
>> queried (think running rules over the data).
>>
>>
>> For example, TDB canonicalizes integers between -2^55 and +2^55-1 but
>> not outside that range (they have their original lexical form stored).
>> Decimals have 48 bits of precision and 8 bits of scale; again, if
>> outside that range, the normal node storage is used and the lexical
>> form is not canonicalised.
>>
>> Derived integer types are promoted to integer.
>>
>> (This in TDB is all "currently" and planned to change a little).
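
The load-time step described above might look roughly like this for
integers. This is a hypothetical Python sketch, not TDB's code: the
function name and return shape are made up, only a few derived types are
listed, and the decimal encoding (48-bit precision, 8-bit scale) is left
out because its exact layout is not spelled out here.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed subset of xsd:integer and its derived types, all promoted to xsd:integer.
INTEGER_TYPES = {XSD + t for t in ("integer", "long", "int", "short", "byte")}

INLINE_MIN = -(2 ** 55)     # lower bound of the inline range
INLINE_MAX = 2 ** 55 - 1    # upper bound of the inline range


def encode_on_load(lexical: str, datatype: str):
    """Hypothetical load-time encoding of one object literal.

    Returns ("inline", canonical_term) if the value fits the inline range,
    otherwise ("node", original_term) for ordinary node storage.
    """
    if datatype in INTEGER_TYPES:
        # Ill-typed lexical forms raise ValueError here; a real loader
        # would have to decide what to do with them.
        value = int(lexical.strip())
        if INLINE_MIN <= value <= INLINE_MAX:
            # Canonical lexical form, datatype promoted to xsd:integer.
            return ("inline", (str(value), XSD + "integer"))
    # Out of range or not an integer type: keep the original lexical form.
    return ("node", (lexical, datatype))


print(encode_on_load("0042", XSD + "byte"))
# ('inline', ('42', 'http://www.w3.org/2001/XMLSchema#integer'))
print(encode_on_load(str(2 ** 60), XSD + "integer")[0])  # node
```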
>>
>> 	Andy
>>
>> On 05/03/2010 9:29 AM, Polleres, Axel wrote:
>>> Thanks Andy, my (maybe naïve) question would then be: is behavior 2
>>> warranted "as is" by the current spec, or is "canonical datatype
>>> representation" actually another (already commonly used) "entailment
>>> regime" that should be defined as such?
>>>
>>> Best,
>>> Axel
>>>
>>> ----- Original Message -----
>>> From: Andy Seaborne <afs@talisplatform.com>
>>> To: Polleres, Axel
>>> Cc: ivan@w3.org <ivan@w3.org>; public-rdf-dawg@w3.org <public-rdf-dawg@w3.org>
>>> Sent: Fri Mar 05 09:06:09 2010
>>> Subject: D-entailment and canonicalization
>>>
>>>
>>>
>>> On 05/03/2010 8:45 AM, Polleres, Axel wrote:
>>>> In my opinion this is a question concerning all entailments from
>>>> D-entailment "upwards".
>>>>
>>>> ----- Original Message -----
>>>> From: Ivan Herman <ivan@w3.org>
>>>> To: Polleres, Axel
>>>> Cc: Birte Glimm <birte.glimm@comlab.ox.ac.uk>; SPARQL Working Group <public-rdf-dawg@w3.org>
>>>> Sent: Fri Mar 05 08:08:10 2010
>>>> Subject: Re: [TF-ENT] Condition C2 modifications
>>>>
>>>>
>>>>
>>>> On 2010-3-5 24:36 , Axel Polleres wrote:
>>>>>
>>>>> No objections, but one additional side question:
>>>>>
>>>>> Do we have an issue with systems that use canonical forms of datatype
>>>>> literals internally?
>>>>>
>>>>> Say you have:
>>>>>
>>>>>     :s :p "1.000"^^xsd:decimal
>>>>>
>>>>> is a Datatype-aware system really supposed to return
>>>>>
>>>>>     "1.000"^^xsd:decimal
>>>>>
>>>>> on { :s :p ?O}
>>>>>
>>>>> but not its internal representation?
>>>>>
>>>>>
>>>>
>>>> This is a good question; I do not know the answer :-(, but is this an
>>>> entailment-specific question? I would expect it to be a question for
>>>> SPARQL as a whole...
>>>>
>>>> Cheers
>>>>
>>>> Ivan
>>>
>>> There are 2 cases for value-aware systems, and there are examples of
>>> systems in each case:
>>>
>>> 1/ Data "1.00"^^xsd:decimal,
>>>       stores "1.00"^^xsd:decimal,
>>>       matches "1.0"^^xsd:decimal,
>>>       matches "1.00"^^xsd:decimal,
>>>       returns "1.00"^^xsd:decimal
>>>
>>> i.e. the original term is stored and returned
>>>
>>> 2/ Data "1.00"^^xsd:decimal,
>>>       stores "1.0"^^xsd:decimal,
>>>       matches "1.0"^^xsd:decimal,
>>>       matches "1.00"^^xsd:decimal (canonicalization applied),
>>>       returns "1.0"^^xsd:decimal
>>>
>>> i.e. the canonicalized term is stored and returned
>>>
>>>
>>> See also "1"^^xsd:byte and "1"^^xsd:integer
>>>
>>> I avoided describing them as D-entailment because that really is a set
>>> of possibilities depending on the datatypes supported and ranges of
>>> values within the datatypes.  They don't necessarily force D-consistency.
>>>
>>> 	Andy
>>>
>>> Examples:
>>> 1 - Jena memory model
>>> 2 - Jena TDB
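
The two cases can be contrasted with a small Python sketch. Everything
here is hypothetical: `DecimalObjectStore` models only the object position
of `:s :p ?o` for xsd:decimal, and `canonical_decimal` follows the
convention used in the examples above ("1.00" becomes "1.0"), not
necessarily any system's exact canonical form. Both cases match by value;
they differ only in which lexical form is stored and returned.

```python
from decimal import Decimal


def canonical_decimal(lexical: str) -> str:
    """Canonical form as in the examples above: "1.00" -> "1.0", "100" -> "100.0"."""
    d = Decimal(lexical.strip())
    sign = "-" if d < 0 else ""
    text = format(abs(d), "f")  # plain notation, no exponent
    if "." in text:
        text = text.rstrip("0")
        if text.endswith("."):
            text += "0"  # keep one digit after the point
    else:
        text += ".0"
    return sign + text


class DecimalObjectStore:
    """Hypothetical store for the object position of :s :p ?o (xsd:decimal only)."""

    def __init__(self, canonicalize: bool):
        self.canonicalize = canonicalize  # False = case 1, True = case 2
        self.objects = []

    def add(self, lexical: str) -> None:
        # Case 1 keeps the original lexical form; case 2 stores the canonical one.
        self.objects.append(canonical_decimal(lexical) if self.canonicalize else lexical)

    def match(self, lexical: str) -> list:
        # Both cases are value-aware: "1.0" and "1.00" denote the same value.
        target = Decimal(lexical)
        return [o for o in self.objects if Decimal(o) == target]


case1 = DecimalObjectStore(canonicalize=False)
case2 = DecimalObjectStore(canonicalize=True)
for store in (case1, case2):
    store.add("1.00")
print(case1.match("1.0"), case1.match("1.00"))  # ['1.00'] ['1.00']
print(case2.match("1.0"), case2.match("1.00"))  # ['1.0'] ['1.0']
```

Case 1 corresponds to the Jena memory model example and case 2 to Jena
TDB; a query client only sees the difference in the returned lexical form.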
>>>
>>
>
>
Received on Friday, 5 March 2010 17:05:30 GMT
