Re: D-enatilment and canonicalization from Birte Glimm on 2010-03-05 (public-rdf-dawg@w3.org from January to March 2010)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Fri, 5 Mar 2010 11:27:55 +0000
To: Andy Seaborne <andy.seaborne@talis.com>
Cc: "Polleres, Axel" <axel.polleres@deri.org>, ivan@w3.org, public-rdf-dawg@w3.org
Message-ID: <492f2b0b1003050327n5705d042s77c83f7b260510ae@mail.gmail.com>
Good question indeed. My feeling is that it is not an entailment
regime, but rather another source of infinite answers from
datatype-aware systems.
For OWL Direct Semantics, this is covered since there we only return
asserted data values modulo sub-property entailment. This assumes that
the original lexical form is returned. Internally we canonicalise
everything (otherwise you cannot reason correctly with facets etc.),
but we keep the original lexical form anyway so as not to confuse
users by silently changing their data values, even if the change is
to something equivalent.
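
A minimal sketch of the "keep the original lexical form, compare by value" behaviour described above. The class name `DecimalLiteral` and its methods are hypothetical and are not taken from any particular reasoner or store; Python's `Decimal` stands in for the xsd:decimal value space.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class DecimalLiteral:
    """Hypothetical xsd:decimal literal that remembers what the user wrote."""

    lexical: str  # original lexical form, exactly as it appeared in the data

    @property
    def value(self) -> Decimal:
        # Internal value used for matching and reasoning.
        return Decimal(self.lexical)

    def same_value(self, other: "DecimalLiteral") -> bool:
        # Two literals match when they denote the same decimal value,
        # regardless of how each one was written.
        return self.value == other.value


stored = DecimalLiteral("1.00")   # as asserted in the data
pattern = DecimalLiteral("1.0")   # as written in a query

matches = stored.same_value(pattern)   # value-based matching
returned = stored.lexical              # the answer keeps the original form
```

Under these assumptions the literal matches a query written with "1.0", while the binding handed back to the user is still "1.00".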
For D-Entailment/OWL RDF-Based Semantics, I am not quite sure what the
best solution would be. At the moment, I restrict bindings to values
that occur in the skolemised scoping graph. This guarantees
finiteness. What is not clear to me is whether that restricts systems
so that they have to return the original lexical form or whether the
scoping graph is whatever systems build from the input when they parse
it. My feeling is that systems can do what they prefer, since in any
case the result is graph-equivalent to the active graph. Even for
the active graph, I am not sure whether anything defines what it
actually contains after parsing a document with such datatype
triples. E.g., if the input document had the triple
ex:a ex:dp "1.00"^^xsd:decimal .
then after loading, the active graph could contain
ex:a ex:dp "1.0"^^xsd:decimal .
I guess. Is that right?
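
As an illustration of the load-time rewriting in this example, here is a hedged sketch of canonicalizing an xsd:decimal lexical form. It assumes the XML Schema Datatypes 1.0 canonical form (a decimal point is always present, with no superfluous leading or trailing zeros), which is what yields "1.0" above. The function name `canonical_decimal` is hypothetical.

```python
import re

# Optional sign, optional integer digits, optional fraction digits.
_DECIMAL = re.compile(r"([+-]?)(\d*)(?:\.(\d*))?")


def canonical_decimal(lexical: str) -> str:
    """Return an XSD 1.0 style canonical form for an xsd:decimal lexical form."""
    m = _DECIMAL.fullmatch(lexical.strip())
    # Reject inputs with no digits at all, such as "" or ".".
    if m is None or not (m.group(2) or m.group(3)):
        raise ValueError(f"not an xsd:decimal lexical form: {lexical!r}")
    sign, int_part, frac_part = m.group(1), m.group(2), m.group(3) or ""
    int_part = int_part.lstrip("0") or "0"    # drop leading zeros, keep one digit
    frac_part = frac_part.rstrip("0") or "0"  # drop trailing zeros, keep one digit
    if int_part == "0" and frac_part == "0":
        sign = ""                             # zero carries no sign
    return ("-" if sign == "-" else "") + int_part + "." + frac_part


loaded = canonical_decimal("1.00")
```

A store that applies such a function while parsing would hold "1.0"^^xsd:decimal for the triple above, and "1.000" would be rewritten to the same form.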

The question is: do we want to enforce something more specific?

Birte


On 5 March 2010 10:07, Andy Seaborne <andy.seaborne@talis.com> wrote:
> The SPARQL query really starts where the data is already loaded (FROM etc.
> notwithstanding) so the data as it is loaded may be prepared in some
> fashion outside the SPARQL spec.
>
> When we discussed this last time, we recognized that systems already did
> work on loading RDF and did not introduce any text to obstruct them.
>
> As to whether it's an "entailment regime", if it is then it's finite and
> different for each system.  It is done when data is loaded not queried
> (think running rules over the data).
>
>
> For example, TDB canonicalizes integers between -2^55 and +2^55-1 but not
> outside that range (they have their original lexical form stored). Decimals
> have 48 bits of precision and 8 bits of scale and again if outside that
> range, the normal node storage is used and the lexical form is not
> canonicalised.
>
> Derived integer types are promoted to integer.
>
> (This in TDB is all "currently" and planned to change a little).
>
>        Andy
>
> On 05/03/2010 9:29 AM, Polleres, Axel wrote:
>>
>> Thanks andy, my (maybe naïve) question would then be: is behavior 2
>> warranted "as is" by the current spec, or is "canonical datatype
>> representation" actually another (commonly used already) "entailment regime"
>> that should be defined as such?
>>
>> Best,
>> Axel
>>
>> ----- Original Message -----
>> From: Andy Seaborne<afs@talisplatform.com>
>> To: Polleres, Axel
>> Cc: ivan@w3.org<ivan@w3.org>;
>> public-rdf-dawg@w3.org<public-rdf-dawg@w3.org>
>> Sent: Fri Mar 05 09:06:09 2010
>> Subject: D-enatilment and canonicalization
>>
>>
>>
>> On 05/03/2010 8:45 AM, Polleres, Axel wrote:
>>>
>>> In my opinion this is a question concerning all entailments from
>>> D-entailment "upwards".
>>>
>>> ----- Original Message -----
>>> From: Ivan Herman<ivan@w3.org>
>>> To: Polleres, Axel
>>> Cc: Birte Glimm<birte.glimm@comlab.ox.ac.uk>; SPARQL Working
>>> Group<public-rdf-dawg@w3.org>
>>> Sent: Fri Mar 05 08:08:10 2010
>>> Subject: Re: [TF-ENT] Condition C2 modifications
>>>
>>>
>>>
>>> On 2010-3-5 24:36 , Axel Polleres wrote:
>>>>
>>>> No objections, but one additional side question:
>>>>
>>>> Do we have an issue with systems that use canonical forms of datatype
>>>> literals internally?
>>>>
>>>> Say you have:
>>>>
>>>>   :s :p "1.000"^^xsd:decimal
>>>>
>>>> is a Datatype-aware system really supposed to return
>>>>
>>>>   "1.000"^^xsd:decimal
>>>>
>>>> on { :s :p ?O}
>>>>
>>>> but not its internal representation?
>>>>
>>>>
>>>
>>> This is a good question, I do not know the answer:-(, but is this an
>>> entailment specific question? I would expect that to be a question for
>>> SPARQL as a whole...
>>>
>>> Cheers
>>>
>>> Ivan
>>
>> There are 2 cases for value aware systems and there are examples of
>> systems in each case:
>>
>> 1/ Data "1.00"^^xsd:decimal,
>>     stores "1.00"^^xsd:decimal,
>>     matches "1.0"^^xsd:decimal,
>>     matches "1.00"^^xsd:decimal,
>>     returns "1.00"^^xsd:decimal
>>
>> i.e. the original term is stored and returned
>>
>> 2/ Data "1.00"^^xsd:decimal,
>>     stores "1.0"^^xsd:decimal,
>>     matches "1.0"^^xsd:decimal
>>     matches "1.00"^^xsd:decimal (canonicalization applied)
>>     returns "1.0"^^xsd:decimal
>>
>> i.e. the canonicalized term is stored and returned
>>
>>
>> See also "1"^^xsd:byte and "1"^^xsd:integer
>>
>> I avoided describing them as D-entailment because that really is a set
>> of possibilities depending on the datatypes supported and ranges of
>> values within the datatypes.  They don't necessarily force D-consistency.
>>
>>        Andy
>>
>> Examples:
>> 1 - Jena memory model
>> 2 - Jena TDB
>>
>
>



-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529
Received on Friday, 5 March 2010 11:28:28 UTC