
Re: D-entailment and canonicalization

From: Ivan Herman <ivan@w3.org>
Date: Fri, 05 Mar 2010 14:57:07 +0100
Message-ID: <4B910DB3.9060601@w3.org>
To: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
CC: Andy Seaborne <andy.seaborne@talis.com>, "Polleres, Axel" <axel.polleres@deri.org>, public-rdf-dawg@w3.org


On 2010-3-5 12:27 , Birte Glimm wrote:
> Good question indeed. My feeling is that it is not an entailment
> regime, but rather another source of infinite answers from
> datatype-aware systems.
> For OWL Direct Semantics, this is covered since there we only return
> asserted data values modulo sub-property entailment. This assumes that
> the original lexical form is returned. Internally we canonicalise
> everything (otherwise you cannot do reasoning with facets etc
> correctly), but we keep the original lexical form anyway to not
> confuse users by silently changing their data values even if it is to
> something equivalent.
> For D-Entailment/OWL RDF-Based Semantics, I am not quite sure what the
> best solution would be. At the moment, I restrict bindings to values
> that occur in the skolemised scoping graph. This guarantees
> finiteness. What is not clear to me is whether that restricts systems
> so that they have to return the original lexical form or whether the
> scoping graph is whatever systems build from the input when they parse
> it. My feeling is that systems can do what they prefer since in any
> case the result is graph equivalent to the active graph and even for
> the active graph I am not sure whether anything defines what the
> active graph actually contains after parsing a document with such
> datatype triples. E.g., if the input document had the triple
> ex:a ex:dp "1.00"^^xsd:decimal .
> then after loading, the active graph could contain
> ex:a ex:dp "1.0"^^xsd:decimal .
> I guess. Is that right?

My understanding of Andy's answer, and also some experience I had with,
e.g., RDFLib's internals, is that datatyped literals may indeed be converted
(e.g., when parsing) to some canonical form that is not verbatim identical
to the input string. So I think 'scoping graph is whatever systems build
from the input when they parse it' may be the only pragmatic approach
implementation-wise...
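
To make the difference concrete, here is a minimal Python sketch of the two
behaviours Andy describes further down the thread (original term kept vs.
canonical term kept). It is only an illustration under stated assumptions:
`canonical_decimal` and `TermStore` are hypothetical names, not RDFLib or Jena
APIs, and the canonical form assumed is the XSD 1.0 style used in this thread
("1.0", with a mandatory decimal point).

```python
import re

# Lexical space of xsd:decimal: optional sign, digits, optional fraction.
_DECIMAL = re.compile(r"([+-]?)([0-9]*)(?:\.([0-9]*))?")


def canonical_decimal(lexical: str) -> str:
    """Return an XSD 1.0 style canonical lexical form, e.g. "1.00" -> "1.0".

    Hypothetical helper for illustration only.
    """
    m = _DECIMAL.fullmatch(lexical.strip())
    if m is None or not (m.group(2) or m.group(3)):
        raise ValueError(f"not an xsd:decimal lexical form: {lexical!r}")
    sign, whole, frac = m.group(1), m.group(2), m.group(3) or ""
    whole = whole.lstrip("0") or "0"   # no leading zeros, but keep one digit
    frac = frac.rstrip("0") or "0"     # no trailing zeros, but keep one digit
    negative = sign == "-" and (whole, frac) != ("0", "0")  # "-0.0" is zero
    return ("-" if negative else "") + whole + "." + frac


class TermStore:
    """Toy store of xsd:decimal literals (hypothetical, not a real API)."""

    def __init__(self, canonicalize_on_load: bool) -> None:
        self.canonicalize_on_load = canonicalize_on_load
        self._terms: list[str] = []

    def load(self, lexical: str) -> None:
        # Case 2 rewrites the term at load time; case 1 keeps it verbatim.
        stored = canonical_decimal(lexical) if self.canonicalize_on_load else lexical
        self._terms.append(stored)

    def match(self, lexical: str) -> list[str]:
        # Value-aware matching: compare canonical forms, return stored terms.
        wanted = canonical_decimal(lexical)
        return [t for t in self._terms if canonical_decimal(t) == wanted]


preserving = TermStore(canonicalize_on_load=False)   # case 1
preserving.load("1.00")
canonicalizing = TermStore(canonicalize_on_load=True)  # case 2
canonicalizing.load("1.00")

print(preserving.match("1.0"))       # the original term comes back
print(canonicalizing.match("1.00"))  # the canonical term comes back
```

Both stores match "1.0" and "1.00" alike; they differ only in which lexical
form ends up in the scoping graph and hence in the results.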

I.

> 
> The question is do we want to enforce something more specific?
> 
> Birte
> 
> 
> On 5 March 2010 10:07, Andy Seaborne <andy.seaborne@talis.com> wrote:
>> The SPARQL query really starts where the data is already loaded (FROM etc.
>> notwithstanding), so the data as it is loaded may be prepared in some
>> fashion outside the SPARQL spec.
>>
>> When we discussed this last time, we recognized that systems already did
>> work on loading RDF and did not introduce any text to obstruct them.
>>
>> As to whether it's an "entailment regime", if it is then it's finite and
>> different for each system.  It is done when data is loaded not queried
>> (think running rules over the data).
>>
>>
>> For example, TDB canonicalizes integers between -2^55 and +2^55-1 but not
>> outside that range (they have their original lexical form stored). Decimals
>> have 48 bits of precision and 8 bits of scale and again, if outside that
>> range, the normal node storage is used and the lexical form is not
>> canonicalised.
>>
>> Derived integer types are promoted to integer.
>>
>> (This in TDB is all "currently" and planned to change a little).
>>
>>        Andy
>>
>> On 05/03/2010 9:29 AM, Polleres, Axel wrote:
>>>
>>> Thanks Andy, my (maybe naïve) question would then be: is behavior 2
>>> warranted "as is" by the current spec, or is "canonical datatype
>>> representation" actually another (commonly used already) "entailment regime"
>>> that should be defined as such?
>>>
>>> Best,
>>> Axel
>>>
>>> ----- Original Message -----
>>> From: Andy Seaborne<afs@talisplatform.com>
>>> To: Polleres, Axel
>>> Cc: ivan@w3.org<ivan@w3.org>;
>>> public-rdf-dawg@w3.org<public-rdf-dawg@w3.org>
>>> Sent: Fri Mar 05 09:06:09 2010
>>> Subject: D-entailment and canonicalization
>>>
>>>
>>>
>>> On 05/03/2010 8:45 AM, Polleres, Axel wrote:
>>>>
>>>> In my opinion this is a question concerning all entailments from
>>>> D-entailment "upwards".
>>>>
>>>> ----- Original Message -----
>>>> From: Ivan Herman<ivan@w3.org>
>>>> To: Polleres, Axel
>>>> Cc: Birte Glimm<birte.glimm@comlab.ox.ac.uk>; SPARQL Working
>>>> Group<public-rdf-dawg@w3.org>
>>>> Sent: Fri Mar 05 08:08:10 2010
>>>> Subject: Re: [TF-ENT] Condition C2 modifications
>>>>
>>>>
>>>>
>>>> On 2010-3-5 24:36 , Axel Polleres wrote:
>>>>>
>>>>> No objections, but one additional side question:
>>>>>
>>>>> Do we have an issue with systems that use canonical forms of datatype
>>>>> literals internally?
>>>>>
>>>>> Say you have:
>>>>>
>>>>>   :s :p "1.000"^^xsd:decimal
>>>>>
>>>>> is a Datatype-aware system really supposed to return
>>>>>
>>>>>   "1.000"^^xsd:decimal
>>>>>
>>>>> on { :s :p ?O}
>>>>>
>>>>> but not its internal representation?
>>>>>
>>>>>
>>>>
>>>> This is a good question, I do not know the answer:-(, but is this an
>>>> entailment specific question? I would expect that to be a question for
>>>> SPARQL as a whole...
>>>>
>>>> Cheers
>>>>
>>>> Ivan
>>>
>>> There are 2 cases for value aware systems and there are examples of
>>> systems in each case:
>>>
>>> 1/ Data "1.00"^^xsd:decimal,
>>>     stores "1.00"^^xsd:decimal,
>>>     matches "1.0"^^xsd:decimal,
>>>     matches "1.00"^^xsd:decimal,
>>>     returns "1.00"^^xsd:decimal
>>>
>>> i.e. the original term is stored and returned
>>>
>>> 2/ Data "1.00"^^xsd:decimal,
>>>     stores "1.0"^^xsd:decimal,
>>>     matches "1.0"^^xsd:decimal
>>>     matches "1.00"^^xsd:decimal (canonicalization applied)
>>>     returns "1.0"^^xsd:decimal
>>>
>>> i.e. the canonicalized term is stored and returned
>>>
>>>
>>> See also "1"^^xsd:byte and "1"^^xsd:integer
>>>
>>> I avoided describing them as D-entailment because that really is a set
>>> of possibilities depending on the datatypes supported and ranges of
>>> values within the datatypes.  They don't necessarily force D-consistency.
>>>
>>>        Andy
>>>
>>> Examples:
>>> 1 - Jena memory model
>>> 2 - Jena TDB
>>>
>>
>>
> 
> 
> 

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF   : http://www.ivan-herman.net/foaf.rdf
vCard  : http://www.ivan-herman.net/HermanIvan.vcf



Received on Friday, 5 March 2010 13:57:00 GMT
