Re: ACTION-419: Sync with Birte on Datatypes for canonicalisation

Thanks Bijan and Andy, it's now clearer what says what. I, personally,
prefer the 1.1 (datatypes) spec by far. The 1.0 spec is even
contradictory: it says that canonical representations of decimals
must have a decimal point and that derived types (such as integer)
inherit the canonical representation from their primitive type, yet
integer specifically forbids decimal points. What is rather bizarre is
that the 1.1 Candidate Rec went back to Last Call, but with a document
dated 2009 that does not explain anything and that even has a note
saying that last call comments are due by 31 Dec 2009.
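
To make the 1.0 contradiction concrete (this is just my reading of the
1.0 rules; the triple is made up):

  :s :p "4"^^xsd:integer .
  # Under 1.0, the canonical form of the value four as a decimal is
  # "4.0" (the decimal point is required). If integer simply inherits
  # that, the "canonical" literal "4.0"^^xsd:integer is not even in
  # integer's lexical space.
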
I just discussed this also with Boris (Motik), and it seems that there
is even a problem with the 1.1 spec, since it is hard/impossible to
define a canonical representation in the primitive type that is
guaranteed to be valid for all derived types. E.g., for integers there
is now an explicit exception in the definition of canonical
representation that says "Specifically, for integers, the decimal
point and fractional part are prohibited." I could, however, define
other derived types, e.g., "integers with a preceding 0", by applying
a pattern to integers that only allows values of the form "01",
"0...". They are all lexically fine integers and decimals, but the
canonical representation would not be in the lexical space of that
datatype. Now one could argue that nobody is likely to define anything
like that, but it might be safer to define the canonical
representation per type, i.e., each primitive or derived type has to
say explicitly what the canonical form is, even if it is just "same as
for the supertype".
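
Here is a sketch of that corner case (ex:paddedInt is a hypothetical
datatype, assumed to be derived from xsd:integer by the pattern
"0[0-9]+"):

  # ex:paddedInt: hypothetical restriction of xsd:integer whose
  # pattern facet only admits lexical forms with a leading zero
  :s :p "01"^^ex:paddedInt .
  # "01" is a perfectly good integer (and decimal) lexical form, but
  # the 1.1 canonical form of the value one is "1", which the pattern
  # excludes, so the canonical representation lies outside the lexical
  # space of ex:paddedInt.
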
Anyway, I would like to go with the 1.1 definition and hope that there
are no objections to that.

As Andy pointed out, it might be worth thinking about this issue also
with respect to SPARQL Update. If I put a canonical representation in
and specify a canonical representation in my delete, then all is fine,
but in all other cases things can get messy. If the store
canonicalises both data and query, things are OK-ish, but users still
might not like it that if they put
:s :p "1"^^xsd:short
in and delete
:s :p "1.0"^^xsd:decimal
then the triple is gone (assuming that the type is always the
primitive one after canonicalisation, which is not really defined yet
anywhere). Even worse, IMO, would be if the store canonicalises the
data, but not the query. So you put
:s :p "01"^^xsd:short
in and then try to delete
:s :p "01"^^xsd:short
and nothing is deleted, because the data value in the store has been
canonicalised whereas the one in the query hasn't.
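
In SPARQL 1.1 Update syntax, this second scenario would look roughly
like this (the prefixes and the store's canonicalisation behaviour are
assumed for illustration):

  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  PREFIX :    <http://example.org/>

  INSERT DATA { :s :p "01"^^xsd:short } ;
  # the store rewrites the stored literal to "1"^^xsd:short
  DELETE DATA { :s :p "01"^^xsd:short }
  # the "01"^^xsd:short in the request is left as-is and matches
  # nothing, so the triple survives
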

Also, at the moment, the D-entailment regime does not prescribe
anything to be done at load time, and you could even have
un-canonicalised data values in your graph as long as you somehow
manage to give the correct answers (which are only those with
canonical representations), so you could theoretically have
:s :p "2.200"^^xsd:float .
:s :p "2.20"^^xsd:float .
and BGP
:s :p ?dv .
as long as you give just one answer with ?dv bound to the canonical
representation, which I think is "2.2"^^xsd:float. Obviously, it is
quite unlikely that anybody would do that (I think), but assume
someone does. If a DELETE is now issued to delete
:s :p "2.2"^^xsd:float .
because you believe that this triple is in the graph, then you are
mistaken and nothing happens.
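
A sketch of this last scenario (the SELECT is only there to show what
the D-entailment answers look like; graph and prefixes as above):

  # data as stored, with un-canonicalised floats
  :s :p "2.200"^^xsd:float .
  :s :p "2.20"^^xsd:float .

  SELECT ?dv WHERE { :s :p ?dv }
  # the single D-entailment answer binds ?dv to the canonical form,
  # per the above "2.2"^^xsd:float

  DELETE DATA { :s :p "2.2"^^xsd:float . }
  # no stored triple matches "2.2" syntactically, so nothing is
  # deleted, even though the query answer suggested it was there
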

Birte

On 29 March 2011 19:58, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
>
>
> On 29/03/11 18:52, Bijan Parsia wrote:
>>
>> On 29 Mar 2011, at 18:46, Bijan Parsia wrote:
>>
>>> On 29 Mar 2011, at 18:19, Andy Seaborne wrote:
>>>
>>>> On 29/03/11 18:04, Bijan Parsia wrote:
>>>>>
>>>>> On 29 Mar 2011, at 17:54, Birte Glimm wrote:
>>
>> [snip]
>>>>>
>>>>> Are you using Schema 1.1 (recommended even though not a
>>>>> recommendation!)?
>>>>
>>>> Curious - why?
>>>
>>> My and Boris's experience in the OWL WG is that the 1.1 specs are *much*
>>> superior in organization, clarity, and detail, as well as nailing down
>>> various aspects.
>>
>> [snip]
>>
>> We also thought it was going to be done "soon" :)
>>
>> Note that 1.1 seems to be what Oracle is doing (no surprise, their still
>> being active in the XS group).
>
> Always possible, as SPARQL Query says nothing about processing the data
> getting into the dataset, so the fact that the original data says
> xsd:integer and the results say xsd:decimal is explainable.  Helpful if the
> SPARQL query also has canonicalization applied on parsing.  SPARQL Update
> is interesting.
>
>        Andy
>
>>
>> Cheers,
>> Bijan.
>



-- 
Dr. Birte Glimm, Room 309
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283520
