Re: Encouraging canonical serializations of datatypes in RDF from Steve Harris on 2012-08-01 (public-rdf-comments@w3.org from August 2012)

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 1 Aug 2012 10:37:09 +0100
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-comments@w3.org
Message-Id: <0F1117CE-4F78-4D3D-BF19-B489A067C6B3@garlik.com>
+1

We have apps that operate in different timezones, so preserving the timezone in results is important.

- Steve

On 2012-08-01, at 10:07, Andy Seaborne wrote:

> In most of the RDF apps I'm involved in at the moment, everything is in the same time zone.
> 
> Changing the data as it goes through the system, and hence breaking the display aspect of the data, is a complete non-starter.  I want to know what timezone the dateTime started as.
> 
> Display is more important than comparison.
> 
> And, from time spent doing support, the first user expectation is that stuff that comes out looks like what went in.  Not changing the date part sometimes.
> 
> "2012-12-31T22:00:00-05:00"^^xsd:dateTime
> 
> is the same time point as
> 
> "2013-01-01T03:00:00Z"^^xsd:dateTime
> 
> It's a different year.
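
The two literals above can be checked mechanically. This is a rough sketch in Python, not anything from the thread: it assumes the standard library's ISO 8601 parser is close enough to the xsd:dateTime lexical form for these two values (it is not a full xsd:dateTime parser), and the helper name `parse` is made up for illustration. It shows that the values denote the same instant, that the year as written differs, and that normalising to UTC drops the original offset.

```python
from datetime import datetime, timezone

# The two lexical forms from the example above.
local_form = "2012-12-31T22:00:00-05:00"
utc_form = "2013-01-01T03:00:00Z"

def parse(lexical: str) -> datetime:
    """Hypothetical helper: parse a timezoned lexical form."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so rewrite it as an explicit offset to stay portable.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)

a = parse(local_form)
b = parse(utc_form)

# Timezone-aware datetimes compare by instant, not by how they were written.
same_instant = (a == b)

# The calendar year as written is different, which matters for display.
same_year = (a.year == b.year)

# Normalising to UTC keeps the instant but discards the original offset.
normalised = a.astimezone(timezone.utc)
offset_preserved = (normalised.utcoffset() == a.utcoffset())

print(same_instant, same_year, offset_preserved)
```

Once the value has been rewritten in the "Z" form, the -05:00 offset cannot be recovered from the literal alone.
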
> 
> 	Andy
> 
> On 01/08/12 03:57, Peter F. Patel-Schneider wrote:
>> 
>> On 07/31/2012 10:48 PM, David Booth wrote:
>>> On Tue, 2012-07-31 at 16:24 -0400, Peter F. Patel-Schneider wrote:
>>>> On 07/31/2012 03:59 PM, David Booth wrote:
>>>>> Hi Peter,
>>>>> 
>>>>> On Tue, 2012-07-31 at 15:36 -0400, Peter F. Patel-Schneider wrote:
>>>>>> Hmm.
>>>>>> 
>>>>>> Your two examples have different canonical forms in XML.   I do not
>>>>>> believe
>>>>>> that going beyond XML canonicalization is a good idea.
>>>>> What downside do you see?
>>>> If RDF goes beyond XML canonicalization it is doing something to XML
>>>> datatypes
>>>> that is not part of the XML specification.   This appears to be
>>>> driving a
>>>> further wedge between RDF and XML data.
>>> I guess I'm not following what you mean.  For example, the
>>> xsd:dateTimeStamp datatype already requires a timezoneFrag to be
>>> specified, and one permissible timezoneFrag is "Z" (meaning UTC).  If
>>> RDF canonicalization suggested that the timezoneFrag always be "Z", what
>>> wedge would that drive between RDF and XML data?
>> 
>> It would say that as far as RDF is concerned, XML data that doesn't use
>> Z is somehow second class.
>>> 
>>>> [...]
>>>> 
>>>>>> In any case, I don't see the point here.  If equality-unique
>>>>>> canonical forms
>>>>>> are only encouraged, then applications will still have to do
>>>>>> datatype-aware
>>>>>> comparisons.
>>>>> Only if they need to handle all possible data serializations.   If 90%
>>>>> of the available datasets use the canonical forms then many apps will
>>>>> not need to do datatype-aware comparisons, though the ones that need to
>>>>> cover 100% will.
>>>> If even 99.99% of available datasets use the canonical forms then all
>>>> apps
>>>> should still be prepared for non-canonical forms.  To do otherwise is
>>>> to be
>>>> wrong.
>>> It would be wrong for *some* apps, but by no means all.  You can't paint
>>> all apps with the same brush.  For example, if there are 100 datasets
>>> available, and 100 apps, and 90 of the datasets use the canonical forms,
>>> and 40 of the apps only need the datasets that use the canonical forms,
>>> then that substantially lowers the implementation barrier for those 40
>>> apps.
>> As long as these apps only use the 90, and stay away from the 10. This
>> appears to break one of the prime motivations of RDF, that all data can
>> be used by anyone.
>>> 
>>>> That is not to say that being wrong is not useful on occasion, but I
>>>> don't see that there is any good to be had here in the WG suggesting
>>>> canonical
>>>> forms be used exclusively.
>>> I just described some substantial good.  I'm not suggesting that
>>> canonicalization be used *exclusively*, but merely that it be
>>> *encouraged*, because it does significantly simplify processing when it
>>> can be used.
>> 
>> I don't see the "significantly" here at all.
>>> 
>>>>> I think it is important to keep the RDF entry barrier as low as
>>>>> possible
>>>>> whenever possible, in order to support scruffy apps that are good
>>>>> enough
>>>>> for many purposes, even if they don't handle every case.
>>>>> 
>>>>> David
>>>>> 
>>>> It is important that apps should do the right thing.  For example,
>>>> should apps
>>>> ignore character encoding?  How hard is doing datatype-aware
>>>> processing of
>>>> literals, compared with all the rest of the stuff that is required to
>>>> handle RDF?
>>> It depends entirely on the application.  In the case of xsd:dateTime,
>>> for example, it means the literal must be completely parsed into its
>>> year, month, day, hour, minute, seconds and timezone offset, and then
>>> datetime arithmetic -- which is *not* simple -- must be used to properly
>>> add the timezone offset in order to compare two values.  All this
>>> instead of a simple, string comparison!  But the worst part is that the
>>> application has to *understand* the different datatypes, and this means
>>> that the code either has to special case every datatype, or it has to
>>> implement some kind of general datatype-handling framework.  Suddenly,
>>> an app that could have been a one-off, three-line perl script blows up
>>> into something that requires significantly more development effort.
>>> 
>>> The RDF model is so simple.  It would be nice if it could be processed
>>> very simply whenever possible.  "Make the simple cases simple", etc.
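
The trade-off described above can be sketched in a few lines. This is only an illustration, under the assumption that Python's ISO 8601 parser is an adequate stand-in for an xsd:dateTime parser on timezoned values; the function names `parse_datetime`, `same_by_string` and `same_by_value` are invented for the example. String comparison is enough when both literals are already in the "Z" form, but it gives the wrong answer on mixed input, where a datatype-aware comparison is needed.

```python
from datetime import datetime

def parse_datetime(lexical: str) -> datetime:
    """Hypothetical helper: parse a timezoned lexical form (not a full xsd:dateTime parser)."""
    # Rewrite "Z" as an explicit offset so this also runs before Python 3.11.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)

def same_by_string(x: str, y: str) -> bool:
    # The "three-line script" approach: compare the lexical forms directly.
    return x == y

def same_by_value(x: str, y: str) -> bool:
    # The datatype-aware approach: parse both literals, then compare instants.
    return parse_datetime(x) == parse_datetime(y)

# Both literals already use the "Z" form.
canonical = ["2013-01-01T03:00:00Z", "2013-01-01T03:00:00Z"]

# The same instant written with two different offsets.
mixed = ["2012-12-31T22:00:00-05:00", "2013-01-01T03:00:00Z"]

print(same_by_string(*canonical), same_by_value(*canonical))
print(same_by_string(*mixed), same_by_value(*mixed))
```

On the canonical pair both approaches agree; on the mixed pair only the value comparison reports the literals as equal.
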
>> 
>> The simplicity of the RDF model is, in my mind, tied up with its
>> uniformity. Your proposal severely breaks that uniformity, which is a
>> major lossage.
>>> 
>>>> peter
>>>> 
>>>> PS:  Yes, I do use text processors to handle RDF, and quite often, even
>>>> analysing the 2011 Billion Triple Challenge triples using sed and grep.
>>>> However, I check to ensure that the right thing happens.
>>> Right, that's exactly the kind of simplified processing that I think we
>>> should facilitate as often as possible.
>>> 
>>> 
>> Sure, as long as it is only in one-off hacks, controlled by experts, who
>> can adjust the processing according to the peculiarities of the input.
>> As soon as direct expert control goes away, then the app needs to be
>> able to consume all RDF, which I see as counter to your proposal.
>> 
>> peter
>> 
>> 
>> 
>> 
> 

-- 
Steve Harris, CTO
Garlik, a part of Experian
+44 7854 417 874  http://www.garlik.com/
Registered in England and Wales 653331 VAT # 887 1335 93
Registered office: Landmark House, Experian Way, Nottingham, Notts, NG80 1ZZ
Received on Wednesday, 1 August 2012 09:37:45 UTC