Re: Encouraging canonical serializations of datatypes in RDF from Peter F. Patel-Schneider on 2012-08-01 (public-rdf-comments@w3.org from August 2012)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Wed, 01 Aug 2012 10:10:25 -0400
To: public-rdf-comments@w3.org
CC: David Booth <david@dbooth.org>
Message-ID: <501938D1.3020800@gmail.com>

OK, both use cases are acknowledged.

Given that there is a use case where the time zone is important, how can
suggesting the use of Z only when the timezone is not important be of any
help to application writers?

peter
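
For reference, the two literals in Andy Seaborne's example quoted below can be checked with a short sketch. It assumes Python's standard datetime module is an acceptable stand-in for a real XSD dateTime parser (it only covers the simple lexical forms used here), and the helper names parse and to_utc_lexical are hypothetical, not part of any RDF library.

```python
from datetime import datetime, timezone


def parse(lexical: str) -> datetime:
    """Parse a simple xsd:dateTime lexical form into an aware datetime."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit offset first.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)


def to_utc_lexical(lexical: str) -> str:
    """Rewrite a lexical form so that its timezone is always "Z" (assumed policy)."""
    return parse(lexical).astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


local = "2012-12-31T22:00:00-05:00"
utc = "2013-01-01T03:00:00Z"

# Both literals denote the same point in time.
same_instant = parse(local) == parse(utc)

# Normalizing to Z changes the date part and drops the original offset.
normalized = to_utc_lexical(local)
original_offset = parse(local).utcoffset()
```

Here same_instant is true and normalized is the second literal, so comparison survives the rewrite, but the -05:00 offset held in original_offset cannot be recovered from the normalized string.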

On 08/01/2012 10:00 AM, David Booth wrote:
> Yes, preserving the timezone is important in use cases like that, in
> which the xsd:dateTime not only represents a point in time, but also
> encodes timezone provenance in the literal form, i.e., it encodes both
> when and *where* (in what timezone) the event occurred.
>
> But in other use cases (such as looking at the sequence of events in a
> medical history) the xsd:dateTime is used purely as a point in time.
> Comparisons are vital, and timezone provenance is unimportant (and gets
> in the way).
>
> So yes, there are clearly two different classes of use cases for
> xsd:dateTime.  But this does not mean that we should throw out the baby
> with the bath.  Both use cases can be acknowledged.
>
> David
>
> On Wed, 2012-08-01 at 10:37 +0100, Steve Harris wrote:
>> +1
>>
>> We have apps that operate in different timezones, so preserving the
>> timezone in results is important.
>>
>> - Steve
>>
>> On 2012-08-01, at 10:07, Andy Seaborne wrote:
>>
>>> The majority of use for RDF for apps I'm involved in at the moment
>>> are all in the same time place.
>>>
>>> Changing the data as it goes through the system, and hence breaking
>>> the display aspect of the data, is a complete non-starter.  I want to
>>> know what timezone the dateTime started as.
>>>
>>> Display is more important than comparison.
>>>
>>> And, from time spent doing support, the first user expectation is
>>> that stuff that comes out looks like what went in.  Not changing the
>>> date part sometimes.
>>>
>>> "2012-12-31T22:00:00-05:00"^^xsd:dateTime
>>>
>>> is the same time point as
>>>
>>> "2013-01-01T03:00:00Z"^^xsd:dateTime
>>>
>>> It's a different year.
>>>
>>>  Andy
>>>
>>> On 01/08/12 03:57, Peter F. Patel-Schneider wrote:
>>>> On 07/31/2012 10:48 PM, David Booth wrote:
>>>>> On Tue, 2012-07-31 at 16:24 -0400, Peter F. Patel-Schneider wrote:
>>>>>> On 07/31/2012 03:59 PM, David Booth wrote:
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> On Tue, 2012-07-31 at 15:36 -0400, Peter F. Patel-Schneider wrote:
>>>>>>>> Hmm.
>>>>>>>>
>>>>>>>> Your two examples have different canonical forms in XML.  I do not
>>>>>>>> believe that going beyond XML canonicalization is a good idea.
>>>>>>> What downside do you see?
>>>>>> If RDF goes beyond XML canonicalization, it is doing something to XML
>>>>>> datatypes that is not part of the XML specification.  This appears to be
>>>>>> driving a further wedge between RDF and XML data.
>>>>> I guess I'm not following what you mean.  For example, the
>>>>> xsd:dateTimeStamp datatype already requires a timezoneFrag to be
>>>>> specified, and one permissible timezoneFrag is "Z" (meaning UTC).  If
>>>>> RDF canonicalization suggested that the timezoneFrag always be "Z", what
>>>>> wedge would that drive between RDF and XML data?
>>>> It would say that as far as RDF is concerned, XML data that doesn't use
>>>> Z is somehow second class.
>>>>>> [...]
>>>>>>
>>>>>>>> In any case, I don't see the point here.  If equality-unique
>>>>>>>> canonical forms are only encouraged, then applications will still have
>>>>>>>> to do datatype-aware comparisons.
>>>>>>> Only if they need to handle all possible data serializations.   If 90%
>>>>>>> of the available datasets use the canonical forms then many apps will
>>>>>>> not need to do datatype-aware comparisons, though the ones that need to
>>>>>>> cover 100% will.
>>>>>> If even 99.99% of available datasets use the canonical forms then all
>>>>>> apps should still be prepared for non-canonical forms.  To do otherwise
>>>>>> is to be wrong.
>>>>> It would be wrong for *some* apps, but by no means all.  You can't paint
>>>>> all apps with the same brush.  For example, if there are 100 datasets
>>>>> available, and 100 apps, and 90 of the datasets use the canonical forms,
>>>>> and 40 of the apps only need the datasets that use the canonical forms,
>>>>> then that substantially lowers the implementation barrier for those 40
>>>>> apps.
>>>> As long as these apps only use the 90, and stay away from the 10. This
>>>> appears to break one of the prime motivations of RDF, that all data can
>>>> be used by anyone.
>>>>>> That is not to say that being wrong is not useful on occasion, but I
>>>>>> don't see that there is any good to be had here in the WG suggesting
>>>>>> canonical forms be used exclusively.
>>>>> I just described some substantial good.  I'm not suggesting that
>>>>> canonicalization be used *exclusively*, but merely that it be
>>>>> *encouraged*, because it does significantly simplify processing when it
>>>>> can be used.
>>>> I don't see the "significantly" here at all.
>>>>>>> I think it is important to keep the RDF entry barrier as low as
>>>>>>> possible whenever possible, in order to support scruffy apps that are
>>>>>>> good enough for many purposes, even if they don't handle every case.
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>> It is important that apps should do the right thing.  For example,
>>>>>> should apps ignore character encoding?  How hard is doing datatype-aware
>>>>>> processing of literals, compared with all the rest of the stuff that is
>>>>>> required to handle RDF?
>>>>> It depends entirely on the application.  In the case of xsd:dateTime,
>>>>> for example, it means the literal must be completely parsed into its
>>>>> year, month, day, hour, minute, seconds and timezone offset, and then
>>>>> datetime arithmetic -- which is *not* simple -- must be used to properly
>>>>> add the timezone offset in order to compare two values.  All this
>>>>> instead of a simple string comparison!  But the worst part is that the
>>>>> application has to *understand* the different datatypes, and this means
>>>>> that the code either has to special case every datatype, or it has to
>>>>> implement some kind of general datatype-handling framework.  Suddenly,
>>>>> an app that could have been a one-off, three-line perl script blows up
>>>>> into something that requires significantly more development effort.
>>>>>
>>>>> The RDF model is so simple.  It would be nice if it could be processed
>>>>> very simply whenever possible.  "Make the simple cases simple", etc.
>>>> The simplicity of the RDF model is, in my mind, tied up with its
>>>> uniformity. Your proposal severely breaks that uniformity, which is a
>>>> major lossage.
>>>>>> peter
>>>>>>
>>>>>> PS:  Yes, I do use text processors to handle RDF, and quite often, even
>>>>>> analysing the 2011 Billion Triple Challenge triples using sed and grep.
>>>>>> However, I check to ensure that the right thing happens.
>>>>> Right, that's exactly the kind of simplified processing that I think we
>>>>> should facilitate as often as possible.
>>>>>
>>>>>
>>>> Sure, as long as it is only in one-off hacks, controlled by experts, who
>>>> can adjust the processing according to the peculiarities of the input.
>>>> As soon as direct expert control goes away, then the app needs to be
>>>> able to consume all RDF, which I see as counter to your proposal.
>>>>
>>>> peter
>>>>
>>>>
>>>>
>>>>
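
The trade-off debated in the thread above, plain string comparison versus datatype-aware comparison of xsd:dateTime literals, can be illustrated with a minimal sketch. It assumes Python's standard datetime module as a rough substitute for an XSD-aware parser, and the literals and the parse helper are made up for illustration. String ordering is only shown to agree with value ordering under the further assumptions of four-digit years and no fractional seconds.

```python
from datetime import datetime


def parse(lexical: str) -> datetime:
    """Parse a simple xsd:dateTime lexical form into an aware datetime."""
    # Rewrite a trailing "Z" as an explicit offset for older Python versions.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)


a = "2012-08-01T10:00:00-04:00"  # 14:00 UTC
b = "2012-08-01T13:00:00Z"       # 13:00 UTC

# Plain string comparison looks only at characters: "10" sorts before "13".
string_order = a < b

# Datatype-aware comparison applies the offsets: 14:00 UTC is not before 13:00 UTC.
value_order = parse(a) < parse(b)

# If both literals use Z, string order and value order agree for this pair.
a_z = "2012-08-01T14:00:00Z"
string_order_z = a_z < b
value_order_z = parse(a_z) < parse(b)
```

With mixed offsets, string_order and value_order disagree; with both literals in Z form they agree. Under the stated assumptions this is the simplification being argued for, and also the reason an application that accepts arbitrary input still needs the parse step.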
Received on Wednesday, 1 August 2012 14:11:00 UTC