Re: Encouraging canonical serializations of datatypes in RDF from Nathan on 2012-08-01 (public-rdf-comments@w3.org from August 2012)

From: Nathan <nathan@webr3.org>
Date: Wed, 01 Aug 2012 15:15:03 +0100
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
CC: public-rdf-comments@w3.org, David Booth <david@dbooth.org>
Message-ID: <501939E7.9060901@webr3.org>
Although one use case (non-Z) can surely also be handled by including a
location in the data, is there any use case where the timezone MUST be included?

Peter F. Patel-Schneider wrote:
> OK, both use cases are acknowledged.
> 
> Given that there is a use case where the time zone is important, how can 
> suggesting only using Z when timezone is not important be any help to 
> application writers?
> 
> peter
> 
> On 08/01/2012 10:00 AM, David Booth wrote:
>> Yes, preserving the timezone is important in use cases like that, in
>> which the xsd:dateTime not only represents a point in time, but also
>> encodes timezone provenance in the literal form, i.e., it encodes both
>> when and *where* (in what timezone) the event occurred.
>>
>> But in other use cases (such as looking at the sequence of events in a
>> medical history) the xsd:dateTime is used purely as a point in time.
>> Comparisons are vital, and timezone provenance is unimportant (and gets
>> in the way).
>>
>> So yes, there are clearly two different classes of use cases for
>> xsd:dateTime.  But this does not mean that we should throw out the baby
>> with the bath.  Both use cases can be acknowledged.
>>
>> David
>>
>> On Wed, 2012-08-01 at 10:37 +0100, Steve Harris wrote:
>>> +1
>>>
>>> We have apps that operate in different timezones, so preserving the
>>> timezone in results is important.
>>>
>>> - Steve
>>>
>>> On 2012-08-01, at 10:07, Andy Seaborne wrote:
>>>
>>>> The majority of use for RDF for apps I'm involved in at the moment
>>>> are all in the same time place.
>>>>
>>>> Changing the data as it goes through the system, and hence breaking
>>>> the display aspect of the data, is a complete non-starter.  I want to
>>>> know what timezone the dateTime started as.
>>>>
>>>> Display is more important than comparison.
>>>>
>>>> And, from time spent doing support, the first user expectation is
>>>> that stuff that comes out looks like what went in.  Not changing the
>>>> date part sometimes.
>>>>
>>>> "2012-12-31T22:00:00-05:00"^^xsd:dateTime
>>>>
>>>> is the same time point as
>>>>
>>>> "2013-01-01T03:00:00Z"^^xsd:dateTime
>>>>
>>>> It's a different year.
>>>>
>>>>     Andy
>>>>
>>>> On 01/08/12 03:57, Peter F. Patel-Schneider wrote:
>>>>> On 07/31/2012 10:48 PM, David Booth wrote:
>>>>>> On Tue, 2012-07-31 at 16:24 -0400, Peter F. Patel-Schneider wrote:
>>>>>>> On 07/31/2012 03:59 PM, David Booth wrote:
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> On Tue, 2012-07-31 at 15:36 -0400, Peter F. Patel-Schneider wrote:
>>>>>>>>> Hmm.
>>>>>>>>>
>>>>>>>>> Your two examples have different canonical forms in XML.   I do not
>>>>>>>>> believe that going beyond XML canonicalization is a good idea.
>>>>>>>> What downside do you see?
>>>>>>> If RDF goes beyond XML canonicalization it is doing something to XML
>>>>>>> datatypes that is not part of the XML specification.   This appears to
>>>>>>> be driving a further wedge between RDF and XML data.
>>>>>> I guess I'm not following what you mean.  For example, the
>>>>>> xsd:dateTimeStamp datatype already requires a timezoneFrag to be
>>>>>> specified, and one permissible timezoneFrag is "Z" (meaning UTC).  If
>>>>>> RDF canonicalization suggested that the timezoneFrag always be "Z",
>>>>>> what wedge would that drive between RDF and XML data?
>>>>> It would say that as far as RDF is concerned, XML data that doesn't use
>>>>> Z is somehow second class.
>>>>>>> [...]
>>>>>>>
>>>>>>>>> In any case, I don't see the point here.  If equality-unique
>>>>>>>>> canonical forms are only encouraged, then applications will still
>>>>>>>>> have to do datatype-aware comparisons.
>>>>>>>> Only if they need to handle all possible data serializations.  If 90%
>>>>>>>> of the available datasets use the canonical forms then many apps will
>>>>>>>> not need to do datatype-aware comparisons, though the ones that need
>>>>>>>> to cover 100% will.
>>>>>>> If even 99.99% of available datasets use the canonical forms then all
>>>>>>> apps should still be prepared for non-canonical forms.  To do
>>>>>>> otherwise is to be wrong.
>>>>>> It would be wrong for *some* apps, but by no means all.  You can't paint
>>>>>> all apps with the same brush.  For example, if there are 100 datasets
>>>>>> available, and 100 apps, and 90 of the datasets use the canonical forms,
>>>>>> and 40 of the apps only need the datasets that use the canonical forms,
>>>>>> then that substantially lowers the implementation barrier for those 40
>>>>>> apps.
>>>>> As long as these apps only use the 90, and stay away from the 10. This
>>>>> appears to break one of the prime motivations of RDF, that all data can
>>>>> be used by anyone.
>>>>>>> That is not to say that being wrong is not useful on occasion, but I
>>>>>>> don't see that there is any good to be had here in the WG suggesting
>>>>>>> canonical forms be used exclusively.
>>>>>> I just described some substantial good.  I'm not suggesting that
>>>>>> canonicalization be used *exclusively*, but merely that it be
>>>>>> *encouraged*, because it does significantly simplify processing when it
>>>>>> can be used.
>>>>> I don't see the "significantly" here at all.
>>>>>>>> I think it is important to keep the RDF entry barrier as low as
>>>>>>>> possible whenever possible, in order to support scruffy apps that are
>>>>>>>> good enough for many purposes, even if they don't handle every case.
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>> It is important that apps should do the right thing.  For example,
>>>>>>> should apps ignore character encoding?  How hard is doing
>>>>>>> datatype-aware processing of literals, compared with all the rest of
>>>>>>> the stuff that is required to handle RDF?
>>>>>> It depends entirely on the application.  In the case of xsd:dateTime,
>>>>>> for example, it means the literal must be completely parsed into its
>>>>>> year, month, day, hour, minute, seconds and timezone offset, and then
>>>>>> datetime arithmetic -- which is *not* simple -- must be used to properly
>>>>>> add the timezone offset in order to compare two values.  All this
>>>>>> instead of a simple string comparison!  But the worst part is that the
>>>>>> application has to *understand* the different datatypes, and this means
>>>>>> that the code either has to special-case every datatype, or it has to
>>>>>> implement some kind of general datatype-handling framework.  Suddenly,
>>>>>> an app that could have been a one-off, three-line Perl script blows up
>>>>>> into something that requires significantly more development effort.
>>>>>>
>>>>>> The RDF model is so simple.  It would be nice if it could be processed
>>>>>> very simply whenever possible.  "Make the simple cases simple", etc.
>>>>> The simplicity of the RDF model is, in my mind, tied up with its
>>>>> uniformity. Your proposal severely breaks that uniformity, which is a
>>>>> major lossage.
>>>>>>> peter
>>>>>>>
>>>>>>> PS:  Yes, I do use text processors to handle RDF, and quite often, even
>>>>>>> analysing the 2011 Billion Triple Challenge triples using sed and grep.
>>>>>>> However, I check to ensure that the right thing happens.
>>>>>> Right, that's exactly the kind of simplified processing that I think we
>>>>>> should facilitate as often as possible.
>>>>>>
>>>>>>
>>>>> Sure, as long as it is only in one-off hacks, controlled by experts, who
>>>>> can adjust the processing according to the peculiarities of the input.
>>>>> As soon as direct expert control goes away, then the app needs to be
>>>>> able to consume all RDF, which I see as counter to your proposal.
>>>>>
>>>>> peter
>>>>>
>>>>>
>>>>>
>>>>>
> 
> 
> 
>
Received on Wednesday, 1 August 2012 14:16:23 UTC
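
Illustration (not part of the original message): the trade-off argued in the thread can be seen with Andy's two literals. The sketch below is only a rough Python rendering of the two options discussed, datatype-aware comparison versus rewriting every value to "Z" so that plain string comparison works. The function names parse_xsd_datetime and to_utc_lexical are hypothetical, and the code assumes timezoned values with four-digit years and whole seconds; it does not cover the full xsd:dateTime lexical space.

```python
from datetime import datetime, timezone


def parse_xsd_datetime(lexical: str) -> datetime:
    """Parse a timezoned xsd:dateTime lexical form (simplified subset)."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so rewrite it as an explicit zero offset for older interpreters.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    value = datetime.fromisoformat(lexical)
    if value.tzinfo is None:
        # Assumption of this sketch: values without a timezone are rejected.
        raise ValueError("no timezone in literal: " + lexical)
    return value


def to_utc_lexical(lexical: str) -> str:
    """Rewrite a literal to the 'Z' form (whole seconds only in this sketch)."""
    utc_value = parse_xsd_datetime(lexical).astimezone(timezone.utc)
    return utc_value.strftime("%Y-%m-%dT%H:%M:%S") + "Z"


local_form = "2012-12-31T22:00:00-05:00"
utc_form = "2013-01-01T03:00:00Z"

# Plain string comparison treats the two literals as different ...
strings_equal = local_form == utc_form
# ... while datatype-aware comparison sees the same point in time.
instants_equal = parse_xsd_datetime(local_form) == parse_xsd_datetime(utc_form)
# Rewriting to "Z" makes string comparison work, but the original offset
# and the local calendar year (2012) are no longer visible in the literal.
rewritten = to_utc_lexical(local_form)
```

Here rewritten is the same string as utc_form, which is the benefit David describes, and it starts with 2013 rather than 2012, which is the display problem Andy and Steve describe.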