Re: Encouraging canonical serializations of datatypes in RDF from Andy Seaborne on 2012-08-01 (public-rdf-comments@w3.org from August 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Wed, 01 Aug 2012 15:37:54 +0100
To: public-rdf-comments@w3.org
Message-ID: <50193F42.3090606@epimorphics.com>
On 01/08/12 15:15, Nathan wrote:
> although one use case (non-Z) can surely also be handled by including a
> location in the data, is there any use case where timezone MUST be included?

... on each literal since the data may be gathered from different places.

Isn't the timezone that piece of location information?

It is the currently deployed solution.  You can only move it out into
separate location info by requiring a global data change (otherwise
existing Z data is ambiguous).

Given XSD libraries exist for most/all programming languages, why not 
use one?  If the app is that time-sensitive it must at least 
parse-and-check data.
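
As a rough illustration of "parse-and-check", here is a minimal Python
sketch.  It uses only the standard library as a stand-in for a real XSD
library, and parse_xsd_datetime is a hypothetical helper name, not an
existing API.  It assumes four-digit years and does not accept 24:00:00,
so it covers only part of the xsd:dateTime lexical space.

```python
import re
from datetime import datetime, timedelta, timezone

# Simplified pattern (assumption): four-digit years only, optional
# fractional seconds, optional timezone. The real lexical space is wider.
_XSD_DATETIME = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(\.\d+)?"
    r"(Z|[+-]\d{2}:\d{2})?",
    re.ASCII,
)


def parse_xsd_datetime(lexical):
    """Parse and range-check a literal; raise ValueError if it is bad."""
    m = _XSD_DATETIME.fullmatch(lexical)
    if m is None:
        raise ValueError("not a recognised dateTime form: %r" % lexical)
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    frac = m.group(7)
    # Truncate (not round) fractional seconds to microseconds.
    micro = int((frac[1:] + "000000")[:6]) if frac else 0
    tz = m.group(8)
    if tz is None:
        tzinfo = None  # non-timezoned value, kept as such
    elif tz == "Z":
        tzinfo = timezone.utc
    else:
        sign = 1 if tz[0] == "+" else -1
        offset = timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))
        tzinfo = timezone(sign * offset)
    # datetime() itself rejects out-of-range fields such as month 13.
    return datetime(year, month, day, hour, minute, second, micro,
                    tzinfo=tzinfo)


a = parse_xsd_datetime("2012-12-31T22:00:00-05:00")
b = parse_xsd_datetime("2013-01-01T03:00:00Z")
print(a == b, a.year, b.year)
```

The parsed values compare as the same instant while each one still
carries the offset it was written with.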

Calculating the value is a projection from the date/time Seven-property
Model onto a one-dimensional value space [+].  It's a lossy projection.
So don't do it until you know the purpose.

Or do it when loading the data (ETL), when the purpose of the data can be
decided and the assumption made that it will not be republished.
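
To make the loss concrete, here is a small Python sketch using the two
literals from earlier in this thread.  to_value is a hypothetical helper
that normalises a literal to UTC; it is only an approximation of the XSD
value mapping and assumes a timezoned literal with a four-digit year.

```python
from datetime import datetime, timezone


def to_value(lexical):
    """Project a timezoned dateTime literal onto a UTC instant."""
    if lexical.endswith("Z"):
        # Older Python versions do not accept "Z" in fromisoformat().
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical).astimezone(timezone.utc)


a = "2012-12-31T22:00:00-05:00"
b = "2013-01-01T03:00:00Z"

same_string = (a == b)                       # different lexical forms
same_instant = (to_value(a) == to_value(b))  # same point in time
year_as_written = int(a[:4])                 # what the publisher wrote
year_after_projection = to_value(a).year     # what comes out after UTC

print(same_string, same_instant, year_as_written, year_after_projection)
print(to_value(a).isoformat())
```

Once only the projected value is stored, nothing in it says that the
first literal was originally written with a -05:00 offset, so the
original display form cannot be reconstructed.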

(XSD 1.1 can't even represent all points in time in 2012 anyway)

 Andy

[+] actually it's two dimensional for xsd:dateTime :-)

>
> Peter F. Patel-Schneider wrote:
>> OK, both use cases are acknowledged.
>>
>> Given that there is a use case where the time zone is important, how
>> can suggesting only using Z when timezone is not important be any help
>> to application writers?
>>
>> peter
>>
>> On 08/01/2012 10:00 AM, David Booth wrote:
>>> Yes, preserving the timezone is important in use cases like that, in
>>> which the xsd:dateTime not only represents a point in time, but also
>>> encodes timezone provenance in the literal form, i.e., it encodes both
>>> when and *where* (in what timezone) the event occurred.
>>>
>>> But in other use cases (such as looking at the sequence of events in a
>>> medical history) the xsd:dateTime is used purely as a point in time.
>>> Comparisons are vital, and timezone provenance is unimportant (and gets
>>> in the way).
>>>
>>> So yes, there are clearly two different classes of use cases for
>>> xsd:dateTime.  But this does not mean that we should throw out the baby
>>> with the bath.  Both use cases can be acknowledged.
>>>
>>> David
>>>
>>> On Wed, 2012-08-01 at 10:37 +0100, Steve Harris wrote:
>>>> +1
>>>>
>>>> We have apps that operate in different timezones, so preserving the
>>>> timezone in results is important.
>>>>
>>>> - Steve
>>>>
>>>> On 2012-08-01, at 10:07, Andy Seaborne wrote:
>>>>
>>>>> The majority of use for RDF for apps I'm involved in at the moment
>>>>> are all in the same time place.
>>>>> Changing the data as it goes through the system, and hence breaking
>>>>> the display aspect of the data, is a complete non-starter.  I want to
>>>>> know what timezone the dateTime started as.
>>>>> Display is more important than comparison.
>>>>>
>>>>> And, from time spent doing support, the first user expectation is
>>>>> that stuff that comes out looks like what went in.  Not changing the
>>>>> date part sometimes.
>>>>> "2012-12-31T22:00:00-05:00"^^xsd:dateTime
>>>>>
>>>>> is the same time point as
>>>>>
>>>>> "2013-01-01T03:00:00Z"^^xsd:dateTime
>>>>>
>>>>> It's a different year.
>>>>>
>>>>>     Andy
>>>>>
>>>>> On 01/08/12 03:57, Peter F. Patel-Schneider wrote:
>>>>>> On 07/31/2012 10:48 PM, David Booth wrote:
>>>>>>> On Tue, 2012-07-31 at 16:24 -0400, Peter F. Patel-Schneider wrote:
>>>>>>>> On 07/31/2012 03:59 PM, David Booth wrote:
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> On Tue, 2012-07-31 at 15:36 -0400, Peter F. Patel-Schneider wrote:
>>>>>>>>>> Hmm.
>>>>>>>>>>
>>>>>>>>>> Your two examples have different canonical forms in XML.   I do
>>>>>>>>>> not believe that going beyond XML canonicalization is a good idea.
>>>>>>>>> What downside do you see?
>>>>>>>> If RDF goes beyond XML canonicalization it is doing something to
>>>>>>>> XML datatypes that is not part of the XML specification.   This
>>>>>>>> appears to be driving a further wedge between RDF and XML data.
>>>>>>> I guess I'm not following what you mean.  For example, the
>>>>>>> xsd:dateTimeStamp datatype already requires a timezoneFrag to be
>>>>>>> specified, and one permissible timezoneFrag is "Z" (meaning UTC).
>>>>>>> If RDF canonicalization suggested that the timezoneFrag always be
>>>>>>> "Z", what wedge would that drive between RDF and XML data?
>>>>>> It would say that as far as RDF is concerned, XML data that doesn't
>>>>>> use Z is somehow second class.
>>>>>>>> [...]
>>>>>>>>
>>>>>>>>>> In any case, I don't see the point here.  If equality-unique
>>>>>>>>>> canonical forms are only encouraged, then applications will
>>>>>>>>>> still have to do datatype-aware comparisons.
>>>>>>>>> Only if they need to handle all possible data serializations.  If
>>>>>>>>> 90% of the available datasets use the canonical forms then many
>>>>>>>>> apps will not need to do datatype-aware comparisons, though the
>>>>>>>>> ones that need to cover 100% will.
>>>>>>>> If even 99.99% of available datasets use the canonical forms then
>>>>>>>> all apps should still be prepared for non-canonical forms.  To do
>>>>>>>> otherwise is to be wrong.
>>>>>>> It would be wrong for *some* apps, but by no means all.  You can't
>>>>>>> paint all apps with the same brush.  For example, if there are 100
>>>>>>> datasets available, and 100 apps, and 90 of the datasets use the
>>>>>>> canonical forms, and 40 of the apps only need the datasets that use
>>>>>>> the canonical forms, then that substantially lowers the
>>>>>>> implementation barrier for those 40 apps.
>>>>>> As long as these apps only use the 90, and stay away from the 10.
>>>>>> This appears to break one of the prime motivations of RDF, that all
>>>>>> data can be used by anyone.
>>>>>>>> That is not to say that being wrong is not useful on occasion, but
>>>>>>>> I don't see that there is any good to be had here in the WG
>>>>>>>> suggesting canonical forms be used exclusively.
>>>>>>> I just described some substantial good.  I'm not suggesting that
>>>>>>> canonicalization be used *exclusively*, but merely that it be
>>>>>>> *encouraged*, because it does significantly simplify processing
>>>>>>> when it can be used.
>>>>>> I don't see the "significantly" here at all.
>>>>>>>>> I think it is important to keep the RDF entry barrier as low as
>>>>>>>>> possible whenever possible, in order to support scruffy apps that
>>>>>>>>> are good enough for many purposes, even if they don't handle
>>>>>>>>> every case.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>> It is important that apps should do the right thing.  For example,
>>>>>>>> should apps ignore character encoding?  How hard is doing
>>>>>>>> datatype-aware processing of literals, compared with all the rest
>>>>>>>> of the stuff that is required to handle RDF?
>>>>>>> It depends entirely on the application.  In the case of
>>>>>>> xsd:dateTime, for example, it means the literal must be completely
>>>>>>> parsed into its year, month, day, hour, minute, seconds and timezone
>>>>>>> offset, and then datetime arithmetic -- which is *not* simple --
>>>>>>> must be used to properly add the timezone offset in order to compare
>>>>>>> two values.  All this instead of a simple string comparison!  But
>>>>>>> the worst part is that the application has to *understand* the
>>>>>>> different datatypes, and this means that the code either has to
>>>>>>> special case every datatype, or it has to implement some kind of
>>>>>>> general datatype-handling framework.  Suddenly, an app that could
>>>>>>> have been a one-off, three-line perl script blows up into something
>>>>>>> that requires significantly more development effort.
>>>>>>>
>>>>>>> The RDF model is so simple.  It would be nice if it could be
>>>>>>> processed very simply whenever possible.  "Make the simple cases
>>>>>>> simple", etc.
>>>>>> The simplicity of the RDF model is, in my mind, tied up with its
>>>>>> uniformity. Your proposal severely breaks that uniformity, which is a
>>>>>> major lossage.
>>>>>>>> peter
>>>>>>>>
>>>>>>>> PS:  Yes, I do use text processors to handle RDF, and quite often,
>>>>>>>> even analysing the 2011 Billion Triple Challenge triples using sed
>>>>>>>> and grep.  However, I check to ensure that the right thing happens.
>>>>>>> Right, that's exactly the kind of simplified processing that I think
>>>>>>> we should facilitate as often as possible.
>>>>>>>
>>>>>>>
>>>>>> Sure, as long as it is only in one-off hacks, controlled by experts,
>>>>>> who can adjust the processing according to the peculiarities of the
>>>>>> input.  As soon as direct expert control goes away, then the app
>>>>>> needs to be able to consume all RDF, which I see as counter to your
>>>>>> proposal.
>>>>>>
>>>>>> peter
Received on Wednesday, 1 August 2012 14:38:24 UTC