Subject: Re: Encouraging canonical serializations of datatypes in RDF

From: David Booth <david@dbooth.org>
Date: Wed, 01 Aug 2012 10:00:35 -0400
To: Steve Harris <steve.harris@garlik.com>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-comments@w3.org
Message-ID: <1343829635.2725.82151.camel@dbooth-laptop>

Yes, preserving the timezone is important in use cases like that, in
which the xsd:dateTime not only represents a point in time, but also
encodes timezone provenance in the literal form, i.e., it encodes both
when and *where* (in what timezone) the event occurred.  

But in other use cases (such as looking at the sequence of events in a
medical history) the xsd:dateTime is used purely as a point in time.
Comparisons are vital, and timezone provenance is unimportant (and gets
in the way).

So yes, there are clearly two different classes of use cases for
xsd:dateTime.  But this does not mean that we should throw out the baby
with the bath.  Both use cases can be acknowledged.
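
To make the two use cases concrete, here is a minimal Python sketch (an
illustration only, not taken from any RDF toolkit) using the two literals
quoted further down in this thread.  The helper names parse_xsd_datetime
and to_utc_form are hypothetical, and the sketch assumes every literal
carries an explicit timezone and uses a four-digit year.

```python
from datetime import datetime, timezone

def parse_xsd_datetime(lexical: str) -> datetime:
    """Parse a timezoned xsd:dateTime lexical form (hypothetical helper)."""
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so map it to the equivalent "+00:00" offset first.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)

def to_utc_form(lexical: str) -> str:
    """Rewrite a lexical form so its timezone is always "Z" (hypothetical helper)."""
    utc = parse_xsd_datetime(lexical).astimezone(timezone.utc)
    return utc.isoformat().replace("+00:00", "Z")

local = "2012-12-31T22:00:00-05:00"
zulu = "2013-01-01T03:00:00Z"

# Point-in-time use case: the two literals denote the same instant.
same_instant = parse_xsd_datetime(local) == parse_xsd_datetime(zulu)

# Provenance use case: the lexical forms differ, and rewriting to "Z"
# discards the original -05:00 offset (and changes the displayed year).
same_string = local == zulu
normalized = to_utc_form(local)
```

After normalization the first literal becomes identical to the second,
which helps comparison but loses the record of where the event occurred.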

David

On Wed, 2012-08-01 at 10:37 +0100, Steve Harris wrote:
> +1
> 
> We have apps that operate in different timezones, so preserving the
> timezone in results is important.
> 
> - Steve
> 
> On 2012-08-01, at 10:07, Andy Seaborne wrote:
> 
> > The majority of use for RDF for apps I'm involved in at the moment
> > are all in the same time place.
> > 
> > Changing the data as it goes through the system, and hence breaking
> > the display aspect of the data, is a complete non-starter.  I want to
> > know what timezone the dateTime started as.
> > 
> > Display is more important than comparison.
> > 
> > And, from time spent doing support, the first user expectation is
> > that stuff that comes out looks like what went in.  Not changing the
> > date part sometimes.
> > 
> > "2012-12-31T22:00:00-05:00"^^xsd:dateTime
> > 
> > is the same time point as
> > 
> > "2013-01-01T03:00:00Z"^^xsd:dateTime
> > 
> > It's a different year.
> > 
> >  Andy
> > 
> > On 01/08/12 03:57, Peter F. Patel-Schneider wrote:
> >> 
> >> On 07/31/2012 10:48 PM, David Booth wrote:
> >>> On Tue, 2012-07-31 at 16:24 -0400, Peter F. Patel-Schneider wrote:
> >>>> On 07/31/2012 03:59 PM, David Booth wrote:
> >>>>> Hi Peter,
> >>>>> 
> >>>>> On Tue, 2012-07-31 at 15:36 -0400, Peter F. Patel-Schneider wrote:
> >>>>>> Hmm.
> >>>>>> 
> >>>>>> Your two examples have different canonical forms in XML.  I do not
> >>>>>> believe that going beyond XML canonicalization is a good idea.
> >>>>> What downside do you see?
> >>>> If RDF goes beyond XML canonicalization, it is doing something to XML
> >>>> datatypes that is not part of the XML specification.  This appears to
> >>>> be driving a further wedge between RDF and XML data.
> >>> I guess I'm not following what you mean.  For example, the
> >>> xsd:dateTimeStamp datatype already requires a timezoneFrag to be
> >>> specified, and one permissible timezoneFrag is "Z" (meaning UTC).  If
> >>> RDF canonicalization suggested that the timezoneFrag always be "Z", what
> >>> wedge would that drive between RDF and XML data?
> >> 
> >> It would say that as far as RDF is concerned, XML data that doesn't use
> >> Z is somehow second class.
> >>> 
> >>>> [...]
> >>>> 
> >>>>>> In any case, I don't see the point here.  If equality-unique
> >>>>>> canonical forms are only encouraged, then applications will still
> >>>>>> have to do datatype-aware comparisons.
> >>>>> Only if they need to handle all possible data serializations.   If 90%
> >>>>> of the available datasets use the canonical forms then many apps will
> >>>>> not need to do datatype-aware comparisons, though the ones that need to
> >>>>> cover 100% will.
> >>>> If even 99.99% of available datasets use the canonical forms then all
> >>>> apps should still be prepared for non-canonical forms.  To do otherwise
> >>>> is to be wrong.
> >>> It would be wrong for *some* apps, but by no means all.  You can't paint
> >>> all apps with the same brush.  For example, if there are 100 datasets
> >>> available, and 100 apps, and 90 of the datasets use the canonical forms,
> >>> and 40 of the apps only need the datasets that use the canonical forms,
> >>> then that substantially lowers the implementation barrier for those 40
> >>> apps.
> >> As long as these apps only use the 90, and stay away from the 10. This
> >> appears to break one of the prime motivations of RDF, that all data can
> >> be used by anyone.
> >>> 
> >>>> That is not to say that being wrong is not useful on occasion, but I
> >>>> don't see that there is any good to be had here in the WG suggesting
> >>>> canonical forms be used exclusively.
> >>> I just described some substantial good.  I'm not suggesting that
> >>> canonicalization be used *exclusively*, but merely that it be
> >>> *encouraged*, because it does significantly simplify processing when it
> >>> can be used.
> >> 
> >> I don't see the "significantly" here at all.
> >>> 
> >>>>> I think it is important to keep the RDF entry barrier as low as
> >>>>> possible whenever possible, in order to support scruffy apps that are
> >>>>> good enough for many purposes, even if they don't handle every case.
> >>>>> 
> >>>>> David
> >>>>> 
> >>>> It is important that apps should do the right thing.  For example,
> >>>> should apps ignore character encoding?  How hard is doing
> >>>> datatype-aware processing of literals, compared with all the rest of
> >>>> the stuff that is required to handle RDF?
> >>> It depends entirely on the application.  In the case of xsd:dateTime,
> >>> for example, it means the literal must be completely parsed into its
> >>> year, month, day, hour, minute, seconds and timezone offset, and then
> >>> datetime arithmetic -- which is *not* simple -- must be used to properly
> >>> add the timezone offset in order to compare two values.  All this
> >>> instead of a simple string comparison!  But the worst part is that the
> >>> application has to *understand* the different datatypes, and this means
> >>> that the code either has to special case every datatype, or it has to
> >>> implement some kind of general datatype-handling framework.  Suddenly,
> >>> an app that could have been a one-off, three-line Perl script blows up
> >>> into something that requires significantly more development effort.
> >>> 
> >>> The RDF model is so simple.  It would be nice if it could be processed
> >>> very simply whenever possible.  "Make the simple cases simple", etc.
> >> 
> >> The simplicity of the RDF model is, in my mind, tied up with its
> >> uniformity. Your proposal severely breaks that uniformity, which is a
> >> major lossage.
> >>> 
> >>>> peter
> >>>> 
> >>>> PS:  Yes, I do use text processors to handle RDF, and quite often, even
> >>>> analysing the 2011 Billion Triple Challenge triples using sed and grep.
> >>>> However, I check to ensure that the right thing happens.
> >>> Right, that's exactly the kind of simplified processing that I think we
> >>> should facilitate as often as possible.
> >>> 
> >>> 
> >> Sure, as long as it is only in one-off hacks, controlled by experts, who
> >> can adjust the processing according to the peculiarities of the input.
> >> As soon as direct expert control goes away, then the app needs to be
> >> able to consume all RDF, which I see as counter to your proposal.
> >> 
> >> peter
> >> 
> >> 
> >> 
> >> 
> > 
> 
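
The quoted point about parsing versus "a simple string comparison" can be
sketched as follows.  This is a hedged Python illustration, not a
definitive implementation: the helper name instant and the sample values
are assumptions made up for the example, and it assumes timezoned
literals with four-digit years and no fractional seconds (fractional
seconds would break the plain string ordering even with "Z").

```python
from datetime import datetime

def instant(lexical: str) -> datetime:
    """Datatype-aware key: parse the literal into a timezone-aware datetime."""
    # Map a trailing "Z" to "+00:00" so older Python versions can parse it.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)

# Illustrative literals with mixed timezone offsets.
mixed = [
    "2012-12-31T22:00:00-05:00",  # same instant as 2013-01-01T03:00:00Z
    "2013-01-01T01:00:00Z",
    "2012-12-31T23:30:00+09:00",  # same instant as 2012-12-31T14:30:00Z
]

# The same three instants, all written with the "Z" timezone.
all_z = [
    "2013-01-01T03:00:00Z",
    "2013-01-01T01:00:00Z",
    "2012-12-31T14:30:00Z",
]

# With mixed offsets, plain string order and chronological order disagree.
string_order_mixed = sorted(mixed)
time_order_mixed = sorted(mixed, key=instant)

# With every literal in "Z" form, plain string order is chronological.
string_order_z = sorted(all_z)
time_order_z = sorted(all_z, key=instant)
```

Under those assumptions a plain sort is enough for data that is already
in "Z" form, while data with mixed offsets needs the datatype-aware key.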

-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Wednesday, 1 August 2012 14:01:14 UTC