Re: Encouraging canonical serializations of datatypes in RDF from Peter F. Patel-Schneider on 2012-08-01 (public-rdf-comments@w3.org from August 2012)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Tue, 31 Jul 2012 22:57:38 -0400
To: David Booth <david@dbooth.org>
CC: public-rdf-comments <public-rdf-comments@w3.org>
Message-ID: <50189B22.5090805@gmail.com>

On 07/31/2012 10:48 PM, David Booth wrote:
> On Tue, 2012-07-31 at 16:24 -0400, Peter F. Patel-Schneider wrote:
>> On 07/31/2012 03:59 PM, David Booth wrote:
>>> Hi Peter,
>>>
>>> On Tue, 2012-07-31 at 15:36 -0400, Peter F. Patel-Schneider wrote:
>>>> Hmm.
>>>>
>>>> Your two examples have different canonical forms in XML.   I do not believe
>>>> that going beyond XML canonicalization is a good idea.
>>> What downside do you see?
>> If RDF goes beyond XML canonicalization, it is doing something to XML datatypes
>> that is not part of the XML specification.   This appears to be driving a
>> further wedge between RDF and XML data.
> I guess I'm not following what you mean.  For example, the
> xsd:dateTimeStamp datatype already requires a timezoneFrag to be
> specified, and one permissible timezoneFrag is "Z" (meaning UTC).  If
> RDF canonicalization suggested that the timezoneFrag always be "Z", what
> wedge would that drive between RDF and XML data?

It would say that as far as RDF is concerned, XML data that doesn't use Z is 
somehow second class.
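
For concreteness, the "always Z" suggestion under discussion can be sketched as follows. This is only an illustration, not anything proposed in the thread: the function name to_utc_lexical is hypothetical, and the sketch assumes the lexical form is one that Python's datetime.fromisoformat() accepts, so negative years, years beyond four digits, 24:00:00, and the trimming of trailing zeros in fractional seconds are not handled.

```python
from datetime import datetime, timezone


def to_utc_lexical(lexical: str) -> str:
    """Rewrite an xsd:dateTimeStamp lexical form so that its timezone is "Z".

    Hypothetical helper for illustration only.
    """
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so map it to the equivalent explicit offset first.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    value = datetime.fromisoformat(lexical)
    # xsd:dateTimeStamp requires a timezone, so a missing one is an error.
    if value.tzinfo is None:
        raise ValueError("xsd:dateTimeStamp requires a timezone")
    # Shift to UTC and spell the zero offset as "Z".
    utc = value.astimezone(timezone.utc)
    return utc.isoformat().replace("+00:00", "Z")


# The Date header of this message, written as a dateTimeStamp.
canonical = to_utc_lexical("2012-07-31T22:57:38-04:00")
already_utc = to_utc_lexical("2012-08-01T02:57:38Z")
```

Both calls yield the same string, which is what would let a plain string comparison stand in for a value comparison on data that has been normalized this way.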
>
>> [...]
>>
>>>> In any case, I don't see the point here.  If equality-unique canonical forms
>>>> are only encouraged, then applications will still have to do datatype-aware
>>>> comparisons.
>>> Only if they need to handle all possible data serializations.   If 90%
>>> of the available datasets use the canonical forms then many apps will
>>> not need to do datatype-aware comparisons, though the ones that need to
>>> cover 100% will.
>> If even 99.99% of available datasets use the canonical forms then all apps
>> should still be prepared for non-canonical forms.  To do otherwise is to be
>> wrong.
> It would be wrong for *some* apps, but by no means all.  You can't paint
> all apps with the same brush.  For example, if there are 100 datasets
> available, and 100 apps, and 90 of the datasets use the canonical forms,
> and 40 of the apps only need the datasets that use the canonical forms,
> then that substantially lowers the implementation barrier for those 40
> apps.
As long as these apps only use the 90, and stay away from the 10. This appears 
to break one of the prime motivations of RDF, that all data can be used by anyone.
>
>> That is not to say that being wrong is not useful on occasion, but I
>> don't see that there is any good to be had here in the WG suggesting canonical
>> forms be used exclusively.
> I just described some substantial good.  I'm not suggesting that
> canonicalization be used *exclusively*, but merely that it be
> *encouraged*, because it does significantly simplify processing when it
> can be used.

I don't see the "significantly" here at all.
>
>>> I think it is important to keep the RDF entry barrier as low as possible
>>> whenever possible, in order to support scruffy apps that are good enough
>>> for many purposes, even if they don't handle every case.
>>>
>>> David
>>>
>> It is important that apps should do the right thing.  For example, should apps
>> ignore character encoding?  How hard is doing datatype-aware processing of
>> literals, compared with all the rest of the stuff that is required to handle RDF?
> It depends entirely on the application.  In the case of xsd:dateTime,
> for example, it means the literal must be completely parsed into its
> year, month, day, hour, minute, seconds and timezone offset, and then
> datetime arithmetic -- which is *not* simple -- must be used to properly
> add the timezone offset in order to compare two values.  All this
> instead of a simple string comparison!  But the worst part is that the
> application has to *understand* the different datatypes, and this means
> that the code either has to special case every datatype, or it has to
> implement some kind of general datatype-handling framework.  Suddenly,
> an app that could have been a one-off, three-line Perl script blows up
> into something that requires significantly more development effort.
>
> The RDF model is so simple.  It would be nice if it could be processed
> very simply whenever possible.  "Make the simple cases simple", etc.
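
The gap between string comparison and datatype-aware comparison described above can be illustrated with a minimal sketch. It assumes both literals carry a timezone and fit what Python's datetime.fromisoformat() accepts; the helper name parse_xsd_datetime is hypothetical, and timezone-less xsd:dateTime values, whose ordering XML Schema treats as partly indeterminate, are deliberately left out.

```python
from datetime import datetime


def parse_xsd_datetime(lexical: str) -> datetime:
    """Parse a timezoned xsd:dateTime lexical form (hypothetical helper)."""
    # Older Python versions reject a trailing "Z" in fromisoformat().
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)


# Two lexical forms that denote the same instant.
a = "2012-07-31T22:57:38-04:00"
b = "2012-08-01T02:57:38Z"

# A plain string comparison treats them as different literals.
same_string = a == b

# Timezone-aware datetimes compare by instant, so the values are equal.
same_value = parse_xsd_datetime(a) == parse_xsd_datetime(b)
```

Here same_string is False while same_value is True. The standard library hides the timezone arithmetic in this case, but the application still has to know that the literal is an xsd:dateTime in order to choose this code path.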

The simplicity of the RDF model is, in my mind, tied up with its uniformity.  
Your proposal severely breaks that uniformity, which is a major lossage.
>
>> peter
>>
>> PS:  Yes, I do use text processors to handle RDF, and quite often, even
>> analysing the 2011 Billion Triple Challenge triples using sed and grep.
>> However, I check to ensure that the right thing happens.
> Right, that's exactly the kind of simplified processing that I think we
> should facilitate as often as possible.
>
>
Sure, as long as it is only in one-off hacks, controlled by experts, who can 
adjust the processing according to the peculiarities of the input.  As soon as 
direct expert control goes away, then the app needs to be able to consume all 
RDF, which I see as counter to your proposal.

peter
Received on Wednesday, 1 August 2012 02:58:06 UTC