Re: Encouraging canonical serializations of datatypes in RDF

Hi Peter,

On Tue, 2012-07-31 at 15:36 -0400, Peter F. Patel-Schneider wrote:
> Hmm.
> 
> Your two examples have different canonical forms in XML.   I do not believe 
> that going beyond XML canonicalization is a good idea.

What downside do you see?

> 
> I find your request for a second canonical form for XML datetime to be rather 
> strange.  Isn't it a major point of a canonical form that there is only one?

Yes, I realize that it sounds a little strange.  But the fact is that
there are distinctly different use cases that require different
canonical forms.  In one set of use cases, timezone provenance is
important and wanted, and in another set of use cases it is not
important and not wanted.

> 
> In any case, I don't see the point here.  If equality-unique canonical forms 
> are only encouraged, then applications will still have to do datatype-aware 
> comparisons.

Only if they need to handle all possible data serializations.  If, say,
90% of the available datasets use the canonical forms, then many apps
will not need to do datatype-aware comparisons.  Only the apps that need
to cover 100% of the data will.

I think it is important to keep the RDF entry barrier as low as
possible, in order to support scruffy apps that are good enough for many
purposes, even if they don't handle every case.
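
To illustrate with the two literals from my original message, here is a
rough Python sketch (not a proposed algorithm) of what a generator could
do before emitting an xsd:dateTime literal: normalize the lexical form
to UTC with a "Z" suffix, so that consumers can fall back on plain
string equality.  The helper name canonical_datetime is hypothetical,
and the sketch assumes the input carries a timezone and stays within
what Python's datetime module can parse (no negative years, no 24:00:00,
at most microsecond precision).

```python
from datetime import datetime, timezone


def canonical_datetime(lexical: str) -> str:
    """Return a UTC-normalized lexical form ending in 'Z'.

    Hypothetical helper for illustration; not part of any RDF library.
    """
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11 on,
    # so rewrite it as an explicit offset for older versions.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    dt = datetime.fromisoformat(lexical)
    if dt.tzinfo is None:
        # Without a timezone there is no single point in time to normalize.
        raise ValueError("no timezone; cannot normalize to UTC")
    utc = dt.astimezone(timezone.utc)
    text = utc.strftime("%Y-%m-%dT%H:%M:%S")
    if utc.microsecond:
        # Keep fractional seconds, without trailing zeros.
        text += "." + f"{utc.microsecond:06d}".rstrip("0")
    return text + "Z"


a = canonical_datetime("2012-07-31T17:16:00+01:00")
b = canonical_datetime("2012-07-31T16:16:00Z")
print(a == b)  # plain string comparison on the normalized forms
```

Note that this normalization deliberately discards the original
timezone offset, which is exactly the tradeoff between the two sets of
use cases mentioned above.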

David

> 
> peter
> 
> 
> On 07/31/2012 02:31 PM, David Booth wrote:
> > To enable RDF from one system to be more easily compared with RDF from
> > another system, it would be helpful if the serialization of datatyped
> > literals were encouraged to be in a canonical form that would enable
> > simple string comparison to be used instead of requiring a comparator
> > that understands the semantics of each datatype.
> >
> > A particular case in point: xsd:dateTime.
> >
> >    "2012-07-31T17:16:00+01:00"^^xsd:dateTime
> >
> > represents the same point in time as
> >
> >    "2012-07-31T16:16:00Z"^^xsd:dateTime
> >
> > but the strings are not the same.  This could be avoided by encouraging
> > a canonical serialization such as dateTimeStamp
> > http://www.w3.org/TR/xmlschema11-2/#dateTimeStamp
> > in which the timezoneFrag is required to be "Z".  (I've just filed a
> > bugzilla report on XML Datatypes to ask for such a canonicalization
> > https://www.w3.org/Bugs/Public/show_bug.cgi?id=18452
> > because there doesn't seem to be one defined currently.)
> >
> > How forcefully such canonicalization should be encouraged is a matter
> > for debate.  I do not think it should be a "MUST".  "SHOULD" would be
> > fine, as there are good reasons why someone may want to generate
> > non-canonical literals.  But it may also be good enough to just put an
> > editorial note in the spec saying that "RDF generators are encouraged to
> > generate literals in a standard, canonical form that allows simple
> > string comparison to test for equality and greater-than/less-than when
> > possible".
> >
> >
> 

-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.

Received on Tuesday, 31 July 2012 20:00:13 UTC