Re: Encouraging canonical serializations of datatypes in RDF

From: David Booth <david@dbooth.org>
Date: Wed, 01 Aug 2012 15:12:30 -0400
To: Gavin Carothers <gavin@carothers.name>
Cc: public-rdf-comments <public-rdf-comments@w3.org>
Message-ID: <1343848350.2725.87275.camel@dbooth-laptop>

Hi Gavin,

On Tue, 2012-07-31 at 12:57 -0700, Gavin Carothers wrote:
> On Tue, Jul 31, 2012 at 11:31 AM, David Booth <david@dbooth.org> wrote:
[ . . . ]
> > A particular case in point: xsd:dateTime.
> >
> >   "2012-07-31T17:16:00+01:00"^^xsd:dateTime
> >
> > represents the same point in time as
> >
> >   "2012-07-31T16:16:00Z"^^xsd:dateTime
> No, it doesn't. This is a common misunderstanding regarding date
> times. The time zone is NOT a meaningless value. xsd:dateTime happily
> gets this right in the timezoneCanonicalFragmentMap
> http://www.w3.org/TR/xmlschema11-2/#f-tzCanFragMap

Can you explain?  I just tested the above example using the Perl
DateTime::Format::XSD library (to be sure I hadn't made a silly typo),
and it says that they represent the exact same point in time.  If you
think that library is wrong, I'd like to know why.
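
For anyone who wants to repeat the check without Perl, here is a minimal
Python sketch of the same comparison. It is not the DateTime::Format::XSD
code I ran; the helper names parse_xsd_datetime and to_utc_lexical are
made up for illustration, and it assumes the common subset of xsd:dateTime
that Python's datetime can parse (no negative years, no 24:00:00). The
third timestamp, 16:30Z, is an extra value added only to show ordering.

```python
from datetime import datetime, timezone


def parse_xsd_datetime(lexical: str) -> datetime:
    """Parse a timezoned xsd:dateTime lexical form (common subset only)."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit zero offset first.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    dt = datetime.fromisoformat(lexical)
    if dt.tzinfo is None:
        raise ValueError("no timezone: cannot be normalized to 'Z'")
    return dt


def to_utc_lexical(lexical: str) -> str:
    """Rewrite a timezoned lexical form so that its timezone is 'Z'."""
    utc = parse_xsd_datetime(lexical).astimezone(timezone.utc)
    # isoformat() ends in "+00:00" for UTC; swap that suffix for "Z".
    return utc.isoformat()[:-6] + "Z"


a = "2012-07-31T17:16:00+01:00"
b = "2012-07-31T16:16:00Z"
c = "2012-07-31T16:30:00Z"  # illustrative extra value, 14 minutes later

# Same point on the timeline, different strings.
print(parse_xsd_datetime(a) == parse_xsd_datetime(b))  # True
print(a == b)                                          # False

# Raw string comparison orders a after c, which is wrong on the timeline.
print(a > c)                                           # True

# After normalizing to "Z", plain string comparison agrees with the timeline.
print(to_utc_lexical(a) == b)                          # True
print(to_utc_lexical(a) < c)                           # True
```

Note that the normalization throws away the original offset, which is
exactly the information a lexical-form-preserving application would want
to keep.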

> >
> > but the strings are not the same.  This could be avoided by encouraging
> > a canonical serialization such as dateTimeStamp
> > http://www.w3.org/TR/xmlschema11-2/#dateTimeStamp
> > in which the timezoneFrag is required to be "Z".  (I've just filed a
> > bugzilla report on XML Datatypes to ask for such a canonicalization
> > https://www.w3.org/Bugs/Public/show_bug.cgi?id=18452
> > because there doesn't seem to be one defined currently.)
> >
> > How forcefully such canonicalization should be encouraged is a matter
> > for debate.  I do not think it should be a "MUST".  "SHOULD" would be
> > fine, as there are good reasons why someone may want to generate
> > non-canonical literals.  But it may also be good enough to just put an
> > editorial note in the spec saying that "RDF generators are encouraged to
> > generate literals in a standard, canonical form that allows simple
> > string comparison to test for equality and greater-than/less-than when
> > possible".
> I would object to either MUST or SHOULD. In many systems preserving the
> original lexical form is an important feature.

I agree that preserving the lexical form is important for many
applications, and those should not perform canonicalization.  The
RFC2119 definition of "SHOULD" specifically allows deviation for good
reasons:

3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.

Given this definition, why do you think "SHOULD" would be too strong?

> RDF does this well
> today and clearly defines lexical space as separate from value space.
> The current working group direction is to try and specify a canonical
> serialization of both a single triple and possibly of a graph as a
> specific form of N-Triples.

Excellent!  I was not aware of this, but I strongly support the idea.

> Canonicalization doesn't stop with just
> datatypes.

Agreed.  Datatypes just seemed like the most obvious place to start.

> This should serve the use cases that require
> canonicalization well. If there is a specific use case the current WG
> direction won't serve please send it along.


David Booth, Ph.D.

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Wednesday, 1 August 2012 19:13:04 UTC