Encouraging canonical serializations of datatypes in RDF

To enable RDF from one system to be more easily compared with RDF from
another system, it would be helpful if the serialization of datatyped
literals were encouraged to be in a canonical form that would enable
simple string comparison to be used instead of requiring a comparator
that understands the semantics of each datatype.

A particular case in point: xsd:dateTime.

  "2012-07-31T17:16:00+01:00"^^xsd:dateTime

represents the same point in time as

  "2012-07-31T16:16:00Z"^^xsd:dateTime

but the strings are not the same.  This could be avoided by encouraging
a canonical serialization, such as a form of dateTimeStamp
http://www.w3.org/TR/xmlschema11-2/#dateTimeStamp
in which the timezoneFrag is required to be "Z".  (I've just filed a
Bugzilla report on XML Schema Datatypes to ask for such a canonicalization
https://www.w3.org/Bugs/Public/show_bug.cgi?id=18452
because there doesn't seem to be one defined currently.)
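
As a rough illustration of what such a canonicalization could look like
on the generating side, here is a minimal Python sketch.  The function
name canonical_datetime is hypothetical and not taken from any spec or
library.  It assumes the input already carries a timezone, and it relies
on Python's datetime.fromisoformat, which accepts only a subset of the
xsd:dateTime lexical space (no years beyond four digits, for instance).

```python
from datetime import datetime, timezone


def canonical_datetime(lexical: str) -> str:
    """Return a UTC ("Z") form of a timezoned xsd:dateTime lexical value.

    Hypothetical helper for illustration only.
    """
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit zero offset first.
    if lexical.endswith("Z"):
        text = lexical[:-1] + "+00:00"
    else:
        text = lexical
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Without a timezone there is no single instant to normalize.
        raise ValueError("no timezone: cannot be mapped to UTC")
    utc = parsed.astimezone(timezone.utc)
    out = utc.strftime("%Y-%m-%dT%H:%M:%S")
    if utc.microsecond:
        # Keep fractional seconds, dropping trailing zeros.
        out += "." + f"{utc.microsecond:06d}".rstrip("0")
    return out + "Z"


a = canonical_datetime("2012-07-31T17:16:00+01:00")
b = canonical_datetime("2012-07-31T16:16:00Z")
print(a == b)
```

With both literals passed through the same function, plain string
equality is enough to tell that they denote the same point in time.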

How forcefully such canonicalization should be encouraged is a matter
for debate.  I do not think it should be a "MUST".  "SHOULD" would be
fine, as there are good reasons why someone may want to generate
non-canonical literals.  But it may also be good enough to just put an
editorial note in the spec saying that "RDF generators are encouraged to
generate literals in a standard, canonical form that allows simple
string comparison to test for equality and greater-than/less-than when
possible".  
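
To make the greater-than/less-than point concrete, here is a small
Python sketch comparing string order with chronological order.  The
parse helper is hypothetical and exists only to check the result.  The
sketch assumes whole seconds and four-digit years; with fractional
seconds even "Z" forms can sort wrongly as strings, since "." sorts
before "Z".

```python
from datetime import datetime


def parse(lexical: str) -> datetime:
    # Hypothetical helper: parse a timezoned xsd:dateTime for checking.
    # A trailing "Z" is rewritten because fromisoformat() accepts it
    # only from Python 3.11 on.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)


canonical = ["2012-07-31T16:16:00Z", "2012-07-31T16:30:00Z"]
mixed = ["2012-07-31T17:16:00+01:00", "2012-07-31T16:30:00Z"]

# Canonical forms: string order agrees with chronological order.
canonical_agrees = (canonical[0] < canonical[1]) == (
    parse(canonical[0]) < parse(canonical[1])
)

# Mixed offsets: 17:16+01:00 is 16:16Z, which is earlier than 16:30Z,
# yet its string sorts after "…T16:30:00Z".
mixed_agrees = (mixed[0] < mixed[1]) == (parse(mixed[0]) < parse(mixed[1]))

print(canonical_agrees, mixed_agrees)
```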


-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.

Received on Tuesday, 31 July 2012 18:32:14 UTC