Re: Encouraging canonical serializations of datatypes in RDF

On Tue, Jul 31, 2012 at 11:31 AM, David Booth <david@dbooth.org> wrote:
> To enable RDF from one system to be more easily compared with RDF from
> another system, it would be helpful if the serialization of datatyped
> literals were encouraged to be in a canonical form that would enable
> simple string comparison to be used instead of requiring a comparator
> that understands the semantics of each datatype.

Lexical-to-value (L2V) mapping is not an optional part of RDF. The
specifics of D-entailment may be optionally applied, but the ability to
do lexical-to-value mapping is core to RDF.
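
To make the lexical space / value space split concrete, here is a
minimal sketch in Python. The Literal class and the L2V table are
hypothetical names made up for illustration, not part of any RDF
library, and Python's int() and Decimal() are more permissive than the
real xsd:integer and xsd:decimal lexical spaces.

```python
from dataclasses import dataclass
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical, deliberately tiny lexical-to-value table. A real
# implementation would validate the lexical form against the datatype.
L2V = {
    XSD + "integer": int,
    XSD + "decimal": Decimal,
}


@dataclass(frozen=True)
class Literal:
    lexical: str   # original lexical form, kept exactly as written
    datatype: str  # datatype IRI

    def value(self):
        # Map the lexical form into the datatype's value space.
        return L2V[self.datatype](self.lexical)


one = Literal("1", XSD + "integer")
padded = Literal("01", XSD + "integer")

same_term = one == padded                    # compares lexical form + datatype
same_value = one.value() == padded.value()   # compares mapped values
```

Here same_term is False and same_value is True: the two literals are
different terms that map to the same value, and neither lexical form
has been thrown away.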

>
> A particular case in point: xsd:datetime.
>
>   "2012-07-31T17:16:00+01:00"^^xsd:dateTime
>
> represents the same point in time as
>
>   "2012-07-31T16:16:00Z"^^xsd:dateTime

No, it doesn't. This is a common misunderstanding regarding date
times. The time zone is NOT a meaningless value. xsd:dateTime happily
gets this right in the timezoneCanonicalFragmentMap
http://www.w3.org/TR/xmlschema11-2/#f-tzCanFragMap
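
A rough sketch of the distinction, using Python's datetime as a
stand-in for the XSD value space. parse_xsd_datetime is a hypothetical
helper that only handles the simple forms quoted in this thread; it is
not a conforming xsd:dateTime parser.

```python
from datetime import datetime, timezone


def parse_xsd_datetime(lexical: str) -> datetime:
    # Hypothetical helper: rewrite a trailing "Z" as "+00:00" so that
    # datetime.fromisoformat() accepts it on older Python versions too.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)


a = parse_xsd_datetime("2012-07-31T17:16:00+01:00")
b = parse_xsd_datetime("2012-07-31T16:16:00Z")

# Timezone-aware datetimes compare by their position on the UTC timeline.
same_instant = a == b

# The offset is still carried by each value and differs between the two.
same_offset = a.utcoffset() == b.utcoffset()

# Normalizing to "Z" keeps the instant but discards the original offset.
normalized_a = a.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

In this sketch same_instant is True while same_offset is False, and
normalized_a comes out as "2012-07-31T16:16:00Z". The +01:00 that the
first literal carried is no longer recoverable after normalization.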

>
> but the strings are not the same.  This could be avoided by encouraging
> a canonical serialization such as dateTimeStamp
> http://www.w3.org/TR/xmlschema11-2/#dateTimeStamp
> in which the timezoneFrag is required to be "Z".  (I've just filed a
> bugzilla report on XML Datatypes to ask for such a canonicalization
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=18452
> because there doesn't seem to be one defined currently.)
>
> How forcefully such canonicalization should be encouraged is a matter
> for debate.  I do not think it should be a "MUST".  "SHOULD" would be
> fine, as there are good reasons why someone may want to generate
> non-canonical literals.  But it may also be good enough to just put an
> editorial note in the spec saying that "RDF generators are encouraged to
> generate literals in a standard, canonical form that allows simple
> string comparison to test for equality and greater-than/less-than when
> possible".

I would object to either MUST or SHOULD. In many systems, preserving
the original lexical form is an important feature. RDF does this well
today and clearly defines the lexical space as separate from the value
space.

The current working group direction is to try to specify a canonical
serialization of both a single triple and possibly of a graph as a
specific form of N-Triples. Canonicalization doesn't stop with just
datatypes. This should serve the use cases that require
canonicalization well. If there is a specific use case the current WG
direction won't serve, please send it along.

--Gavin

>
>
> --
> David Booth, Ph.D.
> http://dbooth.org/
>
> Opinions expressed herein are those of the author and do not necessarily
> reflect those of his employer.
>
>

Received on Tuesday, 31 July 2012 19:57:32 UTC