- From: Gavin Carothers <gavin@carothers.name>
- Date: Tue, 31 Jul 2012 12:57:05 -0700
- To: David Booth <david@dbooth.org>
- Cc: public-rdf-comments <public-rdf-comments@w3.org>
On Tue, Jul 31, 2012 at 11:31 AM, David Booth <david@dbooth.org> wrote:
> To enable RDF from one system to be more easily compared with RDF from
> another system, it would be helpful if the serialization of datatyped
> literals were encouraged to be in a canonical form that would enable
> simple string comparison to be used instead of requiring a comparator
> that understands the semantics of each datatype.

L2V mapping is not an optional part of RDF. The specifics of
D-entailment may be optionally applied, but the ability to do Lexical
to Value mapping is core to RDF.

>
> A particular case in point: xsd:datetime.
>
> "2012-07-31T17:16:00+01:00"^^xsd:dateTime
>
> represents the same point in time as
>
> "2012-07-31T16:16:00Z"^^xsd:dateTime

No, it doesn't. This is a common misunderstanding regarding date times.
The time zone is NOT a meaningless value. xsd:dateTime happily gets this
right in the timezoneCanonicalFragmentMap
http://www.w3.org/TR/xmlschema11-2/#f-tzCanFragMap

>
> but the strings are not the same. This could be avoided by encouraging
> a canonical serialization such as dateTimeStamp
> http://www.w3.org/TR/xmlschema11-2/#dateTimeStamp
> in which the timezoneFrag is required to be "Z". (I've just filed a
> bugzilla report on XML Datatypes to ask for such a canonicalization
> https://www.w3.org/Bugs/Public/show_bug.cgi?id=18452
> because there doesn't seem to be one defined currently.)
>
> How forcefully such canonicalization should be encouraged is a matter
> for debate. I do not think it should be a "MUST". "SHOULD" would be
> fine, as there are good reasons why someone may want to generate
> non-canonical literals. But it may also be good enough to just put an
> editorial note in the spec saying that "RDF generators are encouraged to
> generate literals in a standard, canonical form that allows simple
> string comparison to test for equality and greater-than/less-than when
> possible".

I would object to either MUST or SHOULD. In many systems, preserving the
original lexical form is an important feature. RDF does this well today
and clearly defines lexical space as separate from value space.

The current working group direction is to try to specify a canonical
serialization of both a single triple and possibly of a graph as a
specific form of N-Triples. Canonicalization doesn't stop with just
datatypes. This should serve the use cases that require canonicalization
well. If there is a specific use case the current WG direction won't
serve, please send it along.

--Gavin

>
>
> --
> David Booth, Ph.D.
> http://dbooth.org/
>
> Opinions expressed herein are those of the author and do not necessarily
> reflect those of his employer.
>
>
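As a rough illustration of the lexical-space versus value-space distinction discussed in this thread, the sketch below compares the two literals from the quoted example in three ways: as strings, as instants, and by time zone offset. It is only a sketch under stated assumptions: `parse_xsd_datetime` is a hypothetical helper, not part of any RDF or XSD library, and it handles only the small subset of xsd:dateTime syntax used here. It uses Python's `datetime` as a stand-in for a datatype-aware comparator and does not implement the XSD value-space rules themselves.

```python
from datetime import datetime, timedelta


def parse_xsd_datetime(lexical: str) -> datetime:
    """Hypothetical helper: parse a small subset of xsd:dateTime lexical forms.

    Assumes a full date, a full time, and an explicit time zone
    (either "Z" or "+hh:mm" / "-hh:mm").
    """
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit zero offset first.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)


# The two lexical forms from the quoted example.
a_lex = "2012-07-31T17:16:00+01:00"
b_lex = "2012-07-31T16:16:00Z"

a = parse_xsd_datetime(a_lex)
b = parse_xsd_datetime(b_lex)

# 1. Plain string comparison sees two different lexical forms.
same_lexical = a_lex == b_lex

# 2. Timezone-aware datetimes compare by their UTC instant.
same_instant = a == b

# 3. The time zone offset is still carried by each parsed value.
same_offset = a.utcoffset() == b.utcoffset()

print(same_lexical, same_instant, same_offset)
```

The three comparisons give different answers for the same pair of literals, which is why rewriting a literal to a canonical "Z" form discards information that a system preserving the original lexical form would keep.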
Received on Tuesday, 31 July 2012 19:57:32 UTC