Re: DM and R2RML should use same datatype mapping

Eric,

On 3 Nov 2011, at 13:12, Eric Prud'hommeaux wrote:
>> I have not checked whether asking for XSD canonical form would lead to different results from those specced in R2RML, but if it does, I would again say that requiring the version that can be produced directly from the DB is better than requiring another transformation that doesn't really have any benefit.
> 
> Aha, this is perhaps the crux of our different strategies. I believe that the canonical forms are what *everyone* should produce. By specifying the exact lexical form of the terms (as both of us are attempting), we avoid a proliferation of homologous terms in the RDF graphs which come from RDBs. For example, in the RDB-derived graphs, Bob's birth datetime is the same term as the datetime at which his mother gave birth, not just computationally equivalent (1995-03-18T01:23:45Z vs. 1995-03-18T03:23:45+02:00). I think we'd be good RDF datizens if we used the same form that we expect other producers of RDF data to use.

In general I agree that this is desirable, but there is a trade-off between getting the same lexical forms everywhere on the one hand, and the complexity of the mapping on the other hand.
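
To make the extra transformation concrete, here is a minimal Python sketch of what normalizing a timezoned xsd:dateTime to its canonical UTC form might look like. The function name canonical_xsd_datetime is hypothetical and is not taken from R2RML or the Direct Mapping; the sketch assumes the input already carries a timezone and uses at most microsecond precision.

```python
from datetime import datetime, timezone


def canonical_xsd_datetime(lexical: str) -> str:
    """Normalize a timezoned xsd:dateTime lexical form to UTC with a 'Z' suffix.

    Hypothetical helper for illustration only; not part of any spec.
    """
    # Older Python versions do not parse a trailing 'Z', so spell it as an offset.
    parsed = datetime.fromisoformat(lexical.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError("expected a timezoned dateTime")

    # Shift the value to UTC; the instant in time stays the same.
    utc = parsed.astimezone(timezone.utc)
    text = utc.strftime("%Y-%m-%dT%H:%M:%S")

    # Canonical form drops trailing zeros from fractional seconds.
    if utc.microsecond:
        text += "." + f"{utc.microsecond:06d}".rstrip("0")
    return text + "Z"


a = canonical_xsd_datetime("1995-03-18T01:23:45Z")
b = canonical_xsd_datetime("1995-03-18T03:23:45+02:00")
print(a, b, a == b)
```

Both inputs come out as 1995-03-18T01:23:45Z, so the two literals become the same RDF term. The cost is that every value has to pass through a step like this instead of being emitted as the database renders it.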

Do you know all the canonical forms of XSD? Is there a handy table with examples somewhere? Alternatively, could you have a look at the example tables in these sections:
http://www.w3.org/2001/sw/rdb2rdf/r2rml/#datatype-table
http://www.w3.org/2001/sw/rdb2rdf/r2rml/#to-string

and identify anything where the output XSD form or pattern isn't canonical XSD? (Or any interesting cases that are missing from those tables?)

> Are we confident that we can provide recipes for these notations that produce valid (by XSD's metric) forms?

Yes, I'm confident of that. The mapping is based on ISO/IEC 9075-14:2008, which specifies a mapping from SQL values to XSD literals.

Best,
Richard

Received on Thursday, 3 November 2011 13:50:27 UTC