Re: Request for comments: Breaking down the datatype mapping problem (ISSUE-69)

Hi David,

A few comments, and questions for clarification, inline.

On 21 Nov 2011, at 15:48, David McNeil wrote:
>> == Should data be C14N'd before used in IRI generation? ==
> 
> There must be a way to make IRIs canonical. Seems that this can be done either implicitly (the R2RML processor does it automatically) or explicitly (by providing some kind of canonicalize function for the user to invoke).

The proposed implicit canonicalization is defined as just invoking CAST(xxx AS VARCHAR). Mapping authors can invoke this SQL function themselves in an R2RML view, or use any other functions to construct the string they'd like to have, so the option of canonicalizing explicitly is available in R2RML.
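
As a rough illustration of the two options (a sketch only, not taken from the spec): the snippet below uses Python's sqlite3 module as a stand-in database, with a hypothetical `emp` table. The exact string that CAST produces varies between DBMSs, and `printf` is SQLite-specific, so treat the output as an example rather than as normative.

```python
import sqlite3

def canonical_strings():
    """Compare implicit and explicit string forms of one column value."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, used only for illustration.
    conn.execute("CREATE TABLE emp (id INTEGER)")
    conn.execute("INSERT INTO emp VALUES (7)")
    row = conn.execute(
        "SELECT CAST(id AS VARCHAR), "   # what the implicit rule would apply
        "       printf('%05d', id) "     # a form the mapping author picks explicitly
        "FROM emp"
    ).fetchone()
    conn.close()
    return row

print(canonical_strings())
```

In an R2RML view, the second expression would simply be part of the view's SQL query, and the template would reference the resulting column.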

Note, however, that this isn't the case in the DM. If a non-string column (let's say, a TIMESTAMP) is used in a PK, then its values will end up in the entity IRIs with no possibility of explicit user intervention.

Would you say that in this situation, the value MUST be canonicalized while building the IRI in the DM?
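
To make the situation concrete, here is a minimal sketch of how a TIMESTAMP key value could end up in a row IRI. The IRI shape (table name, then `column=value` pairs joined by `;`), the `row_iri` helper, and the example base IRI are assumptions made for illustration; the DM draft's exact rules may differ.

```python
from urllib.parse import quote

def row_iri(base, table, pk):
    """Build a row IRI from the string forms of the primary key values."""
    parts = [
        # Percent-encode everything outside the unreserved set.
        quote(col, safe="") + "=" + quote(val, safe="")
        for col, val in pk.items()
    ]
    return base + quote(table, safe="") + "/" + ";".join(parts)

# The value is whatever string form the processor derived from the TIMESTAMP.
print(row_iri("http://example.com/", "Event", {"ts": "2011-11-21 15:48:00"}))
```

Two processors that stringify the same TIMESTAMP differently (with or without fractional seconds, say) would mint different IRIs for the same row, which is why the string form matters here.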

>> == How should datatype overrides be handled? ==
>>   
>> The spec currently says that the canonical string value from SQL is directly used, without mapping to an XSD form.
> 
> Are there cases when the canonical SQL string value does not produce a valid XSD value for the target XSD type? If so, then it does not seem acceptable to just use the canonical SQL string value.

The canonical SQL string is never used directly, but first turned into a *natural RDF literal*, which involves a transformation in the case of booleans, datetimes and binary types. The resulting typed literals are always well-typed. If they were not, that would be a bug in the spec, regardless of this canonicalization discussion.

This is a non-issue when the canonical string is used to generate IRIs, as there is no expectation that they are formatted according to any particular XSD type.
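
The transformation described above can be sketched roughly as follows. This is not the spec's wording: the `natural_literal` helper is hypothetical, only three types are covered, and the choice of xsd:hexBinary for binary data is my assumption about the target type.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

def natural_literal(sql_type, value):
    """Return (lexical form, datatype IRI or None) for a single SQL value."""
    t = sql_type.upper()
    if t == "BOOLEAN":
        # SQL TRUE/FALSE become the lowercase XSD lexical forms.
        return ("true" if value else "false", XSD + "boolean")
    if t == "TIMESTAMP":
        # SQL separates date and time with a space; xsd:dateTime wants 'T'.
        return (value.replace(" ", "T", 1), XSD + "dateTime")
    if t == "BINARY":
        # Assumed target type: xsd:hexBinary, written in upper-case hex.
        return (value.hex().upper(), XSD + "hexBinary")
    # Everything else is passed through as a plain literal in this sketch.
    return (str(value), None)

print(natural_literal("TIMESTAMP", "2011-11-21 15:48:00"))
```

The point is that the SQL string '2011-11-21 15:48:00' is not itself a valid xsd:dateTime, but the literal produced from it is.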

>> == Should canonical forms be SQL canonical or XSD canonical? ==
> 
> We do not think that R2RML needs to define canonical forms for the output values.

I follow your logic for R2RML, although I don't think that I agree.

Nevertheless, I think the question still stands for the DM: Should it be SQL or XSD?

>> == Should unknown vendor-specific types be mapped to plain literals? ==
> 
> It seems fine for these to be plain literals by default, but we can't mandate that because an implementer may provide a way to map a user defined type to something other than a plain literal.

The current approach is to have a line in this table here:
http://www.w3.org/2001/sw/rdb2rdf/r2rml/#natural-value-mapping

[[
Any other supported SQL datatype => (plain literal)
]]

And right under the table it says:

[[
Note: R2RML processor implementations are expected to augment the table with additional rows for mapping vendor-specific datatypes to appropriate RDF-compatible datatypes, like the XML Schema built-in types.
]]
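
In implementation terms, I read that as something like the sketch below: a lookup table with a plain-literal fallback that a processor may extend. The rows are abbreviated and may not match the current draft exactly, and `datatype_for` and the MONEY row are hypothetical.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Abbreviated stand-in for the natural mapping table.
NATURAL_MAPPING = {
    "INTEGER": XSD + "integer",
    "DECIMAL": XSD + "decimal",
    "DOUBLE PRECISION": XSD + "double",
    "BOOLEAN": XSD + "boolean",
    "DATE": XSD + "date",
    "TIMESTAMP": XSD + "dateTime",
}

def datatype_for(sql_type, vendor_rows=None):
    """Return the datatype IRI for a SQL type, or None for a plain literal."""
    table = dict(NATURAL_MAPPING)
    # A processor may add rows for vendor-specific types.
    table.update(vendor_rows or {})
    # "Any other supported SQL datatype => (plain literal)"
    return table.get(sql_type.upper())

print(datatype_for("GEOMETRY"))                             # no row: plain literal
print(datatype_for("MONEY", {"MONEY": XSD + "decimal"}))    # processor-added row
```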

Is this an acceptable phrasing?

Best,
Richard

Received on Monday, 21 November 2011 18:29:37 UTC