Re: DM and R2RML should use same datatype mapping

* Richard Cyganiak <richard@cyganiak.de> [2011-10-27 11:27+0100]
> This is a Last Call comment on the Direct Mapping and R2RML specifications:
> 
> http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110324/
> http://www.w3.org/TR/2011/WD-r2rml-20110920/
> 
> 
> Both specifications define a mapping from SQL datatyped values to RDF literals.
> 
> http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110920/#defn-literal_map
> http://www.w3.org/TR/2011/WD-r2rml-20110920/#datatype-conversions
> 
> The two mappings differ in various details. Given that the requirements for both mappings are the same, this places undue burden on implementers that plan to implement both the Direct Mapping and R2RML.
> 
> Therefore, both specifications should use the same mapping.
> 
> I note that the mapping in R2RML is based on the SQL-to-XML mapping in ISO/IEC 9075-14:2008, and covers more of SQL 2008 than the mapping in the DM.
> 
> I therefore propose that the mapping from the R2RML specification is used in both documents, with the DM specification using a normative reference to the R2RML spec.

The current text in DM is:
[[
The values in a row are mapped to RDF literals. The Direct Graph is defined for a set of typed values which are defined in minimally-conformant SQL processors and expressible in minimally-conformant XML Schema Datatypes
implementations. The literal map provides mapping algorithms for these datatypes:

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│Definition literal map: a mapping from an SQL value with a datatype to:                                                                 │
│                                                                                                                                        │
│  * for the SQL datatypes CHAR, VARCHAR and STRING, a Plain literal with the lexical value of the SQL value.                            │
│  * for the SQL datatypes listed in this table, a Typed literal with this datatype and lexical form:                                    │
│                                                                                                                                        │
│                   SQL datatype                   RDF datatype                                Lexical form                              │
│    BINARY, BINARY VARYING, BINARY LARGE OBJECT xsd:base64Binary XML Schema base64 encoding of value                                    │
│    NUMERIC, DECIMAL                            xsd:decimal      SQL result of: CAST(value AS CHARACTER VARYING(18))                    │
│    SMALLINT, INTEGER, BIGINT                   xsd:integer      SQL result of: CAST(value AS CHARACTER VARYING(18))                    │
│    FLOAT, REAL, DOUBLE PRECISION               xsd:double       SQL result of: CAST(value AS CHARACTER VARYING(23))                    │
│    BOOLEAN                                     xsd:boolean      SQL result of: IF (value, 'true', 'false')                             │
│    DATE                                        xsd:date         SQL result of: CAST(value AS CHARACTER VARYING(13))                    │
│    TIME                                        xsd:time         SQL result of: CAST(value AS CHARACTER VARYING(23))                    │
│    TIMESTAMP                                   xsd:dateTime     SQL result of: REPLACE(CAST(value AS CHARACTER VARYING(37)), " ", "T") │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Extensions to the Direct Mapping should note the spirit of this mapping, i.e. to use a valid representation of an XML Schema Datatype corresponding to the SQL datatype. For numerics, booleans and dates, the canonical XML Schema
lexical representation is used. Extensions are likely to map data outside of the minimal SQL conformance into data types with higher precision than those specified by the literal map.
]] — http://www.w3.org/2001/sw/rdb2rdf/directMapping/LC/Overview.html#minimal-DG
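As a rough illustration of the TIMESTAMP and BOOLEAN rows above, here is a small Python sketch that applies the same recipes to values a driver might hand back. The helper names are hypothetical, and Python's `str()` of a `datetime` is only a stand-in for the SQL `CAST(value AS CHARACTER VARYING(37))`; a real database may format the cast differently.

```python
from datetime import datetime


def timestamp_lexical(value: datetime) -> str:
    # Hypothetical helper. str() stands in for
    # CAST(value AS CHARACTER VARYING(37)); Python renders a datetime
    # as "YYYY-MM-DD HH:MM:SS[.ffffff]" with a space separator.
    cast = str(value)
    # The DM recipe then replaces the space with the "T" that
    # xsd:dateTime requires.
    return cast.replace(" ", "T")


def boolean_lexical(value: bool) -> str:
    # Hypothetical helper, equivalent to IF (value, 'true', 'false').
    return "true" if value else "false"


ts = datetime(2011, 10, 30, 23, 12, 13)
print(timestamp_lexical(ts))   # 2011-10-30T23:12:13
print(ts.isoformat())          # 2011-10-30T23:12:13
print(boolean_lexical(False))  # false
```

For this value the recipe and `isoformat()` agree; a tool that already holds a native `datetime` gets the same lexical form without the string surgery.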

The DM and R2RML differ principally in that the DM asserts a finite (datatype) domain (18-digit integers, IEEE 754 doubles, etc.) while R2RML leaves the domain dependent on the database being queried. A tool which uses, e.g., floats or ints to manipulate the graph defined by R2RML would have to qualify its conformance by the version of the database to which it was connected (e.g. "offers R2RML for MySQL 5.01, but not Oracle 11g"). General compatibility with R2RML over any database can only be preserved if you don't use native types at any step of, e.g., the query answering process. Applying unbounded-precision support to the DM would mean that FeDeRate would no longer be an implementation (it uses Jena to parse and execute queries, which I believe uses Java native types), and SWObjects would have an even harder time, as it is intended to connect multiple databases with potentially different maximum precisions.
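To make the native-type concern concrete, here is a small Python sketch (helper names are hypothetical) that checks whether an integer fits in a signed 64-bit integer, as a Java `long` would hold it, and whether it survives a round trip through an IEEE 754 double. Python's own `int` is unbounded, so the 64-bit limit is simulated with explicit bounds.

```python
INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def fits_int64(n: int) -> bool:
    """True if n fits in a signed 64-bit native integer (e.g. a Java long)."""
    return INT64_MIN <= n <= INT64_MAX


def survives_double(n: int) -> bool:
    """True if n comes back unchanged after conversion to an IEEE 754 double."""
    return int(float(n)) == n


# Inside an 18-digit domain, a 64-bit integer is enough...
print(fits_int64(10 ** 18))           # True
# ...but a wider NUMERIC column on some database would not fit.
print(fits_int64(10 ** 20))           # False
# A double has a 53-bit significand, so even 18-digit integers can be lost.
print(survives_double(10 ** 18 - 1))  # False
print(survives_double(2 ** 53))       # True
```

A fixed domain lets an implementation pick native types once; an open-ended one pushes every such check onto whichever database happens to be connected.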

In <http://www.w3.org/mid/20111011150033.GA10078@w3.org>, I proposed simplifying this to use "the canonical XML Schema lexical representation of this domain:…". I also propose that we ditch the lexical recipes, as pretty much no one will use them anyway, given that native drivers return either native datatypes or lexical values for queried attributes:

[[
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│Definition literal map: a mapping from an SQL value with a datatype to:                                                       │
│                                                                                                                              │
│  * for the SQL datatypes CHAR, VARCHAR and STRING, a Plain literal with the lexical value of the SQL value.                  │
│  * for BINARY, BINARY VARYING and BINARY LARGE OBJECT, an xsd:base64Binary literal with the XML Schema base64 encoding       │
│    of the SQL value.                                                                                                         │
│  * for BOOLEAN, an xsd:boolean literal with the lexical form 'true' or 'false'.                                              │
│  * for the SQL datatypes listed in this table, a Typed literal with this datatype and the canonical lexical form of:         │
│                                                                                                                              │
│        SQL datatype                                       Value range                                RDF datatype            │
│    NUMERIC, DECIMAL                  -10^18 to 10^18                                                 xsd:decimal             │
│    SMALLINT, INTEGER, BIGINT         -10^18 to 10^18                                                 xsd:integer             │
│    FLOAT, REAL, DOUBLE PRECISION     IEEE754 double                                                  xsd:double              │
│    DATE                              0001-01-01 to 9999-12-31                                        xsd:date                │
│    TIME                              00:00:00-14:00 to 23:59:59.9999+14:00                           xsd:time                │
│    TIMESTAMP                         0001-01-01T00:00:00-14:00 to 9999-12-31T23:59:59.9999+14:00     xsd:dateTime            │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
]]
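A rough Python sketch of how the proposed literal map might look to a tool that receives native values from a driver. Several choices here are assumptions, not part of the proposal: the XML Schema 1.0 canonical forms for xsd:decimal and xsd:double, inclusive bounds on the -10^18 to 10^18 range, and the hypothetical names `literal_map`, `check_range`, `canonical_decimal` and `canonical_double`. TIME and TIMESTAMP are left out because their canonical forms involve timezone normalization.

```python
import base64
import math
from datetime import date
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"
LIMIT = 10 ** 18  # proposed numeric range -10^18 to 10^18 (assumed inclusive)


def check_range(n) -> None:
    # Reject values outside the finite domain in the proposed table.
    if not -LIMIT <= n <= LIMIT:
        raise ValueError(f"{n} is outside the range -10^18 to 10^18")


def canonical_decimal(d: Decimal) -> str:
    # XSD 1.0 canonical xsd:decimal: no "+" sign, no leading or trailing
    # zeros, but at least one digit on each side of the decimal point.
    sign = "-" if d < 0 else ""
    whole, _, frac = format(abs(d), "f").partition(".")
    return f"{sign}{whole.lstrip('0') or '0'}.{frac.rstrip('0') or '0'}"


def canonical_double(x: float) -> str:
    # XSD 1.0 canonical xsd:double: one non-zero digit before the point,
    # at least one digit after it, and an "E" exponent without a "+" sign.
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "INF" if x > 0 else "-INF"
    if x == 0:
        return "0.0E0"
    # repr() gives the shortest digits that round-trip; normalize() drops
    # trailing zeros so the mantissa digits can be read off directly.
    d = Decimal(repr(x)).normalize()
    sign, digits, _ = d.as_tuple()
    mantissa = "".join(str(n) for n in digits)
    head, tail = mantissa[0], mantissa[1:] or "0"
    return f"{'-' if sign else ''}{head}.{tail}E{d.adjusted()}"


def literal_map(value):
    """Map a driver-supplied value to (lexical form, datatype IRI or None)."""
    if isinstance(value, str):
        return value, None  # plain literal
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode("ascii"), XSD + "base64Binary"
    if isinstance(value, bool):  # test before int: bool subclasses int
        return ("true" if value else "false"), XSD + "boolean"
    if isinstance(value, int):
        check_range(value)
        return str(value), XSD + "integer"
    if isinstance(value, Decimal):
        check_range(value)
        return canonical_decimal(value), XSD + "decimal"
    if isinstance(value, float):
        return canonical_double(value), XSD + "double"
    if type(value) is date:  # excludes datetime, which subclasses date
        # Python dates already span 0001-01-01 to 9999-12-31.
        return value.isoformat(), XSD + "date"
    raise TypeError(f"no mapping sketched for {type(value).__name__}")


print(literal_map(Decimal("12.500")))  # ('12.5', 'http://www.w3.org/2001/XMLSchema#decimal')
print(literal_map(100.0))              # ('1.0E2', 'http://www.w3.org/2001/XMLSchema#double')
```

Nothing here depends on a lexical recipe run inside the database: the driver's native value plus a fixed domain is enough to produce the canonical form.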

(BTW, we don't necessarily need to limit the early dates to 0001-01-01. While SQL doesn't demand a representation for the year 0, XML Schema does.)

thoughts?


> Best,
> Richard

-- 
-ericP

Received on Sunday, 30 October 2011 23:12:13 UTC