Re: ISSUE-69 (datatype sizes): datatype sizes

* Richard Cyganiak <richard@cyganiak.de> [2011-09-30 18:19+0100]
> This would force implementations to discard information.

The direct mapping defines the behavior inside a conformance space. As an example of that, SQL 2011 (due in six weeks) Table 9 says SQL only defines the behavior for years between 1 and 9999. Implementers extend that to include years before 1 but users can't expect interop there 'cause they treat the year 0 in at least three different ways.

Without specifying conformance levels, the best we can do is carve out an interop space and encourage extensions to follow the same path with text like:
[[
Extensions to the Direct Mapping should note the spirit of this mapping, i.e. to use a valid representation of an XML Schema Datatype corresponding to the SQL datatype. For numerics, booleans and dates, the canonical XML Schema lexical representation is used.
]]
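
To make "canonical lexical representation" a bit more concrete, here is a rough Python sketch for a few value types. The function names are made up for illustration, and the decimal rule (a decimal point is required, with no redundant leading or trailing zeros) is my reading of XSD 1.0, not text from the Direct Mapping:

```python
from datetime import date
from decimal import Decimal


def canonical_boolean(value: bool) -> str:
    # xsd:boolean canonical forms are "true" and "false" (not "1"/"0").
    return "true" if value else "false"


def canonical_integer(value: int) -> str:
    # Python's str() never emits a leading "+" or leading zeros.
    return str(value)


def canonical_decimal(value: Decimal) -> str:
    # Assumed rule: at least one digit on each side of a mandatory
    # decimal point, otherwise no leading or trailing zeros.
    text = format(value, "f")  # plain notation, never an exponent
    if "." not in text:
        text += ".0"
    int_part, frac_part = text.split(".")
    sign = "-" if int_part.startswith("-") else ""
    int_part = int_part.lstrip("-").lstrip("0") or "0"
    frac_part = frac_part.rstrip("0") or "0"
    if int_part == "0" and frac_part == "0":
        sign = ""  # there is no negative zero in the decimal value space
    return f"{sign}{int_part}.{frac_part}"


def canonical_date(value: date) -> str:
    # Covers only timezone-less dates in years 1..9999, i.e. the range
    # that SQL defines behavior for.
    return value.isoformat()
```

Anything outside these ranges (years before 1, timezoned dates) is exactly the extension territory the proposed text is talking about.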


> What is the problem with R2RML's approach here, which is “check your DB manual for the maximum length of a VARCHAR, or else pick an arbitrary large number”?
> 
> > Unfortunately, these mappings are all subject to vendor-specific width limitations, which means that implementors have no idea what to implement in order to interoperate with other implementations.
> 
> Why do they have no idea? It says you're supposed to implement whatever the DB vendor says.

This wording gives some advice to implementors who are coding for a particular implementation, but none to users or implementors coding to a generic SQL interface (e.g. DBM, ODBC, JDBC). There are also limits on what to expect on the RDF side. For instance, XSD only requires processors to support decimals (and their derived types) of at least 18 digits, which means that SPARQL, OWL, RDF etc. test cases, and users, are entitled to expect interop on "123456789012345678"^^xsd:decimal, but not on 10 times that value.
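
As a sketch of what that interop boundary looks like in practice, the following Python counts the digits a decimal literal needs and compares that against the 18-digit minimum. The helper names are hypothetical, and the digit-counting rule is my approximation of XSD's totalDigits:

```python
MIN_DECIMAL_DIGITS = 18  # the minimum XSD asks processors to support


def digits_needed(lexical: str) -> int:
    # Ignore the sign, leading integer zeros and trailing fractional zeros;
    # what remains is the number of digits a processor has to carry.
    body = lexical.lstrip("+-")
    int_part, _, frac_part = body.partition(".")
    int_part = int_part.lstrip("0")
    frac_part = frac_part.rstrip("0")
    return max(1, len(int_part) + len(frac_part))


def in_interop_space(lexical: str) -> bool:
    # True when every conforming processor is expected to handle the value.
    return digits_needed(lexical) <= MIN_DECIMAL_DIGITS
```

An 18-digit value such as "123456789012345678" falls inside the space; multiply it by 10 and it needs 19 digits, so interop is no longer guaranteed.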

The date stuff is pretty complicated, especially when you factor in dates with timezones. (Did you know that in DATETIMEs, seconds range from 0 to 61.999999¹?) Unless we devote serious study to it, we probably won't get everything exactly right now, but we can at least get it good enough to clarify expectations and meet a lot of use cases.

¹computed from:
  6.1 6 Table 9: "│ SECOND │ 00 to 61.9(N) where “9(N)” indicates a sequence of N instances of the digit “9” and “N” indicates the number of digits specified by <time fractional seconds precision>. │"
  Annex B 35p: "The maximum value of <time fractional seconds precision> is implementation-defined, but shall not be less than 6."
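
The computation is just "61." followed by N nines, with N being the fractional seconds precision. A throwaway Python sketch (the function name is made up), defaulting to the guaranteed minimum of 6:

```python
def max_seconds_lexical(fractional_precision: int = 6) -> str:
    # Table 9: SECOND ranges from 00 to 61.9(N), where 9(N) is N nines.
    if fractional_precision < 0:
        raise ValueError("precision must be non-negative")
    if fractional_precision == 0:
        return "61"
    return "61." + "9" * fractional_precision
```

With the default precision this returns "61.999999"; an implementation that supports a higher precision simply gets more nines.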


> Best,
> Richard
> 
> 
> On 30 Sep 2011, at 15:36, RDB2RDF Working Group Issue Tracker wrote:
> 
> > 
> > ISSUE-69 (datatype sizes): datatype sizes
> > 
> > http://www.w3.org/2001/sw/rdb2rdf/track/issues/69
> > 
> > Raised by: Eric Prud'hommeaux
> > On product: 
> > 
> > http://www.w3.org/2001/sw/rdb2rdf/track/issues/48 was resolved by following the SQL spec's lead for generating XML from SQL datatypes. Unfortunately, these mappings are all subject to vendor-specific width limitations, which means that implementors have no idea what to implement in order to interoperate with other implementations. The Direct Mapping LC provides fixed numbers for these mappings,
> >  http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110920/#defn-literal_map
> > but these have not been subject to WG review. Justifications for these selections were:
> >  type       width  reason
> >  xsd:decimal  18   http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#decimal says minimum is 18
> >  xsd:integer  18   lexical restriction of decimal doesn't affect width
> >  xsd:double   23   maximum characters for an IEEE754 double (i.e. xsd:double)
> >  xsd:date     13   limited to under 100k years, e.g. +100000-01-01
> >  xsd:time     23   10E-14 second precision, e.g. 01:23:45.67890123456789
> >  xsd:dateTime 37   sizeof(xsd:date)+sizeof("T")+sizeof(xsd:time)
> > 
> > The choices of 10E6 years and 10E-14 seconds are arbitrary. We could limit to common cases like CCYY for years, and microsecond precision for time, noting that extensions to dates should follow the conventions in ISO8601. That would reduce widths to 10 and 15, putting dateTimes as 26.

-- 
-ericP

Received on Saturday, 1 October 2011 17:19:13 UTC