Re: ISSUE-69 (datatype sizes): datatype sizes

* Richard Cyganiak <richard@cyganiak.de> [2011-10-02 14:22+0100]
> -1 to forced discarding of information. Explaining to users why their 38-digit decimal is truncated after 18 digits is going to be a support nightmare. It makes absolutely no sense.

In case my intent wasn't clear: I am not trying to force implementers to discard information; I'm trying to document what users of the standard can expect. For example, if X is an implementation of the Direct Mapping, can I expect a query to return values < 0001-01-01T00:00?

Having no conformance boundary means that users can't expect anything more than 1-byte integers, implementors can't stamp their products as conforming, and we have no way to write test cases. The behavior you're trying to encourage is that people don't stop at the minimal implementation, and that they don't exclude data from the graph simply because it is outside of the interop space. I agree with this goal, but want to state it in a way that doesn't erode the utility of having a standard.
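
A minimal sketch of what such a boundary could look like in practice, assuming Python and using only limits cited elsewhere in this thread (18 digits for xsd:decimal, years 1 to 9999). The function and constant names are hypothetical, and the digit count is an approximation of XSD's totalDigits, not a definitive implementation:

```python
import re

# Assumed limits, taken from figures cited elsewhere in this thread.
INTEROP_DECIMAL_DIGITS = 18   # XSD minimum number of digits for xsd:decimal
INTEROP_YEARS = (1, 9999)     # year range for which SQL defines behavior

# Optional sign, integer digits, optional "." followed by fraction digits.
_DECIMAL_RE = re.compile(r"[+-]?([0-9]*)(?:\.([0-9]*))?")


def decimal_in_interop_space(lexical: str) -> bool:
    """Return True if the xsd:decimal lexical form needs at most 18 digits."""
    match = _DECIMAL_RE.fullmatch(lexical)
    if match is None or not (match.group(1) or match.group(2)):
        raise ValueError(f"not an xsd:decimal lexical form: {lexical!r}")
    # Leading integer zeros and trailing fraction zeros carry no information.
    integer_digits = match.group(1).lstrip("0")
    fraction_digits = (match.group(2) or "").rstrip("0")
    return len(integer_digits) + len(fraction_digits) <= INTEROP_DECIMAL_DIGITS


def year_in_interop_space(year: int) -> bool:
    """Return True if the year lies inside the assumed interop range."""
    low, high = INTEROP_YEARS
    return low <= year <= high


def classify(in_space: bool) -> str:
    """Label a value; values outside the space are kept, not discarded."""
    return "interop" if in_space else "extension"
```

The point of `classify` is that a value outside the boundary is still emitted; it is only labeled as something users can't rely on across implementations.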


> It is true that support for more than 18 digits is optional in XSD implementations. But nevertheless: a SQL database that provides 180 digits of precision for DECIMAL conforms to SQL. An RDF implementation that provides 180 digits of precision for xsd:decimal conforms to RDF and to XSD. A user of these two systems would certainly expect a DM implementation to retain all 180 digits. A system that only retains 18 digits would be perceived as broken. A spec that forces conforming systems to only retain 18 digits *is* broken and this must not happen.
> 
> JDBC and other interfaces allow asking for the vendor and version of the underlying database and hence doing the right thing for any given database engine. Not every DM and R2RML implementer is going to bother, but that's an opportunity for vendors to differentiate themselves. Don't force the spec down to the level of the laziest implementer.

Are implementations *required* to interrogate the database name and version in order to trigger vendor-specific behavior? As a user, how much can I expect of the implementer? As an implementer, when can I say I implement the DM?


> The mechanism in R2RML is patterned on what's in the SQL 2008 spec and I expect that they did a serious study. Anyway I trust that the ISO spec is right. I don't have access to the SQL 2011 spec so can't comment on what it says, and anyway the WG resolved that we build on SQL 2008.

Likewise, I don't have a copy of 2008, but I guess there's an 80% chance that my 2011 references below haven't changed since 2008.
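
A minimal sketch, assuming Python, that recomputes the figures quoted below: the dateTime widths from the ISSUE-69 table and the largest SECOND value implied by Table 9. The example lexical forms are copied from the quoted text; the reduced templates and all function names are hypothetical and only illustrate the arithmetic:

```python
# Example lexical forms copied from the ISSUE-69 table quoted below.
DATE_EXAMPLE = "+100000-01-01"
TIME_EXAMPLE = "01:23:45.67890123456789"

# Hypothetical templates for the reduced proposal (CCYY years, microseconds).
REDUCED_DATE = "CCYY-MM-DD"
REDUCED_TIME = "hh:mm:ss.ffffff"


def datetime_width(date_width: int, time_width: int) -> int:
    """sizeof(xsd:date) + sizeof("T") + sizeof(xsd:time)."""
    return date_width + len("T") + time_width


def max_seconds(fractional_precision: int) -> str:
    """Largest SECOND value per Table 9: 61 followed by N nines."""
    if fractional_precision < 0:
        raise ValueError("precision must be non-negative")
    if fractional_precision == 0:
        return "61"
    return "61." + "9" * fractional_precision


proposed = datetime_width(len(DATE_EXAMPLE), len(TIME_EXAMPLE))
reduced = datetime_width(len(REDUCED_DATE), len(REDUCED_TIME))
print(proposed, reduced, max_seconds(6))  # 37 26 61.999999
```

The precision of 6 passed to `max_seconds` is the minimum maximum that Annex B requires, so an implementation may legitimately allow more nines than shown here.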


> Best,
> Richard
> 
> 
> On 1 Oct 2011, at 18:18, Eric Prud'hommeaux wrote:
> 
> > * Richard Cyganiak <richard@cyganiak.de> [2011-09-30 18:19+0100]
> >> This would force implementations to discard information.
> > 
> > The direct mapping defines the behavior inside a conformance space. As an example of that, SQL 2011 (due in six weeks) Table 9 says SQL only defines the behavior for years between 1 and 9999. Implementers extend that to include years before 1 but users can't expect interop there 'cause they treat the year 0 in at least three different ways.
> > 
> > Without specifying conformance levels, the best we can do is carve out an interop space and encourage extensions to follow the same path with text like:
> > [[
> > Extensions to the Direct Mapping should note the spirit of this mapping, i.e. to use a valid representation of an XML Schema Datatype corresponding to the SQL datatype. For numerics, booleans and dates, the canonical XML Schema lexical representation is used
> > ]]
> > 
> > 
> >> What is the problem with R2RML's approach here, which is “check your DB manual for the maximum length of a VARCHAR, or else pick an arbitrary large number”?
> >> 
> >>> Unfortunately, these mappings are all subject to vendor-specific width limitations, which means that implementors have no idea what to implement in order to interoperate with other implementations.
> >> 
> >> Why do they have no idea? It says you're supposed to implement whatever the DB vendor says.
> > 
> > This wording provides some advice to implementors who are coding for a particular implementation, but not for users or implementors coding to a generic SQL interface (e.g. DBM, ODBC, JDBC). There are also limits on what to expect on the RDF side: for instance, XSD says that decimals (and their derived types) must support at least 18 digits, which means that SPARQL, OWL, RDF etc. test cases, and users, are entitled to expect interop on "123456789012345678"^^xsd:decimal, but not 10 times that value.
> > 
> > The date stuff is pretty complicated, especially when you factor in dates with timezones. (Did you know that in DATETIMEs, seconds range from 0-61.999999¹?) Unless we devote serious study to it, we probably won't get everything exactly right now, but we can at least get it good enough to clarify expectations and meet a lot of use cases.
> > 
> > ¹computed from:
> >  6.1 6 Table 9: "│ SECOND │ 00 to 61.9(N) where “9(N)” indicates a sequence of N instances of the digit “9” and “N” indicates the number of digits specified by <time fractional seconds precision>. │"
> >  Annex B 35p: "The maximum value of <time fractional seconds precision> is implementation-defined, but shall not be less than 6."
> > 
> > 
> >> Best,
> >> Richard
> >> 
> >> 
> >> On 30 Sep 2011, at 15:36, RDB2RDF Working Group Issue Tracker wrote:
> >> 
> >>> 
> >>> ISSUE-69 (datatype sizes): datatype sizes
> >>> 
> >>> http://www.w3.org/2001/sw/rdb2rdf/track/issues/69
> >>> 
> >>> Raised by: Eric Prud'hommeaux
> >>> On product: 
> >>> 
> >>> http://www.w3.org/2001/sw/rdb2rdf/track/issues/48 was resolved by following the SQL spec's lead for generating XML from SQL datatypes. Unfortunately, these mappings are all subject to vendor-specific width limitations, which means that implementors have no idea what to implement in order to interoperate with other implementations. The Direct Mapping LC provides fixed numbers for these mappings,
> >>> http://www.w3.org/TR/2011/WD-rdb-direct-mapping-20110920/#defn-literal_map
> >>> but these have not been subject to WG review. Justifications for these selections were:
> >>> type          width  reason
> >>> xsd:decimal   18     http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#decimal says minimum is 18
> >>> xsd:integer   18     lexical restriction of decimal doesn't affect width
> >>> xsd:double    23     maximum characters for an IEEE754 double (i.e. xsd:double)
> >>> xsd:date      13     limited to under 100k years, e.g. +100000-01-01
> >>> xsd:time      23     10E-14 second precision, e.g. 01:23:45.67890123456789
> >>> xsd:dateTime  37     sizeof(xsd:date)+sizeof("T")+sizeof(xsd:time)
> >>> 
> >>> The choices of 10E6 years and 10E-14 seconds are arbitrary. We could limit to common cases like CCYY for years, and microsecond precision for time, noting that extensions to dates should follow the conventions in ISO8601. That would reduce widths to 10 and 15, putting dateTimes as 26.
> >>> 
> >>> 
> >>> 
> >>> 
> >> 
> >> 
> > 
> > -- 
> > -ericP
> > 
> 

-- 
-ericP

Received on Sunday, 2 October 2011 16:20:46 UTC