Re: DM and R2RML should use same datatype mapping from Richard Cyganiak on 2011-11-02 (public-rdb2rdf-comments@w3.org from November 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 2 Nov 2011 17:53:20 +0000
To: Eric Prud'hommeaux <eric@w3.org>
Cc: public-rdb2rdf-comments@w3.org
Message-Id: <F2641A03-F924-458F-978F-BD321D5E42A1@cyganiak.de>

Eric,

Reading up more on these issues, I note that I'm probably mistaken about Java native types being insufficient to implement minimally-conforming xsd:decimal and xsd:integer.

As you know, XSD has a section on partial implementations of the infinite datatypes, and defines lower limits for a “minimally conforming processor” [1].

XSD requires 16 digits of xsd:decimal precision for a minimally conforming processor; and a quick reading of documentation on IEEE 754 doubles seems to indicate that they provide 15.95 digits of precision. I'm not sure if this is the same notion of precision, but I'm willing to assume that the XSD people did their homework and set this specific boundary in order to allow support using IEEE 754 doubles.
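
One way to probe whether the two notions of precision coincide is to round-trip a 16-digit integer through a Java double. The following is only a minimal sketch: the class and method names are hypothetical, and it tests individual values rather than proving anything about the whole range.

```java
import java.math.BigDecimal;

final class DoublePrecisionProbe {
    // Hypothetical helper: true if the decimal survives conversion
    // to an IEEE 754 double and back without changing its value.
    static boolean survivesDouble(String lexical) {
        BigDecimal original = new BigDecimal(lexical);
        // new BigDecimal(double) yields the exact value held by the double.
        BigDecimal roundTripped = new BigDecimal(original.doubleValue());
        return original.compareTo(roundTripped) == 0;
    }

    public static void main(String[] args) {
        // 2^53: every integer up to here is exactly representable.
        System.out.println(survivesDouble("9007199254740992")); // true
        // 2^53 + 1: still only 16 digits, but not representable.
        System.out.println(survivesDouble("9007199254740993")); // false
    }
}
```

The second value is below 10^16 yet does not round-trip, which suggests that the 15.95 figure (log10 of 2^53) is close to, but not quite the same as, sixteen exact decimal digits. An implementation relying on doubles alone would presumably need to document that.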

Since xsd:integer is defined as a subtype of xsd:decimal, I take it that it has to support everything in the range of ±10^16. The 64-bit long native type of Java is sufficient for that.
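
The range claim is easy to check mechanically. A small sketch, assuming the ±10^16 bound above is the right reading of the XSD minimum; the class and method names are made up for illustration.

```java
import java.math.BigInteger;

final class LongRangeCheck {
    // 10^16, the bound assumed above for a minimally conforming processor.
    static final BigInteger LIMIT = BigInteger.TEN.pow(16);

    // True if both +bound and -bound fit into a Java long.
    static boolean longCovers(BigInteger bound) {
        return bound.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) <= 0
            && bound.negate().compareTo(BigInteger.valueOf(Long.MIN_VALUE)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(longCovers(LIMIT));                  // true
        // Long.MAX_VALUE is 9223372036854775807, so 10^19 no longer fits.
        System.out.println(longCovers(BigInteger.TEN.pow(19))); // false
    }
}
```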

Neither the DM nor R2RML overrides the lower limits set by the XSD spec (although they could). It follows that even if the range of the R2RML and DM mapping functions goes beyond the range of Java's long and double types (as it has to in order to cover SQL 2008), an R2RML or DM implementer can simply declare their implementation to be a minimally conforming XSD processor and still claim R2RML/DM conformance. They just have to abide by the rules set out in [1] – no silent truncation, clearly document the limits, etc.
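
To illustrate what "no silent truncation" could look like for an implementation backed by Java's long, here is a hedged sketch. The class, method, and error message are hypothetical and not taken from any existing R2RML or DM implementation; the only library call relied on is BigInteger.longValueExact() (Java 8 or later), which throws instead of wrapping around.

```java
import java.math.BigInteger;

final class BoundedIntegerMapper {
    // Hypothetical helper: converts an integer lexical form to a long,
    // rejecting out-of-range values instead of silently truncating them.
    static long toLongOrFail(String lexical) {
        try {
            return new BigInteger(lexical).longValueExact();
        } catch (ArithmeticException e) {
            throw new IllegalArgumentException(
                "xsd:integer outside the documented range of this implementation: "
                    + lexical, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toLongOrFail("10000000000000000")); // 10^16 fits
        try {
            toLongOrFail("20000000000000000000"); // larger than Long.MAX_VALUE
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The limit itself (here, the range of long) would still need to be stated in the implementation's documentation.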

I believe that stating such limits in the documentation of individual implementations is *much* saner than setting an arbitrary normative limit for all implementations in the spec text of R2RML and DM. After all, the documentation of the implementation is the first place where users will look, and this allows implementations to differentiate themselves by doing the Right Thing for a larger range of inputs.

Best,
Richard

[1] http://www.w3.org/TR/xmlschema11-2/#partial-implementation


On 31 Oct 2011, at 03:39, Eric Prud'hommeaux wrote:

> * Richard Cyganiak <richard@cyganiak.de> [2011-10-31 00:09+0000]
>> On 30 Oct 2011, at 23:11, Eric Prud'hommeaux wrote:
>>> A tool which uses e.g. floats or ints to manipulate the graph defined by R2RML would have to qualify its conformance by the version of the database to which it was connected (e.g. "offers R2RML for MySQL 5.01, but not Oracle 11G").
>> 
>> Neither floats nor ints are sufficient to represent xsd:decimal even if we consider only xsd:decimals restricted to 18 digits.
> 
> True, and that does raise the bar for implementation. However, floating point and integer types are very commonly used in SQL and can be very simply implemented.
> 
> 
>> Any programming language these days has some sort of arbitrary-precision decimal type in a readily available library. That is sufficient for conformance with any SQL 2008 conforming implementation of DECIMAL, regardless of how many digits it uses.
>> 
>>> General compatibility with R2RML over any database can only be preserved if you don't use native types at any step of the e.g. query answering process.
>> 
>> I have no idea what you're trying to say here.
> 
> As you point out above, one needs to use arbitrary-precision decimals and not native datatypes to implement the arbitrary precision required by R2RML.
> Some programs, e.g. Jena, use efficient native types for integers and arbitrary-precision only for decimals.
> 
> 
>>> Applying the unbounded precision support to DM would mean that FeDeRate would no longer be an implementation (it uses Jena to parse and execute queries which I believe uses java native types)
>> 
>> You may want to check that again. Jena uses BigDecimal to represent xsd:decimal.
> 
> The query
>  ASK {FILTER (20000000000000000000/2=10000000000000000000)}
> at <http://sparql.org/sparql.html> indicates that ARQ supports up to, but no more than, 18 digit integers.
> 
> 
>>> and SWObjects would have an even harder time as it is intended to connect multiple databases with potentially different maximum precisions.
>> 
>> I don't understand the problem. When you query the DB you get back some value. Then you stuff that value into a BigDecimal.
>> 
>> I don't understand how knowing that you're never going to see a decimal longer than 18 digits simplifies an implementation. It's not like it's particularly hard to write arbitrary-precision code.
> 
> True, but do the use cases motivate raising the bar to that extent? Can we motivate Jena abandoning native integers?
> 
> 
>> As far as I can see, the text in R2RML works fine, is easy to implement, easy to test, and meets user expectations. I have seen no evidence yet that changing the text would benefit users or implementers, and I have seen no argument being made why R2RML and DM should differ. As far as I can tell, you're trying to solve an imaginary problem.
> 
> I don't foresee many implementations of arbitrary precision for integers and floats and I don't see much motivation for that. Further, it makes more sense to define the lexical values in terms of the XSD canonical types rather than via a recipe which some popular databases (e.g. MySQL) don't support. 
> 
> 
>> Best,
>> Richard
> 
> -- 
> -ericP
>
Received on Wednesday, 2 November 2011 18:01:55 UTC