ISSUE-5: n-ary datatypes - rounding errors

7) rounding errors behave very differently than in traditional numeric
applications, so a solved problem (rounding) becomes an unsolved problem

(I will also add another area of concern, namely a mismatch between
  the real numbers and the XSD datatypes)

===

Looking at the Racer documentation, Racer seems to work with real
numbers; however, OWL works with the XSD datatypes, which approximate
real numbers in several standard ways, for example as IEEE floating-point
numbers in 32 bits (xsd:float) or 64 bits (xsd:double).
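To make the approximation concrete, here is a small Python sketch (Python is our choice here; nothing in OWL or Racer prescribes it). It assumes that passing a Python float (IEEE binary64, the xsd:double value space) through the `struct` module's `'<f'` format yields the nearest binary32 (xsd:float) value; `f32` is a hypothetical helper name. `Decimal` is used only to print the exact binary value that is stored.

```python
import struct
from decimal import Decimal

def f32(x: float) -> float:
    """Round a binary64 float to the nearest IEEE 754 binary32 (xsd:float) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# The literal 1.6 is parsed to binary64 first and then rounded to binary32;
# in rare cases this two-step rounding differs from rounding the decimal directly.
factor_as_float = f32(1.6)
factor_as_double = 1.6

# Decimal(x) shows the exact value of a binary float, with no display rounding.
print("xsd:float  1.6 is exactly", Decimal(factor_as_float))
print("xsd:double 1.6 is exactly", Decimal(factor_as_double))
```

Neither stored value is the real number 1.6, so the conversion factor is already an approximation before any multiplication takes place.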

When thinking about arithmetic in a knowledge representation setting, we
may convert miles to kilometers by multiplying by 1.6.

We may think of such a conversion as a bijection.

However, if we are representing miles and kilometers by xsd:float (as in,
say, the Pan and Horrocks paper), this all falls to the floor in an ugly
mess.

The largest float value for miles has no corresponding kilometer value;
similarly, the smallest kilometer value has no corresponding mile
value. A little thought shows that either the conversion is sparse,
with many values being unconvertible (because they do not correspond to
exact values), or the conversion is not one-to-one but many-to-one,
because the approximate nature of the intervals maps more than one
kilometer value to the same mile value.
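These effects can be checked by brute force. The sketch below is an illustration under stated assumptions: it models xsd:float multiplication by rounding the exact binary64 product of two binary32 values once (such a product always fits in binary64's 53-bit significand), and the helper names `f32`, `next_up`, `miles_to_km` and `find_pair` are hypothetical. It shows the largest float having no kilometer value, then walks adjacent xsd:float mile values looking for two that share one kilometer value, and for two whose kilometer values skip over a float.

```python
import math
import struct

def f32(x):
    """Nearest IEEE 754 binary32 (xsd:float) value; OverflowError if out of range."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def next_up(x):
    """Adjacent larger binary32 value; valid for positive, finite binary32 x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

FACTOR = f32(1.6)                  # kilometers per mile, as an xsd:float
FLT_MAX = (2 - 2**-23) * 2.0**127  # largest finite xsd:float

def miles_to_km(miles):
    """Round the exact binary64 product once to binary32; None if it overflows."""
    try:
        km = f32(miles * FACTOR)
    except OverflowError:
        return None
    return None if math.isinf(km) else km

def find_pair(start, predicate, limit=10_000):
    """Walk adjacent binary32 mile values upward from start and return the first
    pair (m, next_up(m)) whose kilometer values satisfy predicate(km_m, km_next)."""
    m = start
    for _ in range(limit):
        n = next_up(m)
        if predicate(miles_to_km(m), miles_to_km(n)):
            return m, n
        m = n
    return None

# No kilometer value at all for the largest mile value.
print("km for FLT_MAX miles:", miles_to_km(FLT_MAX))

# Many-to-one: two different mile values with the same kilometer value.
collision = find_pair(f32(3937.5), lambda a, b: a == b)
print("collision:", collision)

# Sparse: kilometer values of adjacent mile values skip over at least one float.
gap = find_pair(f32(2293.0), lambda a, b: next_up(a) < b)
print("gap:", gap)
```

Which effect appears depends on where the values fall within a power-of-two range: for mile values from 2048 to 2560 the products spread out and leave gaps, while from 2560 to 4096 they crowd together and collide.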

In this way, many of our intuitions fail, and the underlying logic of
the reasoner also fails, in ways that confuse intelligent people who
understand the area very well (I think the Turner and Carroll critique
of the Pan and Horrocks mile-to-kilometer conversion illustrates this).
This is likely to be very confusing for end users.

If we convert from one unit to another, and then to a third, we also
need to consider associativity. IEEE floating-point multiplication is
not associative, so the result can depend on the order in which the
conversion factors are applied.
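A sketch of the associativity problem, again modelling xsd:float multiplication as the exact binary64 product rounded once to binary32. The three-step chain (miles to yards to meters to kilometers, using the exact definitions 1760 yards per mile and 0.9144 meters per yard) is our own hypothetical example, not one taken from the papers cited. It compares converting step by step with first folding the three factors into a single factor.

```python
import struct

def f32(x):
    """Nearest IEEE 754 binary32 (xsd:float) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mul(a, b):
    """binary32 multiplication: the binary64 product of two binary32 values is
    exact, so rounding it once to binary32 gives the IEEE binary32 result."""
    return f32(a * b)

# Hypothetical three-step chain: miles -> yards -> meters -> kilometers.
YARDS_PER_MILE = f32(1760.0)
METERS_PER_YARD = f32(0.9144)
KM_PER_METER = f32(0.001)

def step_by_step(miles):
    # ((miles * 1760) * 0.9144) * 0.001
    return mul(mul(mul(miles, YARDS_PER_MILE), METERS_PER_YARD), KM_PER_METER)

def factor_first(miles):
    # miles * ((1760 * 0.9144) * 0.001): same factors, different grouping
    combined = mul(mul(YARDS_PER_MILE, METERS_PER_YARD), KM_PER_METER)
    return mul(miles, combined)

inputs = [f32(3937.5 + i / 1000) for i in range(1000)]
differing = [m for m in inputs if step_by_step(m) != factor_first(m)]
print(f"{len(differing)} of {len(inputs)} inputs give grouping-dependent results")
```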


In summary, IEEE arithmetic is designed for procedural purposes, not
declarative ones. Doing declarative arithmetic in a web context, which
requires interaction with legacy systems (such as databases) that use
IEEE formats, is non-trivial.


Jeremy

PS

As an appendix to this e-mail, I include the text of section 4.2 of 
Turner and Carroll (this is all Dave's work).


=======

An example from [Pan and Horrocks] declares that the Yangtze river is 
3937.5 miles long and uses the kmtrsPerMile predicate to deduce that it 
is also 6300.0km long. In other words,
  ( 6300.0, 3937.5 )  in [kmtrsPerMile].

This example uses the XML Schema datatype float to represent lengths. 
Suppose that the Yangtze were instead declared to be 3937.501 miles long;
then

  ( 6300.0015, 3937.501 )  in [kmtrsPerMile]

so the Yangtze river may be deduced to be 6300.0015km long. However,

  ( 6300.0015, 3937.5007 ) in  [kmtrsPerMile]

so that the Yangtze river may also be deduced to be 3937.5007 miles 
long. This would be inconsistent with the user’s expectation that a 
river has only one length.
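These three memberships can be reproduced under one assumed reading of [kmtrsPerMile]: a pair (km, miles) belongs to it when km is the xsd:float nearest to miles × 1.6, where 1.6 is itself the xsd:float nearest to 1.6. That reading is our assumption, and the papers may define the relation differently; `in_kmtrs_per_mile` is a hypothetical name.

```python
import struct

def f32(x):
    """Nearest IEEE 754 binary32 (xsd:float) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

FACTOR = f32(1.6)  # kilometers per mile, as an xsd:float

def in_kmtrs_per_mile(km, miles):
    """Assumed reading of the relation: km equals the binary32 rounding of the
    exact product miles * FACTOR (both operands first rounded to binary32)."""
    return f32(km) == f32(f32(miles) * FACTOR)

print(in_kmtrs_per_mile(6300.0, 3937.5))
print(in_kmtrs_per_mile(6300.0015, 3937.501))
print(in_kmtrs_per_mile(6300.0015, 3937.5007))
```

Under this reading all three checks print True, although 3937.501 and 3937.5007 are different xsd:float values.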

As pointed out in [Pan and Horrocks], there are well over a hundred 
length units, and rounding errors caused by round-tripping values 
through all of the associated conversions can accumulate into 
significant errors. We implemented a system to do conversions between 
floats representing lengths in kilometers, meters, centimeters, 
millimeters, micrometers, inches, feet, yards, fathoms, poles, chains, 
furlongs, statute miles, leagues and nautical miles and deduced the 
length of the Yangtze to be both 6335.3584km and 6361.8555km, and
nearly 800,000 other values, starting from a declaration that its length
in miles is 3937.5. These rounding errors were highly dependent on the
structure of the definitions of the units, because multiplication in float
is not associative, so scalar multiplication operators on float do not
commute. This lack of associativity also demonstrates that the 
(necessarily associative) composition of two datatypes like kmtrsPerMile 
and, say, milesPerLeague cannot be the same as the composition of the 
underlying arithmetic operations; again, this is likely to be 
inconsistent with a user’s expectations.
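Loss of information under round-tripping already shows up with a single pair of units. The sketch below keeps the binary32 modelling assumption, additionally models xsd:float division as the binary64 quotient rounded once to binary32, and uses hypothetical helper names. It converts mile values to kilometers and back and collects those that do not return unchanged. Because the miles-to-kilometers map is many-to-one in this range, at least one of any two colliding mile values cannot come back to itself.

```python
import struct

def f32(x):
    """Nearest IEEE 754 binary32 (xsd:float) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

FACTOR = f32(1.6)  # kilometers per mile, as an xsd:float

def miles_to_km(miles):
    # exact binary64 product of two binary32 values, rounded once to binary32
    return f32(miles * FACTOR)

def km_to_miles(km):
    # modelled as the binary64 quotient rounded to binary32
    return f32(km / FACTOR)

def round_trip(miles):
    return km_to_miles(miles_to_km(miles))

# Mile values 3937.5, 3937.5001, ..., 3937.5099, each rounded to xsd:float.
candidates = [f32(3937.5 + i / 10_000) for i in range(100)]
drifted = [m for m in candidates if round_trip(m) != m]
print(f"{len(drifted)} of {len(candidates)} values do not survive miles -> km -> miles")
```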

In short, the behaviour of fixed-precision floating-point datatypes with 
arithmetic in OWL is likely to be a source of confusion amongst users. 
Additionally, suppose the Volga river were declared to be 3668.8003km 
long, then it would have no value for its lengthInMile property at all, 
since

   ( 3668.8000, 2293.0000 ) in [kmtrsPerMile]
   ( 3668.8005, 2293.0002 ) in [kmtrsPerMile]
   not exist x in float with 2293.0 < x < 2293.0002
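The Volga gap can be checked the same way, under the same binary32 modelling assumption. The hypothetical helper `next_up` steps to the adjacent binary32 value by incrementing its bit pattern, which is valid for positive finite values, so no xsd:float lies strictly between `lo` and `hi`.

```python
import struct

def f32(x):
    """Nearest IEEE 754 binary32 (xsd:float) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def next_up(x):
    """Adjacent larger binary32 value; valid for positive, finite binary32 x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

FACTOR = f32(1.6)  # kilometers per mile, as an xsd:float

def miles_to_km(miles):
    # exact binary64 product of two binary32 values, rounded once to binary32
    return f32(miles * FACTOR)

lo = f32(2293.0)
hi = next_up(lo)           # the very next xsd:float after 2293.0
volga_km = f32(3668.8003)

# Rounding a product with a positive constant is monotone, so every mile value
# <= lo converts to <= miles_to_km(lo) and every value >= hi to >= miles_to_km(hi).
# If volga_km lies strictly between those two, no mile value converts to it.
print(miles_to_km(lo), volga_km, miles_to_km(hi))
print("no preimage:", miles_to_km(lo) < volga_km < miles_to_km(hi))
```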

Again, this situation would be contrary to the user’s expectation that 
one can always convert freely (albeit possibly inaccurately) between 
miles and kilometers. Notice that this cannot be remedied by using the
arbitrary-precision decimal instead of the fixed-precision float: for
example, the temperature 75.0°F has no exactly corresponding decimal
value in degrees Celsius, since it equals 215/9 = 23.888...°C.
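The decimal case can be checked with exact rational arithmetic. This sketch uses Python's `fractions` module; the helper names are hypothetical. It relies on the standard fact that a fraction in lowest terms has a terminating decimal expansion exactly when its denominator has no prime factors other than 2 and 5.

```python
from fractions import Fraction

def fahrenheit_to_celsius(f):
    # exact: C = (F - 32) * 5/9
    return (f - 32) * Fraction(5, 9)

def has_finite_decimal(q):
    """True if the reduced fraction q has a terminating decimal expansion."""
    d = q.denominator          # Fraction always stores lowest terms
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1

c = fahrenheit_to_celsius(Fraction("75.0"))
print(c, float(c), has_finite_decimal(c))
```

Because the denominator 9 contains the factor 3, no xsd:decimal value, however long, equals 75.0°F in Celsius.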

In practice, many applications do not require the declarative style of
arithmetic that datatypes like kmtrsPerMile would allow. Instead, a
procedural approach is adequate. For example, a user may be happy that
the Volga can be deduced to be 2293.0 miles long, and may be equally happy
with 2293.0002 miles, as long as only one of the options is chosen. One
method that has been used to achieve this is to embed conversion
instructions as literals in an ontology [9], which makes it clear to a
user that the semantics of arithmetic is separate from that of the DL.
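A minimal sketch of the procedural alternative, assuming the conversion runs once, in one direction, outside the DL reasoner. `derived_length_in_miles` and the `declared_km` table are hypothetical illustrations, not the mechanism described in [9]; the binary32 modelling assumption is the same as before (division modelled as the binary64 quotient rounded to binary32).

```python
import struct

def f32(x):
    """Nearest IEEE 754 binary32 (xsd:float) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

FACTOR = f32(1.6)  # kilometers per mile, as an xsd:float

def derived_length_in_miles(length_in_km):
    """One-way, procedural conversion: divide once and keep that single result.
    Nothing is inferred in the reverse direction, so there is exactly one answer."""
    return f32(f32(length_in_km) / FACTOR)

# Hypothetical declared data, using the lengths from the examples in the text.
declared_km = {"Volga": 3668.8003, "Yangtze": 6300.0}
derived_miles = {river: derived_length_in_miles(km) for river, km in declared_km.items()}
for river, miles in derived_miles.items():
    print(f"{river}: {miles:.8g} miles")
```

Here the Volga gets the single value 2293.0002 miles rather than no value at all, which is the trade-off the paragraph describes: one definite, possibly inexact, answer.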

Received on Tuesday, 13 November 2007 14:45:28 UTC