XML Schema datatypes

On Tue, 13 Nov 2007, Jeremy Carroll wrote:
>
> 7) rounding errors behave very differently from in traditional numeric
> applications - hence a solved problem (rounding) becomes an unsolved problem
>
> (I will also add another area of concern which is a mismatch between the 
> real numbers and the XSD datatypes)
>
> ===

This is an interesting subject. I don't think it is really about
"rounding", though, and neither is it really about n-ary datatypes.  I
think what your example shows is that using bounded or fixed-precision
datatypes such as float or decimal in the semantics of an ontology
language is a bad idea. This means that already the unary datatypes in
OWL 1.0 are in some sense broken, and the XML Schema datatypes are
probably *not* the right thing to use in OWL. Sorry if this sounds like
blasphemy.

Let me explain in more detail what I mean, singling out a number of
subjects.

----------------------------------------------------------------------

1. Not about rounding

Take your example

> An example from [Pan and Horrocks] declares that the Yangtze river is 3937.5 
> miles long and uses the kmtrsPerMile predicate to deduce that it is also 
> 6300.0km long. In other words,
> ( 6300.0, 3937.5 )  in [kmtrsPerMile].
>
> This example uses the XML Schema datatype float to represent lengths. Suppose 
> that the Yangtze was declared instead to be 3937.501 miles long, then
>
> ( 6300.0015, 3937.501 )  in [kmtrsPerMile]
>
> so the Yangtze river may be deduced to be 6300.0015km long. However,
>
> ( 6300.0015, 3937.5007 ) in  [kmtrsPerMile]
>
> so that the Yangtze river may also be deduced to be 3937.5007 miles long. 
> This would be inconsistent with the user's expectation that a river has only
> one length.

I think what happens here is indeed bad, but it is something different
from what is claimed. Namely, as far as I can see, rounding is not
explicitly addressed in the semantics of OWL 1.0, in XML Schema, or in
the proposed semantics of OWL 1.1 (correct me if I'm wrong). Thus,
rounding does *not take place*. Instead, there is a "gap" in the
datatype. If reasoners do rounding, they actually violate the
semantics.

Let's make your example a bit more precise. Assume that Yangtze is an
individual and

a I use a unary datatype predicate "=3937.501" on the datatype
   property "lengthInMiles", i.e., I say that Yangtze is connected
   to the concrete object 3937.501 via "lengthInMiles".

b Now I use a binary datatype predicate "milesToKm" on
   Yangtze, with first argument "lengthInMiles" and second argument
   "lengthInKm". I do this using a dataPropertyAssertion, which has
   an existential semantics.

Thus, b stipulates that there *is* a float that corresponds to 3937.501
miles in km, but in fact there isn't (because we would need to do
rounding to make it a legal float, which we don't). What do we get? An
inconsistency. This is different from, but not much better than, two
different values for the length.
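
To see the gap concretely, here is a minimal Python sketch. It is
illustrative only: I take xsd:float to mean IEEE 754 single precision
(numpy's float32) and use the conversion factor 1.6 from the quoted
example; the variable names are made up.

   from fractions import Fraction
   import numpy as np

   # The value denoted by the literal "3937.501"^^xsd:float is the
   # nearest single-precision float.
   miles = np.float32("3937.501")

   # Compute 1.6 * miles exactly, over the rationals -- no rounding.
   exact_km = Fraction(8, 5) * Fraction(float(miles))

   # Is there a float32 that is *exactly* equal to this value?
   nearest = np.float32(float(exact_km))
   print(Fraction(float(nearest)) == exact_km)   # False

The exact product has a factor 5 in its denominator, so it is not
representable in binary floating point at all; without rounding, the
existential assertion from b has no witness, and we get the
inconsistency.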

----------------------------------------------------------------------

2. Not about n-ary datatypes

The point of your example above is that the fixed precision of float
produces unexpected results, namely an inconsistency. This already
happens with unary datatypes, and even with those that are definable
in XML Schema.

Take e.g. float. Let n be the largest float that is smaller than 1.
Such a float exists since there are only finitely many floats. Now
assume that a user defines two unary predicates, one that is true for
all floats strictly greater than n, and one for all floats strictly
smaller than 1. Asserting the existence of a data value in the
intersection of these two predicates leads to an inconsistency.
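
This is easy to check concretely. A small sketch (again reading
xsd:float as IEEE 754 single precision, via numpy):

   import numpy as np

   one = np.float32(1.0)

   # n: the largest float32 strictly smaller than 1.
   n = np.nextafter(one, np.float32(0.0))
   print(n)                             # 0.99999994, i.e. 1 - 2**-24

   # "greater than n" and "smaller than 1" have an empty intersection
   # over the floats: the next float after n is already 1 itself.
   print(np.nextafter(n, one) == one)   # True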

But this is not what a user expects, since (s)he is working with an
ontology, i.e., doing *conceptual modelling*, so she should abstract
away from details such as the representation of numbers and think of
floats as rationals or reals. These are dense, so the user expects the
above intersection to be non-empty.

So, we get unexpected results already with unary datatypes. The issue
here is not the arity. It is the boundedness / fixed precision of XML
Schema datatypes.

----------------------------------------------------------------------

3. XML Schema is not a good choice for defining datatypes.

XML Schema is a schema language for XML, i.e., it describes
semi-structured data stored in the form of an XML document.
It's very good for that purpose because, if you store data, it is
important to consider the details of storage, which usually involve
boundedness, fixed precision, and rounding.

*We* are *not* defining a schema language for (stored) data in the
sense of XML Schema. So it is a valid question whether or not the XML
Schema datatypes are also good for our (different) purposes. I believe
they are not.

We are defining an ontology language with a declarative semantics. As
the above examples show (and there are tons more), we get all sorts of
oddities from the combination of (i) an expressive logic that has a
declarative semantics and (ii) the bounded datatypes of XML Schema. As
the literature on concrete domains in description logics shows, there
are no such problems if you work with unbounded datatypes such as the
integers or the rationals (the reals are problematic for reasons that
are not related to boundedness or unboundedness; not to be discussed
here). I agree with Jeremy that these problems, which are already
present in OWL 1.0, become more pressing when switching from unary to
n-ary datatypes. Still, it seems strange to argue against n-ary
datatypes based on a problem that is present already with unary
datatypes and that, when defining OWL 1.1, we have the chance to fix.
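
For contrast, a sketch of the same two examples over the (unbounded,
dense) rationals, using Python's Fraction. Note that the exact product
of 3937.501 and 1.6 is 6300.0016; the 6300.0015 in the quoted example
presumably already reflects float rounding.

   from fractions import Fraction

   # The conversion now has an exact witness -- no gap, no inconsistency.
   miles = Fraction("3937.501")
   km = Fraction(8, 5) * miles
   print(km == Fraction("6300.0016"))   # True

   # Density: between any two distinct rationals there is another, so
   # "greater than a" and "smaller than b" intersect whenever a < b.
   a, b = Fraction(1) - Fraction(1, 10**30), Fraction(1)
   print(a < (a + b) / 2 < b)           # True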

greetings,
 		Carsten

--
*      Carsten Lutz, Institut für Theoretische Informatik, TU Dresden        *
*     Office phone:++49 351 46339171   mailto:lutz@tcs.inf.tu-dresden.de     *

Received on Wednesday, 14 November 2007 19:09:19 UTC