Re: XML Schema datatypes from Jeremy Carroll on 2007-11-15 (public-owl-wg@w3.org from November 2007)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Thu, 15 Nov 2007 12:28:39 +0000
To: Carsten Lutz <clu@tcs.inf.tu-dresden.de>
CC: public-owl-wg@w3.org
Message-ID: <473C3B77.5080508@hpl.hp.com>
Hi Carsten

I largely agree with your argument, but perhaps not with your conclusions.

 > I think what your example shows is that using bounded or
 > fixed-precision datatypes such as float or decimal in the semantics
 > of an ontology language is a bad idea.

But what about in a *Web* ontology language?

The reason we have XML Schema Datatypes in OWL is that we have them 
in RDF.
We have them in RDF because the RDF Core WG decided that the W3C had 
already put a substantial amount of effort into agreeing some consensus 
datatypes for use on the Web, and they should reuse that.

I think those arguments are good ones; and the W in OWL stands for Web: 
OWL needs to have reasonable degrees of interoperability with other 
stuff on the Web - and I fear that what would come across as a pedantic 
attachment to mathematical abstractions such as real numbers, and a 
spitting in the face of many decades of engineering realities in 
computing such as IEEE 64-bit floating-point numbers, would be unhelpful.

===

It might be possible to redesign OWL (and RDF) datatyping so that there 
is a surface of XML Schema datatypes over a semantic 'truth' which has 
better mathematical properties.

Such a task would be outside our charter.


Jeremy



Carsten Lutz wrote:
> 
> On Tue, 13 Nov 2007, Jeremy Carroll wrote:
>>
>> 7) rounding errors behave very differently from in traditional numeric
>> applications - hence a solved problem (rounding) becomes an unsolved 
>> problem
>>
>> (I will also add another area of concern which is a mismatch between 
>> the real numbers and the XSD datatypes)
>>
>> ===
> 
> This is an interesting subject. I don't think it is really about
> "rounding", though, and neither is it really about n-ary datatypes.  I
> think what your example shows is that using bounded or fixed-precision
> datatypes such as float or decimal in the semantics of an ontology
> language is a bad idea. This means that already the unary datatypes in
> OWL 1.0 are in some sense broken, and the XML Schema datatypes are
> probably *not* the right thing to use in OWL. Sorry if this sounds like
> blasphemy.
> 
> Let me explain in more detail what I mean, singling out a number of
> subjects.
> 
> ----------------------------------------------------------------------
> 
> 1. Not about rounding
> 
> Take your example
> 
>> An example from [Pan and Horrocks] declares that the Yangtze river is 
>> 3937.5 miles long and uses the kmtrsPerMile predicate to deduce that 
>> it is also 6300.0km long. In other words,
>> ( 6300.0, 3937.5 )  in [kmtrsPerMile].
>>
>> This example uses the XML Schema datatype float to represent lengths. 
>> Suppose that the Yangtze was declared instead to be 3937.501 miles 
>> long, then
>>
>> ( 6300.0015, 3937.501 )  in [kmtrsPerMile]
>>
>> so the Yangtze river may be deduced to be 6300.0015km long. However,
>>
>> ( 6300.0015, 3937.5007 ) in  [kmtrsPerMile]
>>
>> so that the Yangtze river may also be deduced to be 3937.5007 miles 
>> long. This would be inconsistent with the user's expectation that a 
>> river has only one length.
> 
> I think what happens here is indeed bad, but it is something different
> from what is claimed. Namely, as far as I can see, rounding is not
> explicitly addressed in the semantics of OWL 1.0 and in XML Schema, and
> neither in the proposed semantics of OWL 1.1 (correct me if I'm
> wrong). Thus, rounding does *not take place*. Instead, there is a
> "gap" in the datatype. If reasoners do rounding, they actually violate
> the semantics.
> 
> Let's make your example a bit more precise. Assume Yangtze is an
> individual and
> 
> a I use a unary datatype predicate "=3937.501" on the datatype
>   property "lengthInMiles", i.e., I say that Yangtze is connected
>   to the concrete object 3937.501 via "lengthInMiles"
> 
> b Now I use a binary datatype predicate "milesToKm" on
>   Yangtze, with first argument "lengthInMiles" and second argument
>   "lengthInKm". I do this using a dataPropertyAssertion, which has
>   an existential semantics.
> 
> Thus, b stipulates that there *is* a float that corresponds to 3937.501
> miles in km, but in fact there isn't (because we would need to do rounding
> to make it a legal float, which we don't). What do we get? An
> inconsistency. This is different from, but not much better than, two
> different values for the length.
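
[Editorial sketch, not part of the original message.] The two readings
discussed above can be compared with a small Python example. It assumes
that xsd:float behaves like IEEE 754 single precision, that the conversion
factor is 1.6 (inferred from 3937.5 miles corresponding to 6300.0 km), and
that a "rounding reasoner" multiplies in single precision. The helper names
are hypothetical and appear neither in the thread nor in any OWL tool.

```python
import struct
from fractions import Fraction

def to_f32(x):
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Assumption: a factor of 1.6, inferred from 3937.5 miles -> 6300.0 km.
KM_PER_MILE_F32 = to_f32(1.6)
KM_PER_MILE_EXACT = Fraction(8, 5)

def km_with_rounding(miles):
    """What a reasoner that rounds might compute (hypothetical helper)."""
    return to_f32(to_f32(miles) * KM_PER_MILE_F32)

def exact_km_is_f32(miles):
    """True if the exact product 1.6 * miles is itself a single-precision value."""
    exact = Fraction(to_f32(miles)) * KM_PER_MILE_EXACT
    return Fraction(to_f32(float(exact))) == exact

miles_a = to_f32(3937.501)
miles_b = to_f32(3937.5007)
km_a = km_with_rounding(miles_a)
km_b = km_with_rounding(miles_b)

print(miles_a != miles_b)          # two distinct floats for the length in miles
print(km_a == km_b)                # both are rounded to the same float in km
print(exact_km_is_f32(3937.5))     # the exact km value exists as a float
print(exact_km_is_f32(3937.501))   # the exact km value falls into a "gap"
```

Under these assumptions, the rounding reading relates two different mile
values to one km value (the "two lengths" effect), while the exact reading
finds no float at all for the km value (the inconsistency described above).
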
> 
> ----------------------------------------------------------------------
> 
> 2. Not about n-ary datatypes
> 
> The point of your example above is that the fixed precision of float
> produces unexpected results, namely an inconsistency. This already
> happens with unary datatypes, and even with those that are definable
> in XML Schema.
> 
> Take e.g. float. Let n be the largest float that is smaller than 1.
> Such a float exists since there are only finitely many floats. Now
> assume that a user defines two unary predicates, one that is true for
> all floats strictly greater than n, and one for all floats strictly
> smaller than 1. Asserting the existence of a data value in the
> intersection of these two predicates leads to an inconsistency.
> 
> But this is not what a user expects since (s)he is working with an
> ontology, i.e., doing a *conceptual modelling*, so she should abstract
> away from details such as representation of numbers and think of floats
> as rationals or reals. These are dense, so the user expects the above
> intersection to be non-empty.
> 
> So, we get unexpected results already with unary datatypes. The issue
> here is not the arity. It is the boundedness / fixed precision of XML
> Schema datatypes.
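
[Editorial sketch, not part of the original message.] The unary example
can be made concrete in Python, again assuming that xsd:float corresponds
to IEEE 754 single precision. The largest float below 1 is obtained by
decrementing the bit pattern of 1.0; the rational midpoint between the two
exists but is not a float. All helper names are hypothetical.

```python
import struct
from fractions import Fraction

def f32_bits(x):
    """Bit pattern of x as an IEEE 754 single-precision value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def f32_from_bits(bits):
    """Single-precision value with the given bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def is_f32(q):
    """True if the exact rational q is itself a single-precision value."""
    rounded = struct.unpack("<f", struct.pack("<f", float(q)))[0]
    return Fraction(rounded) == q

# For positive floats, adjacent bit patterns are adjacent values,
# so n is the largest single-precision value strictly below 1.
n = f32_from_bits(f32_bits(1.0) - 1)

# Over the rationals, the interval (n, 1) is non-empty.
midpoint = (Fraction(n) + 1) / 2

print(n < 1.0)                    # n lies strictly below 1
print(Fraction(n) < midpoint < 1) # a rational strictly between n and 1
print(is_f32(midpoint))           # whether that rational is a float
```

Because n and 1.0 have consecutive bit patterns, no float satisfies both
"strictly greater than n" and "strictly smaller than 1", whereas the
rationals offer the midpoint and infinitely many other values.
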
> 
> ----------------------------------------------------------------------
> 
> 3. XML Schema is not a good choice for defining datatypes.
> 
> XML Schema is a schema language for XML, i.e., it describes
> semi-structured data stored in the form of an XML document.
> It's very good for that purpose, because if you store data,
> it is important to consider the details of storage, which
> usually involves boundedness, fixed precision, and rounding.
> 
> *We* are *not* defining a schema language for (stored) data in the
> sense of XML Schema. So it is a valid question whether or not the XML
> Schema datatypes are also good for our (different) purposes. I believe
> they are not.
> 
> We are defining an ontology language with a declarative semantics. As
> the above examples show (and there are tons more), we get all sorts of
> oddities from the combination of (i) an expressive logic that has a
> declarative semantics and (ii) the bounded datatypes of XML Schema. As
> the literature on concrete domains in description logics shows, there
> are no such problems if you work with unbounded datatypes such as the
> integers or the rationals (the reals are problematic for reasons that
> are not related to boundedness or unboundedness; not to be discussed
> here). I agree with Jeremy that these problems, which are already
> present in OWL 1.0, get more relevant when switching from unary to
> n-ary datatypes. Still, it seems strange to argue against n-ary
> datatypes based on a problem that is present already with unary
> datatypes and that, when defining OWL 1.1, we have the chance to fix.
> 
> greetings,
>         Carsten
> 
> -- 
> *      Carsten Lutz, Institut für Theoretische Informatik, TU Dresden       *
> *     Office phone:++49 351 46339171   mailto:lutz@tcs.inf.tu-dresden.de     *
>
Received on Thursday, 15 November 2007 12:29:11 UTC