RE: ISSUE-126 (Revisit Datatypes): A proposal for resolution from Boris Motik on 2008-07-01 (public-owl-wg@w3.org from July 2008)

From: Boris Motik <boris.motik@comlab.ox.ac.uk>
Date: Tue, 1 Jul 2008 18:32:26 +0100
To: "'Michael Smith'" <msmith@clarkparsia.com>
Cc: "'OWL Working Group WG'" <public-owl-wg@w3.org>
Message-ID: <005101c8dba0$6dca3d80$7212a8c0@wolf>
Hello,

> -----Original Message-----
> From: Michael Smith [mailto:msmith@clarkparsia.com]
> Sent: 01 July 2008 17:45
> To: Boris Motik
> Cc: 'OWL Working Group WG'
> Subject: RE: ISSUE-126 (Revisit Datatypes): A proposal for resolution
> 
> On Tue, 2008-07-01 at 15:17 +0100, Boris Motik wrote:
> 
> > Given these definitions, the value spaces of all these datatypes are
> > just numbers, not pairs of the form (number,type). Therefore,
> > if we base the datatype system of OWL 2 on XML Schema, we have no
> > other choice but to say that the value spaces are overlapping.
> 
> ?
> > - For xsd:float: The basic value space of double consists of the
> > values m x 2^e, where m is an integer whose absolute value is less
> > than 2^53, and e is an integer between -1075 and 970, inclusive.
> 
> You omitted the following, which complicates your interpretation.
> 
>         In addition to the basic value space described above, the
>         value space of float also contains the following three special
>         values: positive and negative infinity and not-a-number (NaN).
> 
> xsd:float and xsd:decimal overlap, but xsd:decimal does not contain all
> of xsd:float.
> 
> 

I was aware of that, and this is precisely why in my proposal I suggested that we should modify xsd:float to make it just a range of
numbers. Thus, "our" xsd:float would not be the same as the float from the XML Schema.

Regardless of whether we have +inf, -inf, and NaN or not, I think it is still misleading for "1"^^xsd:float not to be equal to
"1"^^xsd:integer. If we absolutely need +inf, -inf, and NaN, then I'd say we need to add them to owl:real, and then make all other
numeric datatypes subsets of that datatype.
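
To make the two readings concrete, here is a minimal Python sketch. The helper names value_as_number and value_as_pair are hypothetical and exist only for this illustration; the datatype is passed around as a plain string.

```python
from fractions import Fraction


def value_as_number(lexical: str, datatype: str) -> Fraction:
    """Overlapping value spaces: a literal denotes just a number."""
    return Fraction(lexical)


def value_as_pair(lexical: str, datatype: str) -> tuple:
    """Disjoint value spaces: a literal denotes a (number, type) pair."""
    return (Fraction(lexical), datatype)


# Under the first reading the two literals denote the same value.
same = value_as_number("1", "xsd:float") == value_as_number("1", "xsd:integer")

# Under the second reading they are different values.
different = value_as_pair("1", "xsd:float") != value_as_pair("1", "xsd:integer")
```

With the first reading, "1"^^xsd:float and "1"^^xsd:integer compare equal; with the second they never do, whatever the number is.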

Finally, do we really care about +inf, -inf, and NaN? XML Schema might care, but again, XML Schema is a schema and not an ontology
language. XML Schema does not need to do any reasoning on the datatypes; it only needs to perform straightforward validation. This
is why I suggested changing xsd:float: we probably don't want to reason about the properties of floating point arithmetic. We can
keep the name to make people happy. People will be able to put values into their ontology that can be written in the form of floats
and will be perfectly happy with that; for the most part, they won't be able to detect the difference.

> > On the practical side, I can't see how overlapping value spaces would be more difficult to implement than if they were not
> > overlapping. In my ISWC paper I've presented an algorithm that deals with this issue, and I really can't see a possible source of
> > implementation difficulty.
> 
> It might not be more difficult, and I don't think I've argued that nor
> do I think anyone else has.
> 
> I have argued that implementations already exist based on the
> disjointness of *primitive* datatypes and that such implementations
> would be broken.  I have argued that users were given one set of advice
> and that reversing that advice has some cost.  Further, re-using
> xsd:float and xsd:double, but only with a subset of the values from XML
> Schema seems inappropriate.
> 

I'm not convinced by this argument. True, these implementations would be broken, but this is partly because OWL 1 was never precise
on this point. Furthermore, I strongly suspect these implementations to be broken as they are: I doubt that they implement the
semantics of xsd:float in the correct way. Here is an example. Assume you have the following ontology, where n1 and n2 are constants
that I'll specify later:

(1) PropertyRange( a:prop
        DatatypeRestriction( xsd:float
            minExclusive "n1"^^xsd:float
            maxExclusive "n2"^^xsd:float
        )
    )
(2) n1 is a constant that corresponds to the number 1 * 2^-1075
(3) n2 is a constant that corresponds to the number 3 * 2^-1075

If I'm not mistaken, the range of a:prop now contains exactly one floating point number: 2 * 2^-1075. Consider now if you had these
axioms in addition:

(4) ClassAssertion( SomeValuesFrom( a:prop rdfs:Literal) a:i1 )
(5) ClassAssertion( SomeValuesFrom( a:prop rdfs:Literal) a:i2 )
(6) KeyFor( a:prop owl:Thing )

Since the extension of the data range is a singleton, it is not possible for a:i1 and a:i2 to have different values for a:prop;
therefore, this ontology should actually entail SameIndividual( a:i1 a:i2 ).

Well, I really believe that this is difficult to implement! You now need to analyze the bit-representation of n1 and n2 and count
the number of floats between them. I am willing to bet a large sum of money that no existing tool would handle this test case
correctly; thus, I suspect that the tools are incorrect w.r.t. the formal semantics of floats.
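
As a rough illustration of the counting a reasoner would have to do under the discrete reading, here is a small Python sketch. It takes the parameters exactly as quoted above (|m| < 2^53, -1075 <= e <= 970) as assumptions and does not check them against the XML Schema text; the function names are hypothetical.

```python
from fractions import Fraction

# Parameters as quoted in this thread (assumptions, not verified here).
MANTISSA_LIMIT = 2 ** 53    # |m| must be strictly below this
E_MIN, E_MAX = -1075, 970   # inclusive exponent range


def is_representable(k: int) -> bool:
    """True if k * 2**E_MIN can be written as m * 2**e within the limits."""
    k = abs(k)
    if k == 0:
        return True
    # Move as many trailing zero bits as allowed from m into the exponent.
    trailing_zeros = (k & -k).bit_length() - 1
    shift = min(trailing_zeros, E_MAX - E_MIN)
    return (k >> shift) < MANTISSA_LIMIT


def values_strictly_between(k_lo: int, k_hi: int) -> list:
    """Representable v with k_lo * 2**E_MIN < v < k_hi * 2**E_MIN."""
    unit = Fraction(1, 2 ** -E_MIN)
    return [k * unit for k in range(k_lo + 1, k_hi) if is_representable(k)]


# n1 = 1 * 2^-1075 and n2 = 3 * 2^-1075, as in (2) and (3).
between = values_strictly_between(1, 3)
```

The sketch relies on every value m * 2^e with e >= -1075 being an integer multiple of 2^-1075, so it only has to walk the multiples between the two bounds. For n1 and n2 it finds the single value 2 * 2^-1075.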

Furthermore, it is clear that this example is contrived: I cannot imagine that one would really need to make this inference. It is
true that one can test for corner cases first and thus optimize reasoning; however, at some point I need to write some code that
handles this test case (or I have a broken implementation). Hence my suggestion: let us say that xsd:float is dense, and thus drop
any pretense that we have a discrete interpretation of floats. In such a case, I would know that between n1 and n2 there are
infinitely many numbers, so I could trivially say that (1)--(6) would not entail SameIndividual( a:i1 a:i2 ).
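
For contrast, a minimal Python sketch of the dense reading follows. It uses exact rationals as stand-ins for real numbers, and the helper names are hypothetical. The point is only that two distinct witnesses always exist whenever n1 < n2, so no bit-level counting is needed.

```python
from fractions import Fraction
import math


def dense_interval_size(lo: Fraction, hi: Fraction) -> float:
    """Number of reals strictly between lo and hi: zero or infinitely many."""
    return math.inf if lo < hi else 0


def midpoint(lo: Fraction, hi: Fraction) -> Fraction:
    """A value strictly between lo and hi, assuming lo < hi."""
    return (lo + hi) / 2


n1 = Fraction(1, 2 ** 1075)
n2 = Fraction(3, 2 ** 1075)

# Two different values that a:i1 and a:i2 could take for a:prop.
w1 = midpoint(n1, n2)
w2 = midpoint(n1, w1)
```

Because w1 and w2 are distinct and both lie strictly between n1 and n2, a model can give a:i1 and a:i2 different values, and SameIndividual( a:i1 a:i2 ) does not follow.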

> 
> Finally, there is only one KR use case I've heard motivating the
> inclusion of xsd:float and xsd:double, the ability to map data from an
> ontology to a corresponding and widely implemented machine
> representation.  This change would prevent that translation from being
> accurate and would seem to undermine the argument for including the
> machine datatypes at all.
> 

I don't think my proposal prevents an implementation from using efficiently representable datatypes at all. Perhaps I wasn't totally
precise in my proposal, so let me restate things.

- We get rid of -inf, +inf, and NaN. I don't think many people will weep.

- We say that xsd:float is the set of *real* numbers between -2^53 * 2^970 and 2^53 * 2^970.

- We say that the constants of xsd:float are exactly as in XML Schema, apart from -inf, +inf, and NaN. Thus, for each number of the
form m*2^e where m is an integer whose absolute value is less than 2^53 and -1075 <= e <= 970, we have at least one constant that
corresponds to this number. Thus, if your implementation supports such numbers (and most do), there is nothing that prevents your
implementation from mapping each constant into such internal representation.

It is true that, in doing so, you might lose some "formatting" of the constant (e.g., the difference between 1 and 1.0), but, as I
said in my previous e-mails, I don't think anyone will really care about this and the spec doesn't preclude this. The point is,
however, that

(a) you are using exactly the space that is required to represent a float, and
(b) you can represent all constants of xsd:float apart from -inf, +inf, and NaN.
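
A minimal Python sketch of such a mapping is below. It assumes the internal representation is a 64-bit IEEE 754 value (which is what the 2^53 bound suggests) and uses Python's own float() parser, which accepts more lexical forms than XML Schema does. The function names are hypothetical.

```python
import math
import struct


def parse_restricted_float(lexical: str) -> float:
    """Map a constant to a machine float, rejecting INF, -INF and NaN."""
    value = float(lexical.strip())
    if not math.isfinite(value):
        raise ValueError("special value not allowed: " + lexical)
    return value


def machine_bytes(value: float) -> bytes:
    """The 8-byte big-endian encoding of the value."""
    return struct.pack(">d", value)


one_a = machine_bytes(parse_restricted_float("1"))
one_b = machine_bytes(parse_restricted_float("1.0"))
```

Here "1" and "1.0" end up as the same 8 bytes, so the lexical formatting is lost but the storage is exactly that of a machine float, and the three special values are rejected at parse time.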


Thus, I really don't think that my proposal has all that many bad features. It simply prevents implementors from having to deal with
quite nasty corner cases, such as (1)--(6); everything else should work as usual.



Finally, note also that the question of arithmetic with xsd:float is much more involved and is somewhat orthogonal to my proposal.
The problem arises due to the fact that, for example, 1/3 is not representable in xsd:float. Now if you want to implement OWL
correctly, you'd need to find a way to deal with this. Making xsd:float discrete simply makes matters worse: rounding errors are
really difficult to represent in logic. In any case, I think this issue should probably be discussed separately.
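
The 1/3 example can be checked with a short Python sketch. It assumes Python's float is a 64-bit binary floating point number, which holds on common platforms; Fraction is used to inspect the exact value that gets stored.

```python
from fractions import Fraction

third = Fraction(1, 3)

# The exact rational value of the machine float nearest to 1/3.
stored = Fraction(1.0 / 3.0)

# The rounding error a logic-based treatment would have to account for.
error = third - stored

# Every machine float is m * 2^e, so its denominator is a power of two.
denominator_is_power_of_two = (
    stored.denominator & (stored.denominator - 1)
) == 0
```

The stored value differs from 1/3 by a small nonzero amount, and its denominator is a power of two, which is why no choice of m and e can hit 1/3 exactly.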

Regards,

	Boris 

> --
> Mike Smith
> 
> Clark & Parsia
Received on Tuesday, 1 July 2008 17:34:06 UTC