Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum from Bijan Parsia on 2008-07-04 (public-owl-wg@w3.org from July 2008)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Fri, 4 Jul 2008 22:47:02 +0100
To: OWL Working Group WG <public-owl-wg@w3.org>
Cc: Rob Shearer <rob.shearer@comlab.ox.ac.uk>
Message-Id: <6314D765-2CE2-4326-8430-AF83CF2D593C@cs.man.ac.uk>
Trimmed ccs.

> On Jul 4, 2008, at 12:46 PM, Rob Shearer wrote:
>
>> This message is in regard to the discussion related to
>> [this](http://lists.w3.org/Archives/Public/public-owl-wg/2008Jul/0101.html).
>>
>> When I was implementing the Cerebra OWL reasoner, I came to the  
>> firm conclusion that the OWL (1.0) spec was downright broken on  
>> this point, and I fear we're in danger of breaking OWL 2.0 in  
>> exactly the same way.
>>
>> Putting aside the issue of whether or not it's possible to use  
>> (only) the XML Schema datatypes to represent meaningful and  
>> implementable OWL datatype value spaces, I expect that there is  
>> consensus that when users were writing `xsd:float` and  
>> `xsd:double` without values in OWL 1.0, what they really meant was  
>> "any number".

I don't know what users meant :) I would think that they should use  
xsd:decimal if that was their intent (or perhaps the new  
owl:rational/real).

When I am working as a user, I generally, both in programming  
languages and in kbs, am very careful about computational types and  
numerical methods. It's easy to find extensive discussions in  
programming language circles about the pitfalls of floats. All things  
being equal, it doesn't seem to be that difficult to recommend that  
they use a more suitable type such as decimal. Indeed, that is what's  
been happening as more and more programming languages bundle in  
decimal types.
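
A minimal Python sketch of the kind of pitfall meant here, using only the standard `decimal` and `fractions` modules (the variable names are just illustrative):

```python
from decimal import Decimal
from fractions import Fraction

# Binary doubles cannot represent 0.1, 0.2 or 0.3 exactly, so the
# rounded sum differs from the rounded literal.
float_sum_matches = (0.1 + 0.2 == 0.3)

# A decimal type keeps these particular values exact.
decimal_sum_matches = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# Fraction(x) recovers the exact rational value a double actually holds.
stored_tenth = Fraction(0.1)
tenth_is_exact = (stored_tenth == Fraction(1, 10))

# 1/3 is not on the double number line; 1 / 3 is only its nearest neighbour.
third_is_exact = (Fraction(1 / 3) == Fraction(1, 3))

print(float_sum_matches, decimal_sum_matches, tenth_is_exact, third_is_exact)
```

This prints `False True False False`. Note that a decimal type does not contain 1/3 either; only a rational (or real) type does.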

>> No user ever intended to restrict the semantic space to a  
>> nowhere-dense number line. If the OWL spec presupposes that most of our  
>> users would prefer a number line which does not include 1/3, my  
>> choice as an implementor would be to once again ignore the spec  
>> and be intentionally non-compliant.

An alternative choice is to signal that a repair is required and  
perhaps do it.

>> Doing what all my users want and expect in this case turns out to  
>> be way way easier than doing what a broken spec would require. Any  
>> working group who would produce such a spec would clearly be  
>> putting their own interests (ease of spec authoring and political  
>> considerations) above their duty to their intended users.

I think your rhetoric flew ahead of reality here. It's not actually  
easier to spec this (as the ongoing battle has shown :)). As you well  
know, it's much easier to give in to Boris than not to :) I don't  
believe I'm particularly motivated by political considerations per  
se. I do think that departing from existing behavior (disjointness)  
and normal meaning (in computer science) needs to be done carefully.  
Given that some people have already asked for NaN support (of some  
form) and that one of the most championed use cases is managing  
scientific computation results, I don't think we can be too quick to  
alter things.

>> (Note that in the course of the discussion I read on public-owl-wg  
>> the notions of "dense" and "continuous" seem to have become  
>> confused. I think the notion of density is probably the only one  
>> that makes a difference in terms of current OWL semantics, since  
>> number restrictions can cause inconsistencies in non-dense number  
>> lines, but continuity is really what users have in their heads.)
>>
>> The [XML Schema datatype spec](http://www.w3.org/TR/xmlschema-2/)  
>> is focused on representing particular values, not on classes of  
>> values. The notion of "value spaces" is used within the spec, but  
>> only in service of representation of values

I'm not sure what you mean. It seems clear that the spec is all about  
classes of values (i.e., types) and their relations.

>> ---note that there's not a single value space mentioned which is  
>> continuous with respect to the reals, nor are such notions as  
>> "rationals" defined. This makes sense in terms of data  
>> serialization (the driving XML use case) and standard programming  
>> languages (where manipulation of values is the driving use case),  
>> but OWL is in a very different situation. The primary OWL use case  
>> is reasoning about the emptiness (or size) of value spaces, and  
>> the definitions provided in the XML Schema spec do not serve this  
>> purpose well.

Adding a proper real type is on the agenda.

>> Note that I'm not saying XML Schema is a bad spec; merely that it  
>> addresses different problems than we have.

This is true :)

>> I strongly encourage the working group to publish a spec which  
>> provides for the following types of semantic spaces:
>>
>> 1. A countably infinite, nowhere-dense datatype. I.e. the integers.
>>
>> 2. A countably infinite, dense datatype. I.e. strings.
>>
>> 3. An uncountably infinite, dense, continuous datatype. I.e. the  
>> reals.

These are all on the agenda. The first two were in OWL1 and the third  
is being worked on as part of the n-ary data predicate proposal, but  
is separate from it (i.e., I believe it will be added regardless of  
the fate of n-ary).

(Note that this will likely be the algebraic reals and only rational  
constants. So, no transcendentals. I'd be interested in your view on  
that. I can imagine adding the trans. but would prefer to defer it  
until a later iteration.)

>> I don't particularly care what each of these three is called; as  
>> long as OWL specifies the internal semantics of these three types  
>> of spaces, then it's straightforward to "implement" the datatypes  
>> users will actually want in terms of them. But, of course, the  
>> ability to use XML Schema Datatypes to encode specific values  
>> within each of these spaces would be quite convenient

Do you mean the lexical spaces?

>> ---and would use the XML Schema specification for *exactly* what  
>> it's good at.

The additional question is whether to require additional types that  
are not the above three. Among these are float and double. My belief  
is that if we are going to add such datatypes as required, and we are  
going to take them from xsd, then they should reflect the semantics  
of those types and our advice to users is to only use them if they  
specifically intend those semantics.

The n-ary predicate definition system will, at most, be over the core  
three types above (e.g., polynomial or linear inequations with  
rational coefficients over the reals). However, one can pretty  
easily imagine a predicate definition system that was focused on the  
floats and was sensitive to the various semantics. It wouldn't have  
to be direct floating point based equations, but an interval  
arithmetic system which was designed to help reason about  
measurements and computations (and their errors).
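
To give a flavour of what such an interval system might look like, here is a rough Python sketch. It is only an illustration under my own assumptions: the `Interval` class and the `measured` helper are hypothetical names, and bounds are kept as exact rationals rather than as floats, so no rounding issues arise inside the sketch itself.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi] with exact rational bounds."""
    lo: Fraction
    hi: Fraction

    def __add__(self, other):
        # The sum of two unknowns lies between the sums of the bounds.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The product's extremes are among the four bound products.
        products = [a * b for a in (self.lo, self.hi)
                    for b in (other.lo, other.hi)]
        return Interval(min(products), max(products))

    def contains(self, x):
        return self.lo <= Fraction(x) <= self.hi


def measured(value, error):
    """Interval for a measurement reported as value +/- error (decimal strings)."""
    v, e = Fraction(value), Fraction(error)
    return Interval(v - e, v + e)


length = measured("2.5", "0.1")   # 2.4 .. 2.6
width = measured("4.0", "0.2")    # 3.8 .. 4.2
area = length * width             # 9.12 .. 10.92
```

A reasoner working over such intervals could then answer questions like "can the area be exactly 10?" (`area.contains(10)`) without ever committing to a single float value.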

I grant entirely that that use case is quite speculative at the  
moment. But given that 1) we have alternatives for the "any number"  
type and 2) cardinality reasoning with the floats is not very much  
more difficult than with user-defined finite ranges over the integers  
(except for the fact that users have to do much more work to get  
there), I don't think we should muck with the semantics of floats.
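
To illustrate point 2, here is a small Python sketch of how counting floats in a range reduces to integer arithmetic. The function names are hypothetical, and I am assuming IEEE 754 single precision for xsd:float, bounds that are finite (or infinite) but not NaN and that are themselves float values, and +0 and -0 counted as a single value.

```python
import struct


def float32_index(x):
    """Map a non-NaN float32 value to an integer that preserves order."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    magnitude = bits & 0x7FFFFFFF
    # For non-negative floats the bit pattern already sorts like an integer;
    # for negative floats a larger magnitude means a smaller value.
    return -magnitude if bits & 0x80000000 else magnitude


def count_float32(lo, hi):
    """Number of float32 values in the closed range [lo, hi]."""
    return float32_index(hi) - float32_index(lo) + 1


def range_has_at_least(lo, hi, n):
    """Can n pairwise distinct float32 values be drawn from [lo, hi]?"""
    return count_float32(lo, hi) >= n


print(count_float32(1.0, 2.0))
```

This prints `8388609` (that is, 2^23 + 1), so a minimum cardinality restriction over that range is satisfiable exactly when it asks for no more than that many distinct values, just as with a finite integer range.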

Your feedback and insight are, as always, appreciated. I hope you see  
that my position doesn't *quite* fall into the error you are rightly  
concerned with. There's still the problem of educating people about  
float and double, but that is a problem of long standing :)

I'll also admit up front that I *like* float and double as they are.  
I think that IEEE binary floating point is an amazingly clever thing.  
But then, I've always worked in programming languages that had  
bigints and fractions available, so I've been spoiled for choice :)

Cheers,
Bijan.
Received on Friday, 4 July 2008 21:47:45 UTC