Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

On Jul 5, 2008, at 12:08 AM, Rob Shearer wrote:

>>>> Putting aside the issue of whether or not it's possible to use  
>>>> (only) the XML Schema datatypes to represent meaningful and  
>>>> implementable OWL datatype value spaces, I expect that there is  
>>>> consensus that when users were writing `xsd:float` and  
>>>> `xsd:double` without values in OWL 1.0, what they really meant  
>>>> was "any number".
>>
>> I don't know what users meant :) I would think that they should  
>> use xsd:decimal if that was their intent (or perhaps the new  
>> owl:rational/real).
>
> I'm providing you with my experience: every user I've ever spoken  
> to about this topic has wanted the real number line.
> They are used to using the xsd datatypes `float` and `double` to  
> represent number values, so they use these without values in OWL to  
> mean "some number".

Do they mean bounded numbers? (i.e. with min and max sizes?) Do they  
distinguish between double and float? Do they care about NaNs?  
(Alan's users care about the latter.)
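
For concreteness, here's a quick Python sketch (purely illustrative,
not tied to any OWL tool) of the sort of thing those questions are
probing: the two value spaces have different finite bounds and
different precision, and both contain NaN.

```python
import math
import struct
import sys

# Largest finite xsd:double (IEEE 754 binary64) value:
print(sys.float_info.max)     # 1.7976931348623157e+308

# Largest finite xsd:float (IEEE 754 binary32) value, built from its bit pattern:
f32_max = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]
print(f32_max)                # 3.4028234663852886e+38

# The two spaces also differ in precision: a double doesn't generally
# survive a round trip through 32 bits.
as_f32 = struct.unpack("<f", struct.pack("<f", 0.1))[0]
print(as_f32 == 0.1)          # False

# And both contain NaN, which is unequal to everything, itself included:
print(math.nan == math.nan)   # False
```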

> My experience is that the use of xsd datatypes as value spaces in  
> OWL 1.0 causes users to write what they don't mean.

For me, this would suggest removing them or enforcing them more clearly.

> My experience is that *every* ontology using `xsd:float` and  
> `xsd:double` without values would be better off using  
> `xsd:decimal`, but that the user intent was "some real number" (and  
> I should note that I'm against requiring support for `xsd:decimal`  
> values).

Values? Or the datatype? In OWL 1, all these types were optional,  
poorly spec'd, and had no documentation whatsoever. Part of the goal  
here is to spec well and document clearly any types we require.

> And my expectation is that users would be much less confused if  
> this distinction between the types used for specific values and the  
> types used for value spaces were clear.

I don't understand this distinction. Every datatype is used for a  
value space. Some value spaces are finite and some are infinite. Some  
are dense and some are not. etc. So, please clarify what you mean by  
this distinction.

> To repeat: as an implementor, I did willfully implement semantics  
> contradictory to the spec,

The spec made all the types except string and integer optional, and  
didn't say much about them.

It's true that they did defer to the XSD spec, so perhaps you meant  
you violated that? Could you be specific in how you violated it? For  
example:

If an integer typed value appeared as the object of a float ranged  
property, was the KB inconsistent?
If a float typed value was outside the range of xsd:float and used in  
an assertion, was the KB inconsistent?
If a float typed value was NaN (I'm not sure what the constant is  
there), was the KB consistent?
I presume from what you said that having a min cardinality which was  
larger than the size of the floats on a float ranged property was  
consistent. But I'm skeptical that this ever occurred in the wild (in  
this way :))
	Did you have a user defined type derived from float that was fairly  
small in range (e.g., -1.0 to 1.0) that interacted with a cardinality  
restriction? Did you support user defined types?

> and I will do so again for OWL 2.0 if the spec is "broken" in the  
> same way.

I think we can take this as read for the moment :)

>> When I am working as a user, I generally, both in programming  
>> languages and in kbs, am very careful about computational types  
>> and numerical methods. Its easy to find extensive discussions in  
>> programming language circles about the pitfalls of floats. All  
>> things being equal, it doesn't seem to be that difficult to  
>> recommend that they use a more suitable type such as decimal.  
>> Indeed, that is what's been happening as more and more programming  
>> languages bundled in decimal types.
>
> I am also a very careful programmer, and am familiar with the details of  
> the IEEE spec.
>
> All the good programmers I've ever worked with are aware of the  
> basic problems with floats but almost always use them when they  
> mean "any real number" anyway.

But the systems don't respect that. Ye old Pascal used the name  
"real" for binary floats, but nobody would go for that today. People  
certainly think there are bounds, at least, and they are aware of the  
exactness problems.

> The mental model I use, and that I encouraged among junior  
> programmers, was that floats were "real numbers, but assume that  
> they wiggle around a little all the time". Not technically correct,  
> but a safe and useful mental model for programming.

For some programming, yes. For lots, no (since the error ranges can  
get *really* wide unexpectedly). For some things this matters quite a  
bit; for others it doesn't.

> The point being that density of the number line is *not* an issue  
> programmers encounter as a matter of course, and one for which  
> their natural intuition might well be wrong.

That's true. Bounds and inexactness are more obvious because of the  
standard rounding action of float operations. But we aren't doing  
rounding here, we are (potentially) doing counting. I agree that  
people's intuitions are bad about counting, but I don't think it's  
that hard to grasp that floats are more like an enumeration of  
integers. It has to be explained of course.
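
To make the "enumeration" picture concrete, a small Python sketch over
IEEE 754 doubles (again purely illustrative):

```python
import math

# Every finite double has a definite successor; there is nothing "between" them.
x = 1.0
succ = math.nextafter(x, math.inf)   # Python 3.9+
print(succ)                          # 1.0000000000000002
print(succ - x)                      # 2.220446049250313e-16

# Stepping through the value space really is counting, just as with integers:
y = x
for _ in range(3):
    y = math.nextafter(y, math.inf)
print(y)                             # 1.0000000000000007
```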

>>>> No user ever intended to restrict the semantic space to a  
>>>> nowhere-dense number line. If the OWL spec presupposes that most  
>>>> of our users would prefer a number line which does not include  
>>>> 1/3, my choice as an implementor would be to once again ignore  
>>>> the spec and be intentionally non-compliant.
>>
>> An alternative choice is to signal that a repair is required and  
>> perhaps do it.
>
> I hereby signal that a repair to the OWL spec is required.

I meant repair of the ontology :)

> (Are we really pretending that everybody thought datatypes in OWL  
> 1.0 were fine and dandy?)

Not at all. We're trying to do a much better job here. The design  
choice we're faced with is whether to include floats at all (as  
required), and if so, exactly what to spec them as. We'll include an  
owl:real type (I hope) which will *really* be the reals, and perhaps  
require decimal as well. A lot of the time, programmers use floats  
for reals because there is no other choice (or they think  
computational performance is critical). We're in a somewhat different  
situation.

Clear educational material is definitely needed.

>>>> Doing what all my users want and expect in this case turns out  
>>>> to be way way easier than doing what a broken spec would  
>>>> require. Any working group who would produce such a spec would  
>>>> clearly be putting their own interests (ease of spec authoring  
>>>> and political considerations) above their duty to their intended  
>>>> users.
>>
>> I think your rhetoric flew ahead of reality here. It's not  
>> actually easier to spec this (as the ongoing battle has shown :)).  
>> As you well know, it's much easier to give in to Boris than not  
>> to :) I don't believe I'm particularly motivated by political  
>> considerations per se. I do think that departing from existing  
>> behavior (disjointness) and normal meaning (in computer science)  
>> needs to be done carefully.
>
> Let me expand upon my rhetoric:
>
> 1. Users want a (dense) real number line.

Agreed. (But we have decimal and are going to offer real.)

> 2. Users expect a (dense) real number line when they write  
> `xsd:float` in OWL 1.0 ontologies.

Unclear to me. Further, it's unclear to me whether we should respect  
that or work against it.

> 3. OWL 1.0 implementations reason as though the `xsd:float` value  
> space is dense.
> 4. The OWL 1.0 specifications state that the `xsd:float` value  
> space is nowhere-dense.

By reference to xsd, yes.

> If you disagree about the first two points then it's certainly  
> worth discussion: Alan's [investigation](http://lists.w3.org/Archives/Public/public-owl-wg/2008Jul/0103.html)  
> seems to support my experience on point 1.

See above.

> I have yet to see a single counter-example to point 2---and I've  
> asked many users what they meant when they wrote their datatype  
> restrictions.

Let me stipulate that for a minute. I trust you would contend this  
for programming languages too. But programming languages don't  
silently substitute a dense value space for floats. I would contend  
that most people expect exact computations from their reals too, but  
again, programming languages don't do that (though, amazingly to me,  
calculators do! I did some testing last year and many even simple  
online calculators use reals internally, so you can compute 1/3 * 3  
and get 1 back.)
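
For the record, a tiny Python illustration of that behaviour, with
Fraction standing in for whatever exact representation those
calculators use internally:

```python
from fractions import Fraction

# With exact rationals, 1/3 * 3 really is 1, not approximately 1:
third = Fraction(1, 3)
print(third * 3 == 1)       # True, exactly

# Binary doubles, by contrast, don't give exact answers in general:
print(0.1 + 0.2 == 0.3)     # False (the sum is 0.30000000000000004)
```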

There are lots of features of OWL that aren't obvious to many users  
(the open world assumption, the lack of a unique name assumption).  
This isn't a reason to blithely ignore user instincts, of course. Far  
from it! But the step from intuition to design isn't immediate.

> I admit I haven't done a comprehensive survey on point 3, but it's  
> a point of fact and not opinion so we should be able to gather  
> evidence one way or the other.
>
> The crux of my rhetoric is that points 1--3 (if you accept them)  
> completely and utterly trump point 4. "Existing behavior" is *not*  
> what the OWL 1.0 spec says. It's what OWL users (implementors and  
> ontology authors) are doing.

I agree with that. Always have. I set the bar for breaking *deployed*  
behavior far, far higher than for breaking merely *specced* behavior.  
On the other hand, I'm not utterly against breaking backward  
compatibility. If we make xsd:float and xsd:double aliases for each  
other and for xsd:decimal, that's a pretty big design choice and  
renders them useless for representing what they actually mean (per  
XSD). To handle other users we'd have to pop NaN and the zeros and  
infinities in as well (SOMEwhere), and that causes other problems.

[snip]
>> I'm not sure what you mean. It seems clear that the spec is all  
>> about classes of values (i.e., types) and their relations.
>
> I mean that the problems that spec is designed to solve involve  
> values, not sets of values. The most complex reasoning the XML  
> Schema people have in mind is model checking, not satisfiability  
> and consistency reasoning.

Ah, I see. Thanks!

> Thus we can't necessarily expect their spec to have addressed all  
> the issues which arise in our quite different context.

I agree. But this seems mostly to arise from their not handling  
rationals and reals, and in the set of "strange" datatypes (dates,  
qnames, etc.).

[snip]
>> (Note that this will likely be the algebraic reals and only  
>> rational constants. So, no transcendentals. I'd be interested in  
>> your view on that. I can imagine adding the trans. but would  
>> prefer to defer it until a later iteration.)
>
> This is getting ridiculous---so you're saying you think there is a  
> substantial user base who need to be able to specify that a value  
> is the solution to some algebraic equation?

Yes. Remember that we are (considering) adding linear and perhaps  
polynomial inequations. This has been requested by users (including  
those working on commercial projects, see:
	http://www.w3.org/2007/OWL/wiki/N-ary_Data_predicate_use_case
for some examples)

If we rule out the irrationals, then we have intuitively solvable  
equations which are not solvable in the rationals.
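
For instance, x^2 = 2 is exactly such an equation. A quick
Python/SymPy sketch (illustrative only; the particular fraction below
is just a convenient rational approximation of the square root of two):

```python
from fractions import Fraction
from sympy import Symbol, solve, sqrt

x = Symbol("x")

# No rational solves x**2 == 2; even a very good rational approximation
# of the square root of two misses exactly:
approx = Fraction(665857, 470832)
print(approx ** 2 == 2)        # False

# Over the algebraic reals the equation is solvable, and exactly so:
print(solve(x**2 - 2, x))      # [-sqrt(2), sqrt(2)]
print(sqrt(2) ** 2 == 2)       # True
```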

For writing down values, it seems that the rationals are enough for a  
wide range of cases where you are looking for a dense line. Until you  
expand the range of constants (usually with equations of some form)  
it's hard to see the utility to users of additional reals (e.g., the  
square root of two isn't really a constant, but a radical).

> I have absolutely no idea what perspective the working group is  
> taking here---what implementor

RacerPro already has some form of linear inequations over the  
algebraic reals (and, I believe, the algebraic complex numbers). The  
FaCT++ and Pellet developers have indicated that they are interested  
in implementing these. We have several classes of users (some of which are  
represented on the page above.)

The working group does not yet have consensus on these features. It  
also does not have a complete design.

> or user has expressed interest in anything other than the real  
> number line???

The algebraic reals just are the above (i.e., the solutions to  
polynomials with rational coefficients). The transcendental reals are  
still more. Without transcendental coefficients or constants, you  
can't "get" to them. So, practically speaking, it makes no difference  
to the consistency of any knowledge base whether we spec the type as  
being all reals or being algebraic reals.

I personally would like to separate them to make augmenting the  
lexical space a bit easier in the future. That is, if we're not going  
to have transcendental constants, I think it makes sense to *call*  
the type we're introducing something like "algebraic reals" (and even  
restrict the value space). That leaves it open for a future group to  
introduce the reals as a super type (both in the lexical and in the  
value spaces).

> Can you guys please just come up with a version of the  
> [`numeric`](http://www.w3.org/TR/xmlschema-2/#rf-numeric) notion? Pretty please?

That's interesting and potentially helpful for some things (like  
splitting off the strings from the numbers in general), but I don't  
see how it helps with our current situation.

>>>> I don't particularly care what each of these three is called; as  
>>>> long as OWL specifies the internal semantics of these three  
>>>> types of spaces, then it's straightforward to "implement" the  
>>>> datatypes users will actually want in terms of them. But, of  
>>>> course, the ability to use XML Schema Datatypes to encode  
>>>> specific values within each of these spaces would be quite  
>>>> convenient
>>
>> Do you mean the lexical spaces?
>
> I mean the only time I explicitly want XML Schema is when my  
> implementation is parsing specific values provided by the user. If  
> you happen to re-use the XML Schema spec for other things that is  
> for your own convenience, not mine.

So what is your position on user defined types? We reuse that from  
XSD too.

The core types, and even things like float, seem coherent from an OWL  
point of view (excepting certain facets and corner issues). That is,  
you can coherently reason over them.

>>>> ---and would use the XML Schema specification for *exactly* what  
>>>> it's good at.
>>
>> The additional question is whether to require additional types  
>> that are not the above three. Among these are float and double. My  
>> belief is that if we are going to add such datatypes as required,  
>> and we are going to take them from xsd, then they should reflect  
>> the semantics of those types and our advice to users is to only  
>> use them if they specifically intend those semantics.
>
> I'd guess that using xsd names for value spaces will just (continue  
> to) confuse users.

Seems so.

> More importantly, and yet again, I have never ever encountered a  
> user who would prefer to use the `float` or `double` value spaces  
> if a `real` value space were available.

But that suggests (to me) we not provide the float or double types.

> If there are users who feel the other way, then please produce  
> them---merely hypothesizing their theoretical existence does not  
> seem useful. (I grant that the class is satisfiable. I contend that  
> its size is vanishingly small in practice.)

Again, there are many aspects of the types, e.g., disjointness, size,  
NaN, lexical space, and discreteness. As far as I can tell, users  
have picked up on several of these (while champions have claimed that  
parts other people have dismissed are critical).

That all being said, my personal concern with "getting floats right"  
involves future, hypothetical use. Which is a big fucking weakness of  
my position. But I would rather not require them at all than require  
them with "wrong" semantics. I would prefer directing people who want  
a real type to the real type. I think that is generally better for a  
number of reasons, including education. It's rather odd to introduce  
primitive types with different names but no different semantics, not  
even *intended* different semantics.

Oh, of course, one reason is if the lexical spaces are different. I  
don't have a problem giving the lexical space of our real a lot of  
freedom (in the initial proposal we suggested a fraction-like syntax,  
but we could add all sorts of variants, though you have to be careful  
because other syntaxes sometimes require infinite expansions).
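
To illustrate the trade-off, a Python sketch (the "1/3" form below
just mimics the fraction-like syntax from the initial proposal and
isn't a settled OWL lexical form): a fraction-like literal denotes its
value with a finite string, while a decimal-expansion literal for the
same value would need infinitely many digits.

```python
from fractions import Fraction
from decimal import Decimal, getcontext

# A fraction-like lexical form maps finitely onto an exact rational value:
print(Fraction("1/3"))           # Fraction(1, 3)

# A decimal-expansion form for the same value never terminates; any finite
# precision yields only an approximation:
getcontext().prec = 50
print(Decimal(1) / Decimal(3))   # 0.33333333333333333333333333333333333333333333333333
```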

>> The n-ary predicate definition system will, at most, be over the  
>> core three types above (e.g., polynomial or linear inequations  
>> with rational coefficients over the reals ). However, one can  
>> pretty easily imagine a predicate definition system that was  
>> focused on the floats and was sensitive to the various semantics.  
>> It wouldn't have to be direct floating point based equations, but  
>> an interval arithmetic system which was designed to help reason  
>> about measurements and computations (and their errors).
>
> I care not a whit for n-ary datatypes. I might implement them if  
> they're in the spec; I might not. But if the spec says you need to  
> use n-ary datatypes to get real numbers,

No no no. The real datatype will be available, simpliciter. Its use  
in n-ary is an *additional* motivation for it.

> and leaves the issues raised with the `float` value space in place,

?

> I will ignore the spec and implement the real number line for unary  
> datatypes.

With transcendentals? With what lexical space?

> Just like I did for OWL 1.0. As a member of the public, that is my  
> feedback to the working group.

Thanks! Please note that all this is not settled yet.


>> I grant entirely that that use case is quite speculative at the  
>> moment. But given that 1) we have alternatives for the "any  
>> number" type and 2) cardinality reasoning with the floats is not  
>> very much more difficult than with user defined finite ranges over  
>> the integers (except for the fact that users have to do much more  
>> work to get there), I don't think we should muck with the  
>> semantics of floats.
>
> I strongly disagree with 2.

Really? <http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm>

There's some special casing for various "odd" bits (NaN, etc.) but  
this shows that sizing float ranges can be reduced to sizing integer  
ranges. Thus, it's not fundamentally different.
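
A minimal Python sketch of that reduction (following the article's
trick; it ignores NaN and the infinities and collapses the two IEEE
zeros to one point, which is where the special-casing comes in):

```python
import struct

def float32_ordinal(x: float) -> int:
    """Map a finite binary32 value to an integer so that ordering (and hence
    sizing) of float ranges reduces to ordering of integer ranges."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Negative floats sort in reverse bit order, so remap them below the positives.
    return bits if bits < 0x80000000 else 0x80000000 - bits

def count_float32_in(lo: float, hi: float) -> int:
    """Number of binary32 values in the closed interval [lo, hi]."""
    return float32_ordinal(hi) - float32_ordinal(lo) + 1

# The user-defined range from the earlier question: how many xsd:float values
# could satisfy a restriction over [-1.0, 1.0]?  A large but finite count.
print(count_float32_in(-1.0, 1.0))   # 2130706433
```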

> I don't want my implementation to care about the difference between  
> `double` and `float`,

So you want them exactly identical?

> and I consider any line of code I write involving the internals of  
> float representation to be a wasted line of code, because my users  
> really don't care.
>
> Much more importantly, it's my job to turn your spec into user- 
> facing documentation

This is my job too :) Both inside and outside the working group.

> and support, and there is not a chance in hell I'm going to explain  
> this issue to my users. They don't care, and they don't want the  
> semantics you are describing. Experience with OWL 1.0 has  
> demonstrated this.
[snip]

Can you say exactly what semantics you want? I get that you want  
them dense (and think that I'm dense :)). But I'm unclear on:
	disjointness (from each other and from decimal and its subtypes like  
integer)
	range
	NaN-like constants

Thanks for the feedback.

Cheers,
Bijan.

Received on Saturday, 5 July 2008 10:19:40 UTC