Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

From: Rob Shearer <rob.shearer@comlab.ox.ac.uk>
Date: Sat, 5 Jul 2008 10:40:47 +0100
To: public-webont-comments@w3.org
Message-Id: <4D463620-93BC-47FF-8876-7F16DBCA0F00@comlab.ox.ac.uk>
>>> Putting aside the issue of whether or not it's possible to use  
>>> (only) the XML Schema datatypes to represent meaningful and  
>>> implementable OWL datatype value spaces, I expect that there is  
>>> consensus that when users were writing `xsd:float` and  
>>> `xsd:double` without values in OWL 1.0, what they really meant was  
>>> "any number".
>
> I don't know what users meant :) I would think that they should use  
> xsd:decimal if that was their intent (or perhaps the new  
> owl:rational/real).

I'm providing you with my experience: every user I've ever spoken to  
about this topic has wanted the real number line.
They are used to using the xsd datatypes `float` and `double` to  
represent number values, so they use these without values in OWL to  
mean "some number".

My experience is that the use of xsd datatypes as value spaces in OWL  
1.0 causes users to write what they don't mean. My experience is that  
*every* ontology using `xsd:float` and `xsd:double` without values  
would be better off using `xsd:decimal`, but that the user intent was  
"some real number" (and I should note that I'm against requiring  
support for `xsd:decimal` values). And my expectation is that users  
would be much less confused if this distinction between the types used  
for specific values and the types used for value spaces were clear.

To repeat: as an implementor, I did willfully implement semantics  
contradictory to the spec, and I will do so again for OWL 2.0 if the  
spec is "broken" in the same way.

> When I am working as a user, I generally, both in programming  
> languages and in kbs, am very careful about computational types and  
> numerical methods. It's easy to find extensive discussions in  
> programming language circles about the pitfalls of floats. All  
> things being equal, it doesn't seem to be that difficult to  
> recommend that they use a more suitable type such as decimal.  
> Indeed, that is what's been happening as more and more programming  
> languages bundle in decimal types.

I am also a very careful programmer, and am familiar with the details  
of the IEEE spec.

All the good programmers I've ever worked with are aware of the basic  
problems with floats but almost always use them when they mean "any  
real number" anyway. The mental model I use, and that I encouraged  
among junior programmers, was that floats were "real numbers, but  
assume that they wiggle around a little all the time". Not technically  
correct, but a safe and useful mental model for programming. The point  
being that density of the number line is *not* an issue programmers  
encounter as a matter of course, and one for which their natural  
intuition might well be wrong.
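
A minimal Python sketch of that "wiggle" mental model, assuming Python's built-in `float` (IEEE binary64, so closer to `xsd:double` than to `xsd:float`). The variable names are illustrative only. It shows the two everyday symptoms: exact comparison fails after rounding, and 1/3 is not itself a member of the value space.

```python
from fractions import Fraction
import math

# Rounding makes exact equality unreliable: the "wiggle".
exact_sum_matches = (0.1 + 0.2 == 0.3)        # False
# The usual remedy is to compare with a tolerance instead.
close_enough = math.isclose(0.1 + 0.2, 0.3)   # True

# Fraction(x) recovers the exact rational value stored in the float x.
# Every finite float is a dyadic rational, so it can never equal 1/3.
third_is_exact = Fraction(1 / 3) == Fraction(1, 3)   # False
```

Tolerance-based comparison handles the first symptom without any thought about density, which is why the density of the value space rarely comes up in ordinary programming.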

>>> No user ever intended to restrict the semantic space to a nowhere- 
>>> dense number line. If the OWL spec presupposes that most of our  
>>> users would prefer a number line which does not include 1/3, my  
>>> choice as an implementor would be to once again ignore the spec  
>>> and be intentionally non-compliant.
>
> An alternative choice is to signal that a repair is required and  
> perhaps do it.

I hereby signal that a repair to the OWL spec is required. (Are we  
really pretending that everybody thought datatypes in OWL 1.0 were  
fine and dandy?)

>>> Doing what all my users want and expect in this case turns out to  
>>> be way way easier than doing what a broken spec would require. Any  
>>> working group who would produce such a spec would clearly be  
>>> putting their own interests (ease of spec authoring and political  
>>> considerations) above their duty to their intended users.
>
> I think your rhetoric flew ahead of reality here. It's not actually  
> easier to spec this (as the ongoing battle has shown :)). As you  
> well know, it's much easier to give in to Boris than not to :) I  
> don't believe I'm particularly motivated by political considerations  
> per se. I do think that departing from existing behavior  
> (disjointness) and normal meaning (in computer science) needs to be  
> done carefully.

Let me expand upon my rhetoric:

1. Users want a (dense) real number line.
2. Users expect a (dense) real number line when they write `xsd:float`  
in OWL 1.0 ontologies.
3. OWL 1.0 implementations reason as though the `xsd:float` value  
space is dense.
4. The OWL 1.0 specifications state that the `xsd:float` value space  
is nowhere-dense.
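
The gap between points 3 and 4 can be sketched with a single open-interval restriction. This is an illustration under stated assumptions, not any reasoner's actual code: the function names are hypothetical, the bounds are assumed to be finite, non-negative values exactly representable in IEEE binary32, and NaN and the infinities are ignored.

```python
import struct

def next_float32_up(x: float) -> float:
    """Smallest binary32 value above x (assumes x is finite and non-negative)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # For non-negative floats, adjacent values have adjacent bit patterns.
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

def satisfiable_over_reals(lo: float, hi: float) -> bool:
    """Dense semantics: some value lies strictly between lo and hi iff lo < hi."""
    return lo < hi

def satisfiable_over_float32(lo: float, hi: float) -> bool:
    """Nowhere-dense semantics: a binary32 value must lie strictly between."""
    return next_float32_up(lo) < hi

lo = 1.0
hi = next_float32_up(lo)                          # 1 + 2**-23, the neighbour of 1.0
dense_answer = satisfiable_over_reals(lo, hi)     # True
float_answer = satisfiable_over_float32(lo, hi)   # False
```

An implementation following point 3 reports the restriction as satisfiable; one following point 4 reports it as unsatisfiable.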

If you disagree about the first two points then it's certainly worth  
discussion: Alan's [investigation](http://lists.w3.org/Archives/Public/public-owl-wg/2008Jul/0103.html)  
seems to support my experience on point 1. I have yet to see a  
single counter-example to point 2---and I've asked many users what  
they meant when they wrote their datatype restrictions.

I admit I haven't done a comprehensive survey on point 3, but it's a  
point of fact and not opinion so we should be able to gather evidence  
one way or the other.

The crux of my rhetoric is that points 1--3 (if you accept them)  
completely and utterly trump point 4. "Existing behavior" is *not*  
what the OWL 1.0 spec says. It's what OWL users (implementors and  
ontology authors) are doing.

> Given that some people have already asked for NaN support (of some  
> form) and that one of the most championed use cases is managing  
> scientific computation results, I don't think we can be too quick to  
> alter things.

I agree that it's an issue, and as a member of the public I don't  
intend to get mightily bogged down in details of the solution to be  
chosen. I'd think that NaN occurs quite rarely, and that semantics  
such as "any real" would suffice, but I don't have strong opinions on  
the issue.

>>> (Note that in the course of the discussion I read on public-owl-wg  
>>> the notions of "dense" and "continuous" seem to have become  
>>> confused. I think the notion of density is probably the only one  
>>> that makes a difference in terms of current OWL semantics, since  
>>> number restrictions can cause inconsistencies in non-dense number  
>>> lines, but continuity is really what users have in their heads.)
>>>
>>> The [XML Schema datatype spec](http://www.w3.org/TR/xmlschema-2/)  
>>> is focused on representing particular values, not on classes of  
>>> values. The notion of "value spaces" is used within the spec, but  
>>> only in service of representation of values
>
> I'm not sure what you mean. It seems clear that the spec is all  
> about classes of values (i.e., types) and their relations.

I mean that the problems that spec is designed to solve involve  
values, not sets of values. The most complex reasoning the XML Schema  
people have in mind is model checking, not satisfiability and  
consistency reasoning. Thus we can't necessarily expect their spec to  
have addressed all the issues which arise in our quite different  
context.

>>> I strongly encourage the working group to publish a spec which  
>>> provides for the following types of semantic spaces:
>>>
>>> 1. A countably infinite, nowhere-dense datatype. I.e. the integers.
>>>
>>> 2. A countably infinite, dense datatype. I.e. strings.
>>>
>>> 3. An uncountably infinite, dense, continuous datatype. I.e. the  
>>> reals.
>
> These are all on the agenda. The first two were in OWL1 and the  
> third is being worked on as part of the n-ary data predicate  
> proposal, but is separate from it (i.e., I believe it will be added  
> regardless of the fate of n-ary).
>
> (Note that this will likely be the algebraic reals and only rational  
> constants. So, no transcendentals. I'd be interested in your view on  
> that. I can imagine adding the trans. but would prefer to defer it  
> until a later iteration.)

This is getting ridiculous---so you're saying you think there is a  
substantial user base who need to be able to specify that a value is  
the solution to some algebraic equation? I have absolutely no idea  
what perspective the working group is taking here---what implementor  
or user has expressed interest in anything other than the real number  
line???

Can you guys please just come up with a version of the  
[`numeric`](http://www.w3.org/TR/xmlschema-2/#rf-numeric) notion? Pretty please?

>>> I don't particularly care what each of these three is called; as  
>>> long as OWL specifies the internal semantics of these three types  
>>> of spaces, then it's straightforward to "implement" the datatypes  
>>> users will actually want in terms of them. But, of course, the  
>>> ability to use XML Schema Datatypes to encode specific values  
>>> within each of these spaces would be quite convenient
>
> Do you mean the lexical spaces?

I mean the only time I explicitly want XML Schema is when my  
implementation is parsing specific values provided by the user. If you  
happen to re-use the XML Schema spec for other things that is for your  
own convenience, not mine.

>>> ---and would use the XML Schema specification for *exactly* what  
>>> it's good at.
>
> The additional question is whether to require additional types that  
> are not the above three. Among these are float and double. My belief  
> is that if we are going to add such datatypes as required, and we  
> are going to take them from xsd, then they should reflect the  
> semantics of those types and our advice to users is to only use them  
> if they specifically intend those semantics.

I'd guess that using xsd names for value spaces will just (continue  
to) confuse users.
More importantly, and yet again, I have never ever encountered a user  
who would prefer to use the `float` or `double` value spaces if a  
`real` value space were available. If there are users who feel the  
other way, then please produce them---merely hypothesizing their  
theoretical existence does not seem useful. (I grant that the class is  
satisfiable. I contend that its size is vanishingly small in practice.)

> The n-ary predicate definition system will, at most, be over the  
> core three types above (e.g., polynomial or linear inequations with  
> rational coefficients over the reals). However, one can pretty  
> easily imagine a predicate definition system that was focused on the  
> floats and was sensitive to the various semantics. It wouldn't have  
> to be direct floating point based equations, but an interval  
> arithmetic system which was designed to help reason about  
> measurements and computations (and their errors).

I care not a whit for n-ary datatypes. I might implement them if  
they're in the spec; I might not. But if the spec says you need to use  
n-ary datatypes to get real numbers, and leaves the issues raised with  
the `float` value space in place, I will ignore the spec and implement  
the real number line for unary datatypes. Just like I did for OWL 1.0.  
As a member of the public, that is my feedback to the working group.

> I grant entirely that that use case is quite speculative at the  
> moment. But given that 1) we have alternatives for the "any number"  
> type and 2) cardinality reasoning with the floats is not very much  
> more difficult than with user-defined finite ranges over the  
> integers (except for the fact that users have to do much more work  
> to get there), I don't think we should muck with the semantics of  
> floats.

I strongly disagree with 2. I don't want my implementation to care  
about the difference between `double` and `float`, and I consider any  
line of code I write involving the internals of float representation  
to be a wasted line of code, because my users really don't care.
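
For concreteness, here is a sketch of the kind of representation-level code that cardinality reasoning over the `float` value space implies. The helper names are hypothetical. The sketch assumes the bounds are finite values exactly representable in IEEE binary32, excludes NaN and the infinities, and counts positive and negative zero as one value, which is a simplifying assumption rather than a statement about the XML Schema spec.

```python
import struct

def float32_ordinal(x: float) -> int:
    """Map a finite binary32 value to an integer that preserves numeric order."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Sign-magnitude encoding: negate the magnitude when the sign bit is set.
    return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits

def count_float32(lo: float, hi: float) -> int:
    """Number of distinct binary32 values v with lo <= v <= hi."""
    return max(0, float32_ordinal(hi) - float32_ordinal(lo) + 1)

in_one_to_two = count_float32(1.0, 2.0)     # 2**23 + 1
in_unit_ball = count_float32(-1.0, 1.0)     # 2 * 0x3F800000 + 1
```

A `double` version needs the same logic again with a 64-bit layout, and over the reals none of it is needed, since any interval with `lo < hi` contains infinitely many values.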

Much more importantly, it's my job to turn your spec into user-facing  
documentation and support, and there is not a chance in hell I'm going  
to explain this issue to my users. They don't care, and they don't  
want the semantics you are describing. Experience with OWL 1.0 has  
demonstrated this.

> Your feedback and insight are, as always, appreciated. I hope you  
> see that my position doesn't *quite* fall into the error you are  
> rightly concerned with. There's still the problem of educating  
> people about float and double, but that is a problem of long  
> standing :)
>
> I'll also admit up front that I *like* float and double as they are.  
> I think that IEEE binary floating point is an amazingly clever thing.  
> But then, I've always worked in programming languages that had  
> bigints and fractions available, so I've been spoiled for choice :)

I'm a big fan of balanced ternary. But I don't intend to implement  
that, either.

-rob

Received on Saturday, 5 July 2008 09:45:55 UTC