Re: ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum from Rob Shearer on 2008-07-05 (public-webont-comments@w3.org from July 2008)

From: Rob Shearer <rob.shearer@comlab.ox.ac.uk>
Date: Sat, 5 Jul 2008 13:04:57 +0100
To: Bijan Parsia <bparsia@cs.man.ac.uk>
Cc: OWL Working Group WG <public-owl-wg@w3.org>, public-webont-comments@w3.org
Message-Id: <723EFC6A-9084-4D8A-B261-B385524CE033@comlab.ox.ac.uk>
>> I'm providing you with my experience: every user I've ever spoken  
>> to about this topic has wanted the real number line.
>> They are used to using the xsd datatypes `float` and `double` to  
>> represent number values, so they use these without values in OWL to  
>> mean "some number".
>
> Do they mean bounded numbers? (i.e. with min and max sizes?) Do they  
> distinguish between double and float? Do they care about NaNs?  
> (Alan's users care about the latter.)

Whether it's "forall R > 1.0^^xsd:float" or "forall R `xsd:float`"  
they seem to intend a dense number line. In the first case `float` is  
just the easiest way to specify the value; in the second you can  
certainly argue that they should have used `decimal`...but that's a  
pointless argument because my reasoner didn't really support decimal.

>> My experience is that the use of xsd datatypes as value spaces in  
>> OWL 1.0 causes users to write what they don't mean.
>
> For me, this would suggest removing them or enforcing them more  
> clearly.

I'd suggest removing them.

>> My experience is that *every* ontology using `xsd:float` and  
>> `xsd:double` without values would be better off using  
>> `xsd:decimal`, but that the user intent was "some real number" (and  
>> I should note that I'm against requiring support for `xsd:decimal`  
>> values).
>
> Values? Or the datatype? In OWL 1, all these types were optional and  
> poorly speced and had no documentation whatsoever. Part of the goal  
> here is to spec well and document clearly any types we require.

I would like to use doubles internally to represent points on the real  
number line. Some heterogeneous mix of internal representations is a  
pain. And I seriously doubt that many users really care about the  
extra representation power of `decimal`. It makes sense as an optional  
feature reasoners can support, but it seems completely unnecessary to  
require it in the spec---it's exactly the sort of thing I'd put off  
implementing indefinitely unless users asked for it.

The reason `decimal` keeps coming up is just that it's dense. So are  
we using the xsd spec as an excuse to conflate density with complex  
internal representations?

>> And my expectation is that users would be much less confused if  
>> this distinction between the types used for specific values and the  
>> types used for value spaces were clear.
>
> I don't understand this distinction. Every datatype is used for a  
> value space. Some value spaces are finite and some are infinite.  
> Some are dense and some are not. etc. So, please clarify what you  
> mean by this distinction.

A particular value is a point on the number line. XSD offers plenty of  
different lexical representations for such points. I support using XSD  
types to specify such points.

A "value space" (for numerics, at least) is a (possibly infinite) set  
of points on the number line. Although each xsd type is associated  
with some value space, I think xsd is a really really crap spec for  
value spaces. The entire spec is oriented in terms of lexical  
representation, not value spaces: the type hierarchy, for example, is  
a hierarchy of lexical representations, not a hierarchy of value  
spaces. Referring any user over to that spec to understand value  
spaces is obnoxious and counter-productive: even WG members seem to be  
having trouble grokking it. (And bravo to anyone making the pedantic  
point that a particular value is a degenerate value space.)

I contend that OWL users only want a tiny tiny number of different  
value spaces to play with: integers, strings, and reals.
It is possible, however, that they will want a larger number of ways  
to lexically represent particular values within these three spaces.
Most importantly, I do not think there is necessarily a direct  
correlation between the lexical representations used to represent  
particular values and the value spaces in which those particular  
values live. I.e. users want to be able to specify particular values  
within the `real` value space using `xsd:float`, but they do *not*  
have any interest in use of the `xsd:float` value space.

Thus we've got two orthogonal concepts which happen to coincide for  
strings and integers but not for real numbers.

My proposed solution would be to use brand-new OWL names for all value  
spaces, but use xsd syntax to specify particular values.
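
A minimal sketch of that split, in Python, under loudly stated assumptions: the function name `parse_numeric_literal` and the prefixed datatype strings are hypothetical and come from no spec. The xsd datatype only decides how the lexical form is read; every result lands on one exact number line, modelled here with `fractions.Fraction`. NaN and the infinities are deliberately left out.

```python
import struct
from fractions import Fraction

INTEGER_TYPES = {"xsd:integer", "xsd:long", "xsd:int"}


def parse_numeric_literal(lexical: str, datatype: str) -> Fraction:
    """Read a literal as an exact point on a single shared number line.

    Hypothetical helper: the datatype picks the lexical rules only and
    says nothing about which value space the point belongs to.
    """
    if datatype in INTEGER_TYPES:
        return Fraction(int(lexical))
    if datatype == "xsd:decimal":
        # Fraction parses decimal strings exactly ("0.1" becomes 1/10).
        return Fraction(lexical)
    if datatype == "xsd:double":
        # Exact value of the nearest IEEE 754 double.
        return Fraction(float(lexical))
    if datatype == "xsd:float":
        # Round to single precision first, then take the exact value.
        (single,) = struct.unpack("<f", struct.pack("<f", float(lexical)))
        return Fraction(single)
    raise ValueError(f"unsupported datatype: {datatype}")


print(parse_numeric_literal("1", "xsd:integer") == parse_numeric_literal("1.0", "xsd:float"))
print(parse_numeric_literal("0.1", "xsd:decimal") == parse_numeric_literal("0.1", "xsd:float"))
```

In this sketch an integer literal and a float literal for the same number denote the same point, while `"0.1"` read as `xsd:float` denotes a nearby but different point than `"0.1"` read as `xsd:decimal`.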

>> To repeat: as an implementor, I did willfully implement semantics  
>> contradictory to the spec,
>
> The spec made all the types except string and integer optional, and  
> didn't say much about them.
>
> It's true that they did defer to the XSD spec, so perhaps you meant  
> you violated that? Could you be specific in how you violated it? For  
> example:
>
> If an integer typed value appeared as the object of a float ranged  
> property, was the KB inconsistent?

No---I made the integer number line a subset of the real number line.

> If a float typed value was outside the range of xsd:float and used  
> in an assertion, was the KB inconsistent?

For the restriction "forall R `xsd:float`" I simply bounded the real  
number line at the min and max values of floats. Still a dense,  
infinite number line, but with bounds. I hated this usage, however,  
and would prefer if it became illegal.
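
To illustrate the bounded-but-dense reading, here is a rough Python sketch. It is not the reasoner's actual code; the class `BoundedReals` and its method names are invented for this example, and single-precision bounds are an assumption about what "min and max values of floats" means.

```python
from dataclasses import dataclass
from fractions import Fraction

# Largest finite single-precision value: (2 - 2**-23) * 2**127.
FLOAT32_MAX = Fraction((2**24 - 1) * 2**104)


@dataclass(frozen=True)
class BoundedReals:
    """A closed interval of the real line (hypothetical model).

    Membership is tested on exact rationals, so points such as 1/3
    and all integers inside the bounds are included.
    """

    lo: Fraction
    hi: Fraction

    def __contains__(self, value) -> bool:
        return self.lo <= Fraction(value) <= self.hi

    def can_satisfy_min_cardinality(self, n: int) -> bool:
        # A non-degenerate interval is dense, hence has infinitely many
        # points; a single-point interval supports at most one value.
        return self.lo < self.hi or n <= 1


float_as_bounded_reals = BoundedReals(-FLOAT32_MAX, FLOAT32_MAX)

print(Fraction(1, 3) in float_as_bounded_reals)
print(7 in float_as_bounded_reals)
print(2 * FLOAT32_MAX in float_as_bounded_reals)
print(float_as_bounded_reals.can_satisfy_min_cardinality(10**100))
```

Under this reading no cardinality restriction on a float-ranged property can cause an inconsistency by itself, which is the behaviour described above.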

> If a float typed value was NaN (i'm not sure what the constant is  
> there), was the KB consistent?

I don't entirely recall; I think NaN became either "any real" or "no  
real". Despite asking around a bit, I couldn't find any users who had  
strong opinions on the topic, however.

> I presume from what you said having a min cardinality which was  
> larger than the size of the floats on a float ranged property was  
> consistent. But I'm skeptical that this ever occurred in the wild  
> (in this way :))

Other than a little internal test in our suite to remind me of the  
issue I also doubt that my use of the real number line instead of the  
float number line ever arose.

> 	Did you have a user defined type derived from float that was fairly  
> small in range (e.g., -1.0 to 1.0) that interacted with a  
> cardinality restriction? Did you support user defined types?

We supported ranges, but I doubt any users wrote a range so small (and  
a cardinality restriction so big) that the issue ever arose. The hard  
data is that I never got any bug reports on the topic; the soft data  
is that when I interrogated users about their intent they were  
surprised by the notion that the set of floats was finite. (They  
understood this in principle but had no intention of deriving  
inconsistency as a result.)
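
For concreteness, the finiteness in question can be counted. The Python sketch below assumes single-precision floats, treats +0 and -0 as one value, ignores NaN, and uses made-up function names. It relies on the fact that IEEE 754 bit patterns of the same sign are ordered like integers.

```python
import struct


def float32_ordinal(x: float) -> int:
    """Map a single-precision value to an integer with the same ordering.

    Non-negative floats keep their bit pattern; negative floats are
    sign-magnitude, so they map to the negated magnitude bits.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if bits & 0x80000000:
        return -(bits & 0x7FFFFFFF)
    return bits


def count_float32_between(lo: float, hi: float) -> int:
    """Number of distinct single-precision values in [lo, hi]."""
    return float32_ordinal(hi) - float32_ordinal(lo) + 1


def min_cardinality_is_satisfiable(lo: float, hi: float, n: int) -> bool:
    """Strict xsd:float reading: at most this many distinct fillers exist."""
    return n <= count_float32_between(lo, hi)


print(count_float32_between(-1.0, 1.0))  # 2130706433
print(min_cardinality_is_satisfiable(-1.0, 1.0, 3_000_000_000))  # False
```

So a range from -1.0 to 1.0 only produces an inconsistency once the cardinality restriction exceeds roughly two billion, which fits the observation that nobody hit the case in practice.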

>> and I will do so again for OWL 2.0 if the spec is "broken" in the  
>> same way.
>
> I think we can take this as read for the moment :)
>
>>> When I am working as a user, I generally, both in programming  
>>> languages and in kbs, am very careful about computational types  
>>> and numerical methods. It's easy to find extensive discussions in  
>>> programming language circles about the pitfalls of floats. All  
>>> things being equal, it doesn't seem to be that difficult to  
>>> recommend that they use a more suitable type such as decimal.  
>>> Indeed, that is what's been happening as more and more programming  
>>> languages bundled in decimal types.
>>
>> I am also a very careful programmer, and am familiar with the details of  
>> the IEEE spec.
>>
>> All the good programmers I've ever worked with are aware of the  
>> basic problems with floats but almost always use them when they  
>> mean "any real number" anyway.
>
> But the systems don't respect that.

The systems respect their own semantics, and as a programmer I  
determined that the needs I had for reals were served sufficiently  
well by the language semantics for floats. And occasionally a tiny bit  
of extra code (i.e. turning a few equality tests into distance tests).
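
The "distance test" idiom looks roughly like this in Python. The tolerance values are arbitrary choices for illustration, and `nearly_equal` is just a thin hypothetical wrapper around the standard `math.isclose`.

```python
import math


def nearly_equal(a: float, b: float, rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Treat two floats as the same real if they are close enough."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


total = 0.1 + 0.2
print(total == 0.3)              # False: exact equality on binary floats
print(nearly_equal(total, 0.3))  # True: distance test
```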

> Ye old Pascal used the name "real" for binary floats, but nobody  
> would go for that today. People certainly think there are bounds, at  
> least, and they are aware of the exactness problems.
>
>> The mental model I use, and that I encouraged among junior  
>> programmers, was that floats were "real numbers, but assume that  
>> they wiggle around a little all the time". Not technically correct,  
>> but a safe and useful mental model for programming.
>
> For some programming. For lots, not (since the error ranges can go  
> *really* wide unexpectedly). For lots of things this does matter  
> quite a bit. For other things it doesn't.
>
>> The point being that density of the number line is *not* an issue  
>> programmers encounter as a matter of course, and one for which  
>> their natural intuition might well be wrong.
>
> That's true. Bounds and inexactness are more obvious because of the  
> standard rounding action of float operations. But we aren't doing  
> rounding here, we are (potentially) doing counting. I agree that  
> people's intuitions are bad about counting, but I don't think it's  
> that hard to grasp that floats are more like an enumeration of  
> integers. It has to be explained of course.

To what end do we want to explain this? Help me finish this  
conversation:

user: Why is this KB inconsistent?
me: You've said that this needs to be a float, and there's this  
cardinality restriction, and there are only so many floats.
user: When I wrote float I just meant a number. Isn't that obvious?
me: Well, yes, it is obvious, but it's possible you meant something  
else, and the spec says...
user: But if it's completely obvious what I meant, then why didn't the  
system just do what I meant?
me: ...

In almost all cases we know that the user doesn't mean what she wrote.  
So why would we pass it through, produce a bug, and then try to teach  
the user the crazy semantics she never actually wanted to begin with?

>>>>> No user ever intended to restrict the semantic space to a  
>>>>> nowhere-dense number line. If the OWL spec presupposes that most  
>>>>> of our users would prefer a number line which does not include  
>>>>> 1/3, my choice as an implementor would be to once again ignore  
>>>>> the spec and be intentionally non-compliant.
>>>
>>> An alternative choice is to signal that a repair is required and  
>>> perhaps do it.
>>
>> I hereby signal that a repair to the OWL spec is required.
>
> I meant repair of the ontology :)

I didn't support `xsd:decimal`.

>> (Are we really pretending that everybody thought datatypes in OWL  
>> 1.0 were fine and dandy?)
>
> Not at all. We're trying to do a much better job here. The design  
> choice we're faced with is whether to include floats at all (as  
> required), and if so, exactly what to spec them as. We'll include an  
> owl:real type (I hope) which will *really* be the reals, and perhaps  
> require decimal as well. A lot of the time, programmers use floats  
> for reals because there is no other choice (or they think  
> computational performance is critical). We're in a somewhat different  
> situation.
>
> Clear education material is definitely needed.
>
>>>>> Doing what all my users want and expect in this case turns out  
>>>>> to be way way easier than doing what a broken spec would  
>>>>> require. Any working group who would produce such a spec would  
>>>>> clearly be putting their own interests (ease of spec authoring  
>>>>> and political considerations) above their duty to their intended  
>>>>> users.
>>>
>>> I think your rhetoric flew ahead of reality here. It's not  
>>> actually easier to spec this (as the ongoing battle has shown :)).  
>>> As you well know, it's much easier to give in to Boris than not  
>>> to :) I don't believe I'm particularly motivated by political  
>>> considerations per se. I do think that departing from existing  
>>> behavior (disjointness) and normal meaning (in computer science)  
>>> needs to be done carefully.
>>
>> Let me expand upon my rhetoric:
>>
>> 1. Users want a (dense) real number line.
>
> Agreed. (But we have decimal and are going to offer real.)
>
>> 2. Users expect a (dense) real number line when they write  
>> `xsd:float` in OWL 1.0 ontologies.
>
> Unclear to me. Further, it's unclear to me whether we should respect  
> that or work against it.

This is an easy point for the WG to establish. Grab a whole load of  
OWL 1.0 ontologies that use `xsd:float` without values, track down the  
authors, and ask them. Absent cajoling from the interrogator, I'm  
willing to bet big money on the results.

>> 3. OWL 1.0 implementations reason as though the `xsd:float` value  
>> space is dense.
>> 4. The OWL 1.0 specifications state that the `xsd:float` value  
>> space is nowhere-dense.
>
> By reference to xsd, yes.
>
>> If you disagree about the first two points then it's certainly  
>> worth discussion: Alan's [investigation](http://lists.w3.org/Archives/Public/public-owl-wg/2008Jul/0103.html)  
>> seems to support my experience on point 1.
>
> See above.
>
>> I have yet to see a single counter-example to point 2---and I've  
>> asked many users what they meant when they wrote their datatype  
>> restrictions.
>
> Let me stipulate that for a minute. I trust you would contend this  
> for programming languages too.

I don't contend that for programming languages at all. Programmers  
understand that their data types require concrete representation, and  
there is no reasonable concrete representation for all reals.

OWL does *not* need concrete representations to reason about data types.

> But programming languages don't silently substitute a dense value  
> space for floats. I would contend that most people expect exact  
> computations from their reals too, but again, programming languages  
> don't do that (though, amazingly to me, calculators do! I did some  
> testing last year and many even simple online calculators use reals  
> internally so you can go from 1/3 * 3 and get 1 again.)

Double or nothing on the last bet that they don't use reals  
internally. Rationals, maybe. Be clear: OWL can handle the real value  
space. Standard programming languages cannot manipulate arbitrary reals.
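
A small Python illustration of the rationals guess, using the standard `fractions` module. This says nothing about how any particular calculator is implemented; it only shows that exact rational arithmetic is enough to get back from 1/3 * 3 to 1, and that the nearest double to 1/3 is a different point on the line.

```python
from fractions import Fraction


def round_trips_exactly(numerator: int, denominator: int) -> bool:
    """Check that (n/d) * d == n in exact rational arithmetic."""
    return Fraction(numerator, denominator) * denominator == numerator


def double_hits_rational(numerator: int, denominator: int) -> bool:
    """Check whether the nearest double to n/d is exactly n/d."""
    return Fraction(numerator / denominator) == Fraction(numerator, denominator)


print(round_trips_exactly(1, 3))   # True: rationals represent 1/3 exactly
print(double_hits_rational(1, 3))  # False: no binary float equals 1/3
print(double_hits_rational(1, 2))  # True: 1/2 is a dyadic rational
```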

> There are lots of features of OWL that aren't obvious to many users  
> (the open world assumption, the unique name assumption). This isn't  
> a reason to blithely ignore user instincts, of course. Far from it!  
> But it is not immediate.

OWL includes OWA and discards UNA because its authors assume that most  
of its users will benefit from these choices. In those cases user  
needs and user expectations seem to be at odds, so hard choices need  
to be made. If users both want and expect real numbers you'd be crazy  
to do anything else.

[snip]
>>> (Note that this will likely be the algebraic reals and only  
>>> rational constants. So, no transcendentals. I'd be interested in  
>>> your view on that. I can imagine adding the trans. but would  
>>> prefer to defer it until a later iteration.)
>>
>> This is getting ridiculous---so you're saying you think there is a  
>> substantial user base who need to be able to specify that a value  
>> is the solution to some algebraic equation?
>
> Yes. Remember that we are (considering) adding linear and perhaps  
> polynomial inequations. This has been requested by users (including  
> those working on commercial projects, see:
> 	http://www.w3.org/2007/OWL/wiki/N-ary_Data_predicate_use_case
> for some examples)

It would be crazy to make that stuff a core part of OWL.

> If we rule out the irrationals, then we have intuitively solvable  
> equations which are not solvable in the rationals.
>
> For writing down values, it seems that the rationals are enough for  
> a wide range of cases where you are looking for a dense line. Until  
> you expand the range of constants (usually with equations of some  
> form) it's hard to see the utility to users of additional reals  
> (e.g., the square root of two isn't really a constant, but a radical).
>
>> I have absolutely no idea what perspective the working group is  
>> taking here---what implementor
>
> RacerPro already has some form of linear inequations over the  
> algebraic reals (and, I believe, the algebraic complex numbers). FaCT++  
> and Pellet developers have indicated that they have interest in  
> implementation. We have several classes of users (some of which are  
> represented on the page above.)
>
> The working group does not yet have consensus on these features. It  
> also does not have a complete design.
>
>> or user has expressed interest in anything other than the real  
>> number line???
>
> The algebraic reals just are the above (i.e., the solutions to  
> polynomials with rational coefficients). The transcendental reals  
> are still more. Without transcendental coefficients or constants, you  
> can't "get" to them. So, practically speaking, it makes no  
> difference to the consistency of any knowledge base whether we spec  
> the type as being all reals or being algebraic reals.
>
> I personally would like to separate them to make augmenting the  
> lexical space a bit easier in the future. That is, if we're not  
> going to have transcendental constants, I think it makes sense to  
> *call* the type we're introducing something like "algebraic  
> reals" (and even restrict the value space). That leaves it open for  
> a future group to introduce the reals as a super type (both in the  
> lexical and in the value spaces).

But why on earth do we need special value spaces for *any* of these  
sets? Once again, what percentage of users is going to want to say  
"number that can be expressed as some rational"? Surely all these  
wacky numerics just provide more evidence that unadorned OWL  
ontologies should just be referencing the entire real number line  
instead of "accidentally" restricting it to some counter-intuitive  
subset!

>> Can you guys please just come up with a version of the [`numeric`](http://www.w3.org/TR/xmlschema-2/#rf-numeric)  
>> notion? Pretty please?
>
> That's interesting and potentially helpful for some things (like  
> splitting off the strings from the numbers in general), but I don't  
> see it helps with our current situation.
>
>>>>> I don't particularly care what each of these three is called; as  
>>>>> long as OWL specifies the internal semantics of these three  
>>>>> types of spaces, then it's straightforward to "implement" the  
>>>>> datatypes users will actually want in terms of them. But, of  
>>>>> course, the ability to use XML Schema Datatypes to encode  
>>>>> specific values within each of these spaces would be quite  
>>>>> convenient
>>>
>>> Do you mean the lexical spaces?
>>
>> I mean the only time I explicitly want XML Schema is when my  
>> implementation is parsing specific values provided by the user. If  
>> you happen to re-use the XML Schema spec for other things that is  
>> for your own convenience, not mine.
>
> So what is your position on user defined types? We reuse that from  
> XSD too.

No use of XSD to specify value spaces at all. Only particular values.

> The core types, and even things like float, seem coherent from an  
> OWL point of view (excepting certain facets and corner issues). That  
> is, you can coherently reason over them.

The fact that a system is internally coherent is a pretty low bar. The  
goal should be a semantic model which matches user needs and  
expectations. Not always possible, but `float` and `rational` value  
spaces seem to be headed in exactly the opposite direction.

>>>>> ---and would use the XML Schema specification for *exactly* what  
>>>>> it's good at.
>>>
>>> The additional question is whether to require additional types  
>>> that are not the above three. Among these are float and double. My  
>>> belief is that if we are going to add such datatypes as required,  
>>> and we are going to take them from xsd, then they should reflect  
>>> the semantics of those types and our advice to users is to only  
>>> use them if they specifically intend those semantics.
>>
>> I'd guess that using xsd names for value spaces will just (continue  
>> to) confuse users.
>
> Seems so.
>
>> More importantly, and yet again, I have never ever encountered a  
>> user who would prefer to use the `float` or `double` value spaces  
>> if a `real` value space were available.
>
> But that suggests (to me) we not provide the float or double types.

For value spaces, I agree.

>> If there are users who feel the other way, then please produce  
>> them---merely hypothesizing their theoretical existence does not  
>> seem useful. (I grant that the class is satisfiable. I contend that  
>> its size is vanishingly small in practice.)
>
> Again, there are many aspects of the types, e.g., disjointness,  
> size, NaN, lexical space, and discreteness. As far as I can tell,  
> users have picked on several of these (while champions have claimed  
> that some parts that other people have dismissed are critical).
>
> That all being said, my personal concern with "getting floats right"  
> involves future, hypothetical use. Which is a big fucking weakness of  
> my position. But I would prefer not to require them at all than to  
> require them with "wrong" semantics. I would prefer directing people  
> who want a real type to the real type. I think that is generally  
> better for a number of reasons, including education. It's rather odd  
> to introduce primitive types with different names with no different  
> semantics not even *intended* different semantics.
>
> Oh, of course, one reason is if the lexical spaces are different. I  
> don't have a problem giving the lexical space of our real a lot of  
> lexical freedom (in the initial proposal we suggested a fraction  
> like syntax, but we could add all sorts of variants; but you have to  
> be careful because other syntaxes sometimes require infinite  
> expansions).
>
>>> The n-ary predicate definition system will, at most, be over the  
>>> core three types above (e.g., polynomial or linear inequations  
>>> with rational coefficients over the reals ). However, one can  
>>> pretty easily imagine a predicate definition system that was  
>>> focused on the floats and was sensitive to the various semantics.  
>>> It wouldn't have to be direct floating point based equations, but  
>>> an interval arithmetic system which was designed to help reason  
>>> about measurements and computations (and their errors).
>>
>> I care not a whit for n-ary datatypes. I might implement them if  
>> they're in the spec; I might not. But if the spec says you need to  
>> use n-ary datatypes to get real numbers,
>
> No no no. The real datatype will be available, simpliciter. Its use  
> in n-ary is an *additional* motivation for it.
>
>> and leaves the issues raised with the `float` value space in place,
>
> ?
>
>> I will ignore the spec and implement the real number line for unary  
>> datatypes.
>
> With transcendentals? With what lexical space?

In theory, yes. My documentation will reference the whole real number  
line. But my parser will probably only handle `xsd:float` and  
`xsd:double` for values.

>> Just like I did for OWL 1.0. As a member of the public, that is my  
>> feedback to the working group.
>
> Thanks! Please note that all this is not settled yet.
>
>
>>> I grant entirely that that use case is quite speculative at the  
>>> moment. But given that 1) we have alternatives for the "any  
>>> number" type and 2) cardinality reasoning with the floats is not  
>>> very much more difficult than with user defined finite ranges over  
>>> the integers (except for the fact that users have to do much more  
>>> work to get there), I don't think we should muck with the  
>>> semantics of floats.
>>
>> I strongly disagree with 2.
>
> Really? <http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm>
>
> There's some special casing for various "odd" bits (NaN, etc.) but  
> this shows that sizing float ranges can be reduced to sizing integer  
> ranges. Thus, it's not fundamentally different.
>
>> I don't want my implementation to care about the difference between  
>> `double` and `float`,
>
> So you want them exactly identical?
>
>> and I consider any line of code I write involving the internals of  
>> float representation to be a wasted line of code, because my users  
>> really don't care.
>>
>> Much more importantly, it's my job to turn your spec into user- 
>> facing documentation
>
> This is my job too :) Both inside and outside the working group.
>
>> and support, and there is not a chance in hell I'm going to explain  
>> this issue to my users. They don't care, and they don't want the  
>> semantics you are describing. Experience with OWL 1.0 has  
>> demonstrated this.
> [snip]
>
> Can you say exactly what the semantics are you want? I get that you  
> want them dense (and think that I'm dense :)). But I'm unclear on:
> 	disjointness (from each other and from decimal and its subtypes  
> like integer)

You keep referring to the value spaces specified in xsd. I don't care  
about those value spaces. I don't think they are relevant.

But integers should probably live on the same number line as the rest  
of the reals.

> 	range

Infinite in both directions for the number line, but if particular  
values can only be specified with xsd datatypes then users will only  
be able to specify particular values within some range.
For integers, I'd support limiting particular values to `xsd:long`  
(and would consider a spec which only required `xsd:int` reasonable).  
If you required support for any `xsd:integer` I probably wouldn't  
implement it unless there was great user demand.

> 	NaN like constants

I don't have a strong opinion; no contact with users who have NaN needs.

> Thanks for the feedback.
>
> Cheers,
> Bijan.

And if you're going to request further comment from a member of the  
public, could you please do it on a list to which the public can post?  
Shifting back to the WG list excludes me from comment. (Which is fine  
if you don't address questions directly to me.)

-rob
Received on Saturday, 5 July 2008 12:05:35 UTC