Re: Question about number types from Alan Ruttenberg on 2008-07-03 (www-xml-schema-comments@w3.org from July to September 2008)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Thu, 3 Jul 2008 01:27:23 -0400
To: Dave Peterson <davep@iit.edu>
Cc: www-xml-schema-comments@w3.org
Message-Id: <E854C893-6A19-4CCE-A7D0-079331A41ABC@gmail.com>

On Jul 2, 2008, at 5:25 PM, Dave Peterson wrote:

> At 2:18 PM -0400 2008-07-02, Alan Ruttenberg wrote:
>> Was there at any point explicit rejection of have a mathematical  
>> real number datatype (possibly augmented with some constants such  
>> as INF, -INF) from which the rest of the numeric types were  
>> defined by restriction?
>
> Yes.  There is a potential hierarchy of complex numbers, real numbers,
> and rational numbers that could exist above our decimal datatype.  Not
> many systems implement them, and conformance to our spec requires that
> all datatypes be implemented, at least enough to pass from one system
> to another.  So we didn't include them.  But I think what you probably
> want is to derive float and double from decimal.

Well, figuring out what is wanted is certainly a goal I have. Ideally
there is a coherent theory that makes such choices obvious, so that
"want" isn't part of the equation. Currently my concern is that the
theory is not coherent enough.

> The problem with that  is that such a derivation would violate a  
> fundamental property that
> we wanted derivation to have:  If a value is removed from the value
> space during a derivation, that automatically removes its lexical
> representations from the lexical space.  However, float and double
> require that strings that exactly represent a decimal value not in
> the float or double value space be mapped to the nearest value that
> is in the lexical space.
>
> Rather than remove that fundamental property of derivation, we decided
> to leave float and double as separate primitives.

Perhaps this is a stupid question, but why is this a fundamental
property of derivation? One generally thinks of types in terms of
subset relations. The primary reason to have any of the number types
is to represent numbers. Therefore, I would think that the
fundamental way to organize number types is by deciding which are
subsets of which. There *is* an interesting wrinkle to the floats,
namely that they have some non-numeric lexical values. The easiest
way to handle this would be to have the upper level number types also
have these values.

> Trying to define equality (for example) across, say, decimal and  
> float,
> leads to its own problems:  In float, 0.1 and 0.10000000009 are the
> same number (exactly 0.100000001490116119384765625, i believe).  In
> decimal, they are different.

Equality is defined in the value space. I don't see in the
specification (nor would I expect to see) a rule of the sort that
your argument above relies on:

if w and x are different literals in type a and
y and z are different literals in type b and
w is the same literal as y and x is the same literal as z
and w and x are different
then
either w and x have the same value and y and z have the same value or
w and x have different values and y and z have different values

>  (Both 0.1 and 0.10000000009 are in the
> value space of decimal, but neither is in the value space of float.

You are writing a bit loosely here, not clearly distinguishing
between when you are talking about a value and when about a literal.
However, let me make two observations:

1) By the definitions in the specification, *no* value in the value
space of decimal is in the value space of float. That's
because both are primitive and "the ·value space·s of all ·primitive·
datatypes are disjoint (they do not share any values)"

2) That you inadvertently make the comparison emphasizes the point  
that floats and decimals *are* comparable. When I said above that I  
worry that the theory is not coherent, it is the absence of any  
explanation within the specification of how such a comparison could  
be made that forms part of such a concern.

> Because we round exact values of literals whose exact value is not
> in the float value space, the float values 0.1 and 0.10000000009 are
> equal.

0.1 (float)  =  0.10000000009 (float)
This is true because equality is defined in the value space, because  
the mapping from lexical to value is well defined, and because when  
the mapping is applied, the values are found to be the same.
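
A minimal sketch of that mapping, assuming IEEE 754 single precision
(via Python's struct module) stands in for the float datatype. The
helper name to_float32 is made up for illustration, and the literal
passes through a Python double before being rounded to single
precision, which does not matter for these two literals:

```python
import struct

def to_float32(literal: str) -> float:
    """Map a decimal literal to the nearest single-precision value."""
    # Pack as a 4-byte IEEE 754 float, then unpack to read the rounded value.
    return struct.unpack("<f", struct.pack("<f", float(literal)))[0]

a = to_float32("0.1")
b = to_float32("0.10000000009")

# Two different literals, one value: equality holds in the value space.
print(a == b)  # True
```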

> We would expect the statement '0.1 = 0.10000000009' to be
> true.  On the other hand, '0.1 = 0.10000000009' is false in the
> decimal datatype.

0.1 (decimal) !=  0.10000000009 (decimal)

> If we allow comparison across the two datatypes,
>
>  o  '0.1(float) = 0.1(decimal)' presumably true,

'0.1(float) = 0.1(decimal)' should not be presumed true. Someone who
presumed this would have understood neither the specification nor
the way floating point numbers work.

According to the specification, one makes such a comparison in the
value space, not the lexical space. In a program, the compiler would
read the float and create a machine representation of it (as a float)
and a different representation for the decimal. To make the
comparison accurately it would need to convert both machine
representations to one that could exactly represent each, and then
compare them. The result of such an operation would show that the
two are not equal.
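
As a rough illustration of "convert both to a representation that is
exact for each", here is a sketch that uses Python's Fraction as the
common exact representation. The function names float32_exact and
decimal_exact are hypothetical, and single precision via struct is an
assumed stand-in for the float datatype:

```python
import struct
from fractions import Fraction

def float32_exact(literal: str) -> Fraction:
    """Exact rational value of the single-precision float for a literal."""
    f = struct.unpack("<f", struct.pack("<f", float(literal)))[0]
    return Fraction(f)  # conversion from a binary float is exact

def decimal_exact(literal: str) -> Fraction:
    """Exact rational value of a decimal literal."""
    return Fraction(literal)

# The same literal denotes different numbers in the two types.
print(float32_exact("0.1") == decimal_exact("0.1"))  # False

# The one decimal that does equal 0.1(float):
print(float32_exact("0.1") == decimal_exact("0.100000001490116119384765625"))  # True
```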

>  o  '0.1(decimal) = 0.10000000009(decimal)' false,

One is tempted to say, oh yeah, easy: They are the same type so we  
can just compare the lexical representations and if they differ they  
are not the same.

However, this logic is incorrect: concluding that '00.1(decimal) =
0.1(decimal)' is false would be an error. Comparisons for equality
need to happen, for a given datatype, either between the values or
between *canonical* lexical representations (which can be proven to
give the same answer by the nature of the 1:1 mapping between
canonical lexical representations and values).
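
A small sketch of the difference, using Python's decimal module. The
canonical() helper is a made-up normalization for illustration only
(strip redundant zeros); it is not claimed to match the canonical
representation defined in the specification:

```python
from decimal import Decimal

def canonical(literal: str) -> str:
    """Hypothetical normal form: no redundant leading or trailing zeros."""
    text = format(Decimal(literal), "f")  # plain notation, no exponent
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text

# Raw lexical comparison gives the wrong answer for equal values.
print("00.1" == "0.1")                        # False
# Comparing values, or normalized literals, gives the right one.
print(Decimal("00.1") == Decimal("0.1"))      # True
print(canonical("00.1") == canonical("0.1"))  # True
print(canonical("0.1") == canonical("0.10000000009"))  # False
```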

>  o  '0.10000000009(decimal) = 0.10000000009(float)' presumably true

Not presumably true. See above.

> But from that (a = b != c = d) we can conclude (a != d), i.e.
> '0.1(float) = 0.10000000009(float)' is false.

GIGO
A familiar experience for people who work with machine numerics.

> I don't believe there is any way to make a meaningful equality across
> float and double that retains the usual rules about equality (e.g.,
> reflexive, symmetric, transitive) and allows you to compare other than
> exact values.

Each value of a decimal and float (aside from NaN, +/-INF) can be
mapped unambiguously to a real number. Those real numbers can be
compared for equality. What's wrong with that?

> Or do you want '0.1(float) = 0.1(decimal)' to be false?

Yes, I want it to be false. Why? Because they do not, in fact,  
represent the same number!

> (So that the only decimal value equal to 0.1(float) is  
> 0.100000001490116119384765625)?

Yes.

>> We have a discussion going in the OWL working group, part of which  
>> is about the desirability of comparing a float to an integer. If  
>> they are disjoint, then that doesn't seem possible. However, it  
>> seems well defined to ask whether "2.1"^^xsd:float > "2"^^xsd:int
>
> Easy to pick carefully selected values that make sense.  But can you
> give me a way to filter out the ones where it doesn't make sense (as
> in the equality example above)?

It makes perfect sense in the example above when you consider that  
all the numbers are representations of real numbers.
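
For instance, the float-versus-int question could be sketched as
follows. This assumes single precision via struct for xsd:float and
Python's Fraction for the real numbers; the names float_to_real and
int_to_real are hypothetical, and returning None for the special
values is just one possible choice:

```python
import struct
from fractions import Fraction
from typing import Optional

SPECIAL = {"NaN", "INF", "-INF"}  # float values that denote no real number

def float_to_real(literal: str) -> Optional[Fraction]:
    """Exact real value of a float literal, or None for NaN/INF/-INF."""
    if literal in SPECIAL:
        return None
    f = struct.unpack("<f", struct.pack("<f", float(literal)))[0]
    return Fraction(f)

def int_to_real(literal: str) -> Fraction:
    """Exact real value of an integer literal."""
    return Fraction(int(literal))

print(float_to_real("2.1") > int_to_real("2"))   # True
print(float_to_real("2.0") == int_to_real("2"))  # True
print(float_to_real("NaN"))                      # None
```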

> If you go by exact values, then 2.1(float) > 2.10000000009 
> (decimal).  That doesn't seem very intuitive to me.

Your intuition is 1/2 right and 1/2  wrong. (A common experience when  
dealing with floating point numbers).

The 1/2 right: 2.1 float is, by my reckoning[1], exactly
2.099999904632568359375

so 2.1(float) < 2.10000000009(decimal) and your intuition is right,
since the inequality as you wrote it is false.
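
The same reckoning can be reproduced with a short sketch in Python
instead of the Lisp in [1], assuming single precision via struct. The
helper name float32_expansion is invented here; Decimal(float) gives
the exact decimal expansion of a binary float:

```python
import struct
from decimal import Decimal

def float32_expansion(literal: str) -> Decimal:
    """Exact decimal expansion of the single-precision float for a literal."""
    f = struct.unpack("<f", struct.pack("<f", float(literal)))[0]
    return Decimal(f)  # exact, no rounding

exact = float32_expansion("2.1")
print(exact)                             # 2.099999904632568359375
print(exact < Decimal("2.10000000009"))  # True
```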

The 1/2 wrong: The idea that intuition should be your guide.

Regards,
Alan

[1] http://svn.mumble.net:8080/svn/lsw/trunk/util/float-exact.lisp  
(pardon my syntax)

ps. Please consider this a formal comment on the specification. If  
desired I can submit it to the bug tracker.
Received on Thursday, 3 July 2008 05:28:17 UTC