Re: Question about number types from Dave Peterson on 2008-07-04 (www-xml-schema-comments@w3.org from July to September 2008)

From: Dave Peterson <davep@iit.edu>
Date: Thu, 3 Jul 2008 21:39:26 -0400
To: Alan Ruttenberg <alanruttenberg@gmail.com>
Cc: www-xml-schema-comments@w3.org
Message-Id: <a0624080ec49305efcabe@[192.168.1.100]>

At 1:27 AM -0400 2008-07-03, Alan Ruttenberg wrote:
>On Jul 2, 2008, at 5:25 PM, Dave Peterson wrote:
>
>>At 2:18 PM -0400 2008-07-02, Alan Ruttenberg wrote:
>>>Was there at any point an explicit rejection of 
>>>having a mathematical real number datatype 
>>>(possibly augmented with some constants such 
>>>as INF, -INF) from which the rest of the 
>>>numeric types were defined by restriction?
>>
>>Yes.  There is a potential hierarchy of complex numbers, real numbers,
>>and rational numbers that could exist above our decimal datatype.  Not
>>many systems implement them, and conformance to our spec requires that
>>all datatypes be implemented, at least enough to pass from one system
>>to another.  So we didn't include them.  But I think what you probably
>>want is to derive float and double from decimal.
>
>Well, figuring out what is wanted is certainly a 
>goal I have. Ideally there is a coherent theory 
>that makes such choices obvious so that want 
>isn't part of the equation. Currently my concern 
>is that the theory is not coherent enough.

Would that there were a coherent theory.  There are choices to be made,
because not all the desirable properties of equality and order can be
retained.  Some folk consider some properties most important; other
folk want to retain a different set.

>>                                                   The problem with that
>>is that such a derivation would violate a fundamental property that
>>we wanted derivation to have:  If a value is removed from the value
>>space during a derivation, that automatically removes its lexical
>>representations from the lexical space.  However, float and double
>>require that strings that exactly represent a decimal value not in
>>the float or double value space be mapped to the nearest value that
>>is in the value space.
>>
>>Rather than remove that fundamental property of derivation, we decided
>>to leave float and double as separate primitives.
>
>Perhaps this is a stupid question, but why is 
>this a fundamental property of derivation? One 
>generally thinks of types in terms of subset 
>relations. The primary reason to have any of the 
>number types is to represent numbers. Therefore, 
>I would think that the fundamental way to 
>organize number types is by way of deciding which 
>are subsets of another. There *is* an 
>interesting wrinkle to the floats, namely that 
>they have some non-numeric lexical values. The 
>easiest way to handle this would be to have the 
>upper level number types also have these values.

Actually, that "interesting wrinkle" would be trivial to add--in fact it
was considered; the problems are elsewhere.

You say you think in terms of subset relations.  Each datatype consists of
a value space, a lexical space, a lexical mapping between them, and some
operations and/or relations defined on the value space.

You can't look *just* at the value space.  If all you see is a set, no
operations or relations, etc., the only thing you can do with it is
to determine its cardinality.  This is not really a useful view of
datatypes.

So you have to include the appropriate operations and/or relations,
and the lexical mapping.  (Normally the value space is exactly the
domain of that mapping, and the lexical space is the range.)

Whether it is important that the value of a literal under the lexical
mapping remain unchanged when a new datatype is derived from an existing
one is a choice one must make.  The WG considered it and decided they
wanted that to be fundamental.  At one point a special facet was proposed
that would cause otherwise-dropped lexical representations to be given
new values; it was considered and rejected.  I'm not privy to just what
went on in the minds of each WG member to cause that decision.
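
A minimal sketch of the property described above, not taken from the Spec:
a derived datatype keeps its base type's lexical mapping and only narrows
the value space, so a literal either keeps its value or drops out of the
lexical space.  The names Datatype, value_of and restrict are hypothetical,
and Python's Decimal stands in for the decimal value space.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Callable, Optional


@dataclass(frozen=True)
class Datatype:
    """Hypothetical model: a lexical mapping plus a value-space membership test."""
    lexical_map: Callable[[str], Decimal]
    in_value_space: Callable[[Decimal], bool]

    def value_of(self, literal: str) -> Optional[Decimal]:
        """Return the literal's value, or None if it is not in the lexical space."""
        try:
            value = self.lexical_map(literal)
        except InvalidOperation:
            return None
        return value if self.in_value_space(value) else None

    def restrict(self, keep: Callable[[Decimal], bool]) -> "Datatype":
        """Derive by restriction: same mapping, smaller value space."""
        return Datatype(self.lexical_map,
                        lambda v: self.in_value_space(v) and keep(v))


decimal_t = Datatype(Decimal, lambda v: v.is_finite())
small_t = decimal_t.restrict(lambda v: abs(v) <= 100)

print(small_t.value_of("42.5"))  # same value as under decimal_t
print(small_t.value_of("1000"))  # None: the value is gone, so the literal is too
```

Under this model float cannot be derived from decimal, since '0.1' would
have to change its value rather than disappear.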

>>Trying to define equality (for example) across, say, decimal and float,
>>leads to its own problems:  In float, 0.1 and 0.10000000009 are the
>>same number (exactly 0.100000001490116119384765625, I believe).  In
>>decimal, they are different.
>
>Equality is defined in the value space. I don't 
>see anything in the specification (nor would I 
>expect to see) a rule of the sort that says what 
>you have outlined above:
>
>if w and x are different literals in type a and
>y and z are different literals in type b and
>w is the same literal as y and x is the same literal as z
>and w and x are different
>then
>either w and x have the same value and y and z have the same value or
>w and x have different values and y and z have different values

Well, you won't see any such thing in the Spec since it doesn't permit such
comparisons.  I was trying to show a result that some people don't like,
if such comparisons are possible, and made the same way.

>>  (Both 0.1 and 0.10000000009 are in the
>>value space of decimal, but neither is in the value space of float.
>
>You are writing things a bit loosely here, not 
>clearly distinguishing between when you are 
>talking about a value and a literal.

Well, actually I was *using* the literals to talk about values.  In general,
we expect to use the same character strings in our metalanguage (the English
in which the Spec is written) that we use in the lexical spaces when we
want to talk about values in the value space.  So, I might *use* either
'0.1', '0.10000000009', or '0.100000001490116119384765625' in my English
discussion to describe one particular value in the float value space.
The spec, of course, generally avoids talking about values from different
primitive datatypes, so it has no short notation for doing so.  I created
the notation of a value name followed by a parenthesized datatype name
just for the purpose of this discussion.  So I can say equivalently that
0.1(float) and 0.10000000009(float) are equal or that '0.1(float) =
0.10000000009(float)' is true.
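
The rounding behind that claim can be checked with a short sketch.  It
assumes float means IEEE 754 single precision; to_float32 is a hypothetical
helper that parses the literal as a Python double and then narrows it,
which is adequate for these short literals.

```python
import struct
from decimal import Decimal


def to_float32(literal: str) -> float:
    """Round a decimal literal to the nearest single-precision value."""
    # Parsed as a double first, then narrowed to 32 bits.
    return struct.unpack("<f", struct.pack("<f", float(literal)))[0]


a = to_float32("0.1")
b = to_float32("0.10000000009")

print(a == b)      # True: both literals land on the same float value
print(Decimal(a))  # 0.100000001490116119384765625, the exact value
print(Decimal("0.1") == Decimal("0.10000000009"))  # False in decimal
```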

>However, let me make two observations
>
>1) By the definitions in the specification *no* 
>value in the value space of decimal 
>is in the value space of float. That's because 
>both are primitive and "the ·value space·s of 
>all ·primitive· datatypes are disjoint (they do 
>not share any values)"

True, in the Spec.  But this discussion is about the possibility that we
change that.  So I have to talk about that "what if"--assume they aren't
disjoint, and see where that leads us.

>2) That you inadvertently make the comparison 
>emphasizes the point that floats and decimals 
>*are* comparable.

There's no inherent "are"; only an inherent "can be".  Agreed, they can be
comparable--a definition of equality and order relations can be made.  So
our object in this discussion is to see what some of the consequences
would be.

>When I said above that I worry that the theory 
>is not coherent, it is the absence of any 
>explanation within the specification of how such 
>a comparison could be made that forms part of 
>such a concern.

I don't see that it's the job of the Spec to explain how all the alternate
choices could be made.  Enough that it explains the choices it does make
and perhaps points out some things that follow from those choices.

>>Because we round exact values of literals whose exact value is not
>>in the float value space, the float values 0.1 and 0.10000000009 are
>>equal.
>
>0.1 (float)  =  0.10000000009 (float)
>This is true because equality is defined in the 
>value space, because the mapping from lexical to 
>value is well defined, and because when the 
>mapping is applied, the values are found to be 
>the same.
>
>>We would expect the statement '0.1 = 0.10000000009' to be
>>true.  On the other hand, '0.1 = 0.10000000009' is false in the
>>decimal datatype.
>
>0.1 (decimal) !=  0.10000000009 (decimal)
>
>>If we allow comparison across the two datatypes,
>>
>>  o  '0.1(float) = 0.1(decimal)' presumably true,
>
>'0.1(float) = 0.1(decimal)' should not be 
>presumed true. Someone who presumed this would 
>have understood neither the specification nor 
>the way floating point numbers work.

Well the Spec says no cross-primitive comparisons will be true.  But we're
trying to discuss an alternative to the choice made in the spec.

Nonetheless, it's a choice to be made whether 0.1(float) = 0.1(decimal)
if you're going to make float and decimal values comparable.

>According to the specification, one makes such 
>comparison in the value space, not the lexical 
>space. In a program, the compiler would read the 
>float and create a machine representation of it 
>(as a float) and a different representation for 
>the decimal. To make the comparison accurately 
>it would need to convert both machine 
>representations to one that could exactly 
>represent each, and then compare them. The 
>result of such an operation would show that 
>there was no equality.

As I said, that's a choice to make.  I agree they had better not be
considered *identical*, but we do allow distinct values to be *equal*.
In fact, although I'm not an XSLT/XQuery expert like Mike Kay, I
believe that XSLT/XQuery's definition of cross-primitive comparison
*does* make them equal.  You choose one way, they chose another.

>>  o  '0.1(decimal) = 0.10000000009(decimal)' false,
>
>One is tempted to say, oh yeah, easy: They are 
>the same type so we can just compare the lexical 
>representations and if they differ they are not 
>the same.
>
>However this logic is incorrect. That's 
>because  '00.1(decimal) = 0.1(decimal) is false' 
>would be an error. Comparisons for equality need 
>to happen, for a given datatype, either between 
>the values, or between *canonical* lexical 
>representations (which can be proven to give the 
>same value by the nature of the 1:1 mapping 
>between canonical lexical representations and 
>values).

I think we're in violent agreement there.
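
A small sketch of that point, using Python's Decimal as a stand-in for the
decimal datatype.  The canonical function is hypothetical and is only one
possible canonical form chosen for illustration, not the one the Spec
defines.

```python
from decimal import Decimal


def canonical(literal: str) -> str:
    """Hypothetical canonical form: plain notation without redundant zeros."""
    return format(Decimal(literal).normalize(), "f")


print("00.1" == "0.1")                          # False: the literals differ
print(Decimal("00.1") == Decimal("0.1"))        # True: the values are equal
print(canonical("00.1") == canonical("0.10"))   # True: same canonical literal
```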

>>  o  '0.10000000009(decimal) = 0.10000000009(float)' presumably true
>
>Not presumably true. See above.
>
>>But from that (a = b != c = d) we can conclude (a != d), i.e.
>>'0.1(float) = 0.10000000009(float)' is false.
>
>GIGO
>A familiar experience for people who work with machine numerics.

One man's garbage....  But, as I said, I believe that's what you'll
get from XSLT/XQuery's version.  They've chosen to forgo the transitive
law.
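
The loss of transitivity can be sketched as follows.  This assumes a
promotion-style rule in which the decimal operand of a mixed comparison is
first rounded to float; promoted_eq and to_float32 are hypothetical names,
and no claim is made that this matches XSLT/XQuery in every detail.

```python
import struct
from decimal import Decimal


def to_float32(literal: str) -> float:
    """Round a decimal literal to the nearest single-precision value."""
    # Parsed as a double first, then narrowed; adequate for these literals.
    return struct.unpack("<f", struct.pack("<f", float(literal)))[0]


def promoted_eq(x, y) -> bool:
    """Hypothetical equality: decimal pairs compare exactly, mixed pairs as floats."""
    if isinstance(x, Decimal) and isinstance(y, Decimal):
        return x == y
    fx = to_float32(str(x)) if isinstance(x, Decimal) else x
    fy = to_float32(str(y)) if isinstance(y, Decimal) else y
    return fx == fy


a = Decimal("0.1")             # 0.1(decimal)
b = to_float32("0.1")          # 0.1(float)
c = Decimal("0.10000000009")   # 0.10000000009(decimal)

print(promoted_eq(a, b))  # True
print(promoted_eq(b, c))  # True
print(promoted_eq(a, c))  # False, so a = b and b = c without a = c
```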

>>I don't believe there is any way to make a meaningful equality across
>>float and double that retains the usual rules about equality (e.g.,
>>reflexive, symmetric, transitive) and allows you to compare other than
>>exact values.
>
>Each value of a decimal and float (aside from 
>NaN, +/-INF)  can be mapped unambiguously to a 
>real number. Those real numbers can be compared 
>for equality. What's wrong with that?

Again, violent agreement.  "That" *is* comparing only exact values as
being equal.
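
Exact-value comparison can be sketched by mapping both kinds of value to
rationals.  The helper names are hypothetical; Python's Fraction converts
both a binary float and a Decimal without rounding, so the resulting
equality is ordinary equality of rational numbers and stays transitive.

```python
import struct
from decimal import Decimal
from fractions import Fraction


def to_float32(literal: str) -> float:
    """Round a decimal literal to the nearest single-precision value."""
    # Parsed as a double first, then narrowed; adequate for these literals.
    return struct.unpack("<f", struct.pack("<f", float(literal)))[0]


def exact_eq(x, y) -> bool:
    """Compare a float or Decimal with another by exact rational value."""
    return Fraction(x) == Fraction(y)


f = to_float32("0.1")  # 0.1(float)

print(exact_eq(f, Decimal("0.1")))                            # False
print(exact_eq(f, Decimal("0.100000001490116119384765625")))  # True
```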

>>Or do you want '0.1(float) = 0.1(decimal)' to be false?
>
>Yes, I want it to be false. Why? Because they do 
>not, in fact, represent the same number!
>
>>(So that the only decimal value equal to 
>>0.1(float) is 0.100000001490116119384765625)?
>
>Yes.

No problem; you're consistent.  But, as I said, other folks want
other strokes.

>>>We have a discussion going in the OWL working 
>>>group, part of which is about the desirability 
>>>of comparing a float to an integer. If they 
>>>are disjoint, then that doesn't seem possible. 
>>>However, it seems well defined to ask whether 
>>>"2.1"^^xsd:float > "2"^^xsd:int
>>
>>Easy to pick carefully selected values that make sense.  But can you
>>give me a way to filter out the ones where it doesn't make sense (as
>>in the equality example above)?
>
>It makes perfect sense in the example above when 
>you consider that all the numbers are 
>representations of real numbers.
>
>>If you go by exact values, then 2.1(float) > 
>>2.10000000009(decimal).  That doesn't seem very 
>>intuitive to me.
>
>Your intuition is 1/2 right and 1/2  wrong. (A 
>common experience when dealing with floating 
>point numbers).
>
>The 1/2 right:  2.1 float is, by my reckoning[1] 
>exactly 2.099999904632568359375
>
>so 2.1(float) < 2.10000000009(decimal) and your 
>intuition is right, since the equation is wrong.

But of course, (since I misevaluated 2.1(float)), you still
get 2.1(float) < 2.09999991(decimal).  You don't care, but others balk.
That's one reason we say folks using these datatypes outside of XSD are
welcome to define their own operations and relations, and even redefine
the ones we require for Schema calculations.
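
The ordering can be checked with a short sketch, again assuming float is
IEEE 754 single precision and using the hypothetical helper to_float32.
Under exact-value ordering, 2.1(float) falls below both
2.10000000009(decimal) and 2.09999991(decimal).

```python
import struct
from decimal import Decimal
from fractions import Fraction


def to_float32(literal: str) -> float:
    """Round a decimal literal to the nearest single-precision value."""
    # Parsed as a double first, then narrowed; adequate for these literals.
    return struct.unpack("<f", struct.pack("<f", float(literal)))[0]


f = to_float32("2.1")

print(Decimal(f))  # 2.099999904632568359375, the exact value of 2.1(float)
print(Fraction(f) < Fraction(Decimal("2.10000000009")))  # True
print(Fraction(f) < Fraction(Decimal("2.09999991")))     # True
```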

>The 1/2 wrong: The idea that intuition should be your guide.

A choice one must make.

>ps. Please consider this a formal comment on the 
>specification. If desired I can submit it to the 
>bug tracker.

As Mike said, you do need to submit it to bugzilla, as a proposed
change.  Be sure to state your reasons, and preferably why you think,
e.g., that your version is better than others.
-- 
Dave Peterson
SGMLWorks!

davep@iit.edu
Received on Friday, 4 July 2008 01:40:25 UTC