- From: <zongaro@ca.ibm.com>
- Date: Thu, 24 Jan 2002 10:15:01 -0500
- To: Dave Peterson <davep@acm.org>
- Cc: www-xml-schema-comments@w3.org, cmsmcq@acm.org, ashokma@microsoft.com, Paul.V.Biron@kp.org
- Message-ID: <OF7FAB40E4.FE7B8604-ON85256B4B.004E9C73@torolab.ibm.com>
Hi Dave,

A small quibble: the greatest, finite float value really is 2**128 - 2**104. Although IEEE 754 allows only 23 bits for the significand, one bit for the sign, and eight for the exponent, all normalized values have an implied MSB with a value of one. That means the significand (m) runs from 2**23 to 2**24-1. Hence, the greatest, finite value really is (2**24-1)*2**104 or 2**128-2**104.

You also wrote that:

[[
> I just noticed another problem with the specification of how
>lexical values map to float values in the value space. The text
>cited above assumes that, of two consecutive normalized values in
>the value space, one will be even and the other will be odd.

Yet another example of not-quite-precise writing, I'm afraid. We copied from Clinger who copied from (I assume) 754. Something got lost in translation. (Ever play the gossip game?)

Note how each representable number has precisely one representation. (I'll leave it to you to check the details about subnormalized numbers-- what I'm about to say is for positive normalized.) One representable number and the next each have a particular m value; usually with the same e, but at the boundary one m is all 1 bits; the next up (larger e) is all 0 bits except the top-most bit. In the first case, the higher m is 1 more than the lower, so one number's m is odd and the other's is even; in the second case, the lower is necessarily odd and the next is even. It's the even-ness of the integer m (the "radicand", incorrectly called the "mantissa") that decides, not that of the number represented.
]]

I agree that IEEE 754 allows for only one representation of each finite value - it's true of both normalized and subnormalized values. My concern was with 3.2.4 of Datatypes, which states that the value space consists of the values "m*2^e, where m is an integer whose absolute value is less than 2^24, and e is an integer between -149 and 104."
The existing description allows for more than one way of expressing most values, and hence it would be impossible to use the fact that m is odd or even to determine the direction for rounding with the existing description of the value space. A description that limited m to the range [2**23,2**24-1], as you've described below, is necessary.

As long as we're on the topic of subnormalized numbers, it's not clear to me whether they were intended to be part of the value space. The description of the value space I've quoted above admits the subnormalized values to the value space - they are those values for which e=-149 and 0 < m < 2**23. However, literals in the lexical space map to the closest *normalized* value. That would mean that there are values in the value space to which no value in the lexical space will map. Any revision of the description of float needs to answer this question.

Thanks,

Henry
------------------------------------------------------------------
Henry Zongaro
XML Parsers development
IBM SWS Toronto Lab
Tie Line 969-6044; Phone (905) 413-6044
mailto:zongaro@ca.ibm.com

To: Henry Zongaro/Toronto/IBM@IBMCA, www-xml-schema-comments@w3.org
cc: cmsmcq@acm.org, ashokma@microsoft.com, Paul.V.Biron@kp.org
Subject: Re: largest finite float

At 4:58 PM -0500 1/23/02, zongaro@ca.ibm.com wrote:
>C.M. Sperberg-McQueen wrote:
>[[
>The largest finite float, if I understand the notes correctly, is
>
>   m * 2**e
>
>where ** means exponentiation,
>   m is the largest number representable in the mantissa, and
>   e is the largest number representable as an exponent

[TERMINOLOGY NOTE: m in this representation is *not* properly called the mantissa. "Significand" is better.]
>Since we have
>
>   m = 2 ** 24 - 1
>   e = 127
>
>it follows that
>
>   m * 2**e = (2**24) * (2**127) - 2**127
>            = 2**151 - 2**127
>
>]]
>
>A small correction here - the maximum value of e in that formula is
>actually 104 (the minimum value is -149), so I believe the largest
>finite float value is 2^128 - 2^104, which is approximately
>3.4028x10^38. (I'll refer to it as M below.)

Almost. Depending on how old that paper of mine was--there was a time when I was confused about how 754 approached this matter.

  o  For float, one uses 8 bits for e and 24 for m.
  o  For "normalized" numbers, the top bit of m is on.
  o  For normalized *positive* numbers, (since one bit is m's sign bit, there are 23 left), this means
       o  -127 <= e <= 127 (All bits on, -128, is reserved for signalling NaNs and infinities.)
       o  2**22 <= m <= 2**23 - 1
  o  For e = 0, this would result in normalized positive numbers of the form m * 2**e running from 2**22 to 2**23 - 1 . Then varying e equally on either side of 0 would bias things heavily in favor of large numbers.
  o  What they want is, for exponent zero, to have numbers close to and just less than 1. This requires that you bias the exponent by -23.
  o  This means the number represented by m and e is (m * 2**e * 2**-23)

Therefore the largest number representable is

   (2**23 - 1) * 2**127 * 2**-23

which is (2**23 - 1) * 2**104, AKA (2**127 - 2**104)

Net result is that Michael (and undoubtedly me, back then) didn't bias the exponent--and Michael, me back then, and Henry all failed to account for the sign bit.

I leave it to those with time and calculator packages to work out the decimal representation.
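
For readers who want that decimal representation, here is a minimal Python sketch (not part of the original thread). It assumes the largest finite IEEE 754 single-precision bit pattern is 0x7F7FFFFF, which the thread itself does not quote, decodes it with the standard `struct` module, and compares the exact integer value with the two candidate formulas discussed in these messages. The names `decode_float32`, `henry_value`, and `dave_value` are illustrative only.

```python
import struct

# Assumed bit pattern of the largest finite single-precision value:
# sign bit 0, exponent field 0xFE, fraction field all ones.
MAX_FLOAT_BITS = 0x7F7FFFFF


def decode_float32(bits):
    """Interpret a 32-bit integer as a big-endian IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


largest = decode_float32(MAX_FLOAT_BITS)

# The value is a whole number, so int() converts it exactly.
largest_as_int = int(largest)

# Candidate formulas from the thread.
henry_value = 2**128 - 2**104   # Henry's reply: (2**24 - 1) * 2**104
dave_value = 2**127 - 2**104    # Dave's message: (2**23 - 1) * 2**104

print("decimal value:      ", largest_as_int)
print("approx. scientific: ", repr(largest))
print("matches 2**128 - 2**104:", largest_as_int == henry_value)
print("matches 2**127 - 2**104:", largest_as_int == dave_value)
```

On a platform with IEEE 754 floats, the decoded value should agree with the 2**128 - 2**104 figure (roughly 3.4028x10^38) given in Henry's reply at the top of this message.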
Henry continued, again quoting Michael:

>[[
>Some things are, unfortunately, not so clear to me:
>
>   (a) what the next largest float would be if we had one
>   (b) where the watershed point is between infinity and 2.85...E45
>   (c) whether the negative numbers are exactly the same as these
>       plus a minus sign, or divergent in some way
>
>I don't believe there has been any confusion over the watershed
>between zero and the smallest representable float.
>
>Dave, if you can confirm that I have correctly interpreted your
>notes, I'd be grateful. Ditto if anyone can shed light on questions
>(a), (b), (c) above.
>]]
>
> Regarding question (a), I'm not sure if there is a sensible
>answer to that. To have a next larger finite float value, you'd
>have to have either more bits in the mantissa or more bits in the
>exponent - which you choose determines what would be the next larger
>float value.

Because of the bias, if you add more bits to the m (sorry, but it ain't a mantissa) you also increase the bias, no net gain. To get larger numbers you must increase the exponent's bits. Therefore, the next larger number would be

   (2**22) * 2**(127 + 1) * 2**-23, AKA (2**127)

>[[
>A literal in the
><http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#dt-lexical-space>·lexical
>space· representing a decimal number d maps to the normalized value
>in the
><http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#dt-value-space>·value
>space· of float that is closest to d in the sense defined by
><http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#clinger1990>[Clinger,
>WD (1990)]; if d is exactly halfway between two such values then the
>even value is chosen.
>]]
>
> A literal reading of that text would have any lexical value
>representing a decimal number greater than M map to M, because
>that's the closest normalized value in the value space. So, for
>instance, 1.0E+100000 would map to M. I believe that behaviour
>would be contrary to the expectations of most users.

Quite so.
If I understand 754 correctly, the round-off algorithm is first described in an "arbitrary integer exponent" model (i.e., as large an exponent as you need, for this case). So, for a given number of bits for the m, every scientific-decimal numeral maps to some number by the algorithm. If that number requires a larger integer exponent than is available, then it is forcibly mapped to infinity. So just as the cutoff between the last two representable numbers is half-way between the two, for rounding purposes, the cutoff above the last representable number is half way between that number and the "next" one that would be representable with a larger exponent. For float, that means half way between 2**127 - 2**104 and 2**127.

> I just noticed another problem with the specification of how
>lexical values map to float values in the value space. The text
>cited above assumes that, of two consecutive normalized values in
>the value space, one will be even and the other will be odd.

Yet another example of not-quite-precise writing, I'm afraid. We copied from Clinger who copied from (I assume) 754. Something got lost in translation. (Ever play the gossip game?)

Note how each representable number has precisely one representation. (I'll leave it to you to check the details about subnormalized numbers-- what I'm about to say is for positive normalized.) One representable number and the next each have a particular m value; usually with the same e, but at the boundary one m is all 1 bits; the next up (larger e) is all 0 bits except the top-most bit. In the first case, the higher m is 1 more than the lower, so one number's m is odd and the other's is even; in the second case, the lower is necessarily odd and the next is even. It's the even-ness of the integer m (the "radicand", incorrectly called the "mantissa") that decides, not that of the number represented.

Hope this all helps.

Michael, I'm too tired to check out negatives. I'm pretty sure they're symmetric.
I can't imagine any reason they wouldn't be.
--
Dave Peterson
SGMLWorks!

davep@acm.org
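
The two points argued in this thread, that restricting m to [2**23, 2**24-1] gives each normalized value exactly one (m, e) pair, and that the looser wording "|m| < 2^24, e between -149 and 104" admits several pairs for most values, can be checked with a short Python sketch. This is an illustration added for readers, not code from the original correspondence; the function names `decompose` and `loose_representations` are hypothetical, and the bit layout (8-bit exponent field biased by 127, 23-bit fraction) is assumed from IEEE 754 single precision.

```python
def decompose(bits):
    """Split a positive normalized float32 bit pattern into (m, e).

    The value equals m * 2**e with 2**23 <= m <= 2**24 - 1.
    """
    exponent_field = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if not 1 <= exponent_field <= 254:
        raise ValueError("not a normalized finite value")
    m = (1 << 23) | fraction          # restore the implied leading 1 bit
    e = exponent_field - 127 - 23     # remove the bias, scale m to an integer
    return m, e


def loose_representations(m, e):
    """List every (m', e') with 0 < m' < 2**24 and -149 <= e' <= 104
    that denotes the same value as m * 2**e."""
    # Move trailing zero bits of m into the exponent first.
    while m % 2 == 0 and e < 104:
        m //= 2
        e += 1
    pairs = []
    # Then walk back down, doubling m and decrementing e.
    while m < 2**24 and e >= -149:
        pairs.append((m, e))
        m *= 2
        e -= 1
    return pairs


# Consecutive values with the same exponent: m differs by one.
print(decompose(0x3F800000), decompose(0x3F800001))

# Consecutive values across an exponent boundary:
# m goes from all 1 bits (odd) to only the top bit set (even).
print(decompose(0x3FFFFFFF), decompose(0x40000000))

# The value 1.0 under the looser wording of the value space.
print(len(loose_representations(*decompose(0x3F800000))))
```

With m confined to the normalized range, the parity of m alternates between neighbouring values in both cases printed above, which is what the round-to-even rule relies on. Under the looser wording, the same value can be written with both odd and even m, so parity alone cannot decide the rounding direction.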
Received on Thursday, 24 January 2002 10:15:21 UTC