- From: Dave Peterson <davep@acm.org>
- Date: Thu, 24 Jan 2002 10:46:55 -0500
- To: zongaro@ca.ibm.com, www-xml-schema-comments@w3.org
- Cc: cmsmcq@acm.org, ashokma@microsoft.com, Paul.V.Biron@kp.org

Well...just goes to show what happens when it's late and you don't go back to original sources. (Just think, if we'd done that originally, we might not have gotten into this mess.)

OK: I was tired. It was late. I don't have an electronic copy of 754 and couldn't find my paper copy. (BTW, Henry's msg calling me on this arrived while I was writing this one. So at least I'd caught it myself. For whatever that's worth.)

At 10:47 PM -0500 1/23/02, I wrote:
>At 4:58 PM -0500 1/23/02, zongaro@ca.ibm.com wrote:
>>A small correction here - the maximum value of e in that formula is
>>actually 104 (the minimum value is -149), so I believe the largest
>>finite float value is 2^128 - 2^104, which is approximately
>>3.4028x10^38. (I'll refer to it as M below.)
>
>Almost. Depending on how old that paper of mine was--there was a time when
>I was confused about how 754 approached this matter.
>
> o For normalized *positive* numbers, (since one bit is m's sign bit,
>   there are 23 left), this means
>
>       2**22 <= m <= 2**23 - 1
>
> o For e = 0, this would result in normalized positive numbers of the
>   form m * 2**e running from 2**22 to 2**23 - 1 . Then varying e
>   equally on either side of 0 would bias things heavily in favor of
>   large numbers.
>
> o What they want is, for exponent zero, to have numbers close to and
>   just less than 1. This requires that you bias the exponent by -23.
>
> o This means the number represented by m and e is (m * 2**e * 2**-23)
>
>Therefore the largest number representable is
>
>       (2**23 - 1) * 2**127 * 2**-23
>
>which is
>
>       (2**23 - 1) * 2**104, AKA (2**127 - 2**104)
>
>Net result is that Michael (and undoubtably me, back then) didn't bias the
>exponent--and Michael, me back then, and Henry all failed to account for
>the sign bit. I leave it to those with time and calculator packages to
>work out the decimal representation.

Well...I accounted for the sign bit. But (because I didn't reread the original and depended on stale memory) I didn't account for a trick of hiding an extra bit encoded into e:

Note that the top bit, except for very small ("unnormalized" or "subnormalized") numbers, is always 1. Why store it? Its only function is to differentiate between normalized and subnormalized numbers. Subnormalized numbers can be detected by the exponent value.

Therefore we really have 25 bits for m: One sign bit, one implicit bit, and 23 real bits, for a total of 24 non-sign bits. Hence

        2**23 <= m <= 2**24 - 1 .

Rerun the calculations, and see that Henry's value of 2**128 - 2**104 is correct.

>Because of the bias, if you add more bits to the m (sorry, but it ain't a
>mantissa) you also increase the bias, no net gain. To get larger numbers
>you must increase the exponent's bits. Therefore, the next larger number
>would be
>
>       (2**22) * 2**(127 + 1) * 2**-23, AKA (2**127)

And that becomes 2**128 .

I think I've got it right this time. Sigh....

-- 
Dave Peterson
SGMLWorks!
davep@acm.org
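
For those with a calculator package handy, the corrected arithmetic can be checked mechanically. What follows is only a rough sketch in Python, not anything from the 754 text itself: the names `M_BITS`, `E_MAX`, `E_MIN` and `float32_from_bits` are made up for illustration, and the bit patterns 0x7F7FFFFF and 0x00000001 are assumed to be the usual single-precision encodings of the largest finite value and the smallest positive subnormal.

```python
import struct

# Illustrative names, not from the original message or from IEEE 754 itself.
M_BITS = 24    # non-sign bits in m: one implicit bit plus 23 stored bits
E_MAX = 104    # maximum exponent in the m * 2**e formulation
E_MIN = -149   # minimum exponent in the m * 2**e formulation

# Largest m is 2**24 - 1, so the largest finite float is (2**24 - 1) * 2**104.
m_max = 2**M_BITS - 1
largest = m_max * 2**E_MAX


def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# Assumed encodings: largest finite value and smallest positive subnormal.
largest_from_bits = float32_from_bits(0x7F7FFFFF)
smallest_from_bits = float32_from_bits(0x00000001)

print(largest == 2**128 - 2**104)          # exact integer identity
print(float(largest) == largest_from_bits)  # agrees with the bit pattern
print(smallest_from_bits == 2.0**E_MIN)     # m = 1 at the minimum exponent
print(f"{largest_from_bits:.5e}")           # decimal approximation
```

The integer comparison is exact because Python integers are unbounded; the conversion to `float` is also exact here, since a 24-bit m fits comfortably in a double.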

Received on Thursday, 24 January 2002 10:45:52 UTC