Re: Floating point proposal from left field... [long] from Edward Jason Riedy on 2000-01-17 (www-xml-schema-comments@w3.org from January to March 2000)

From: Edward Jason Riedy <ejr@CS.Berkeley.EDU>
Date: Sun, 16 Jan 2000 22:58:07 -0800
To: "Arnold, Curt" <Curt.Arnold@hyprotech.com>
cc: "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
Message-Id: <200001170658.WAA15089@lotus.CS.Berkeley.EDU>
And "Arnold, Curt" writes:
 - Please see my comments in the XML Schema help file available at
 - http://www.software.aeat.com/xml/resources.htm

I'm sorry, but I have no Microsoft software available.  Could you
make this available in a portable format?

 - and previous notes on NaN
 - http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000JanMar/0026.html

Change the test (0 <= x && x <= Inf) into !(x < 0) && !(x > Inf), and
the range will include NaN unless your compiler mis-optimizes the code.
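
A minimal sketch of the two tests, written in Python rather than C purely
for illustration (the function names are made up for this example).  Every
ordered comparison involving NaN is false, so the first form rejects NaN
and the negated form accepts it.

```python
import math

INF = math.inf

def in_range_direct(x):
    # Original form: both comparisons are false for NaN, so NaN is rejected.
    return 0 <= x and x <= INF

def in_range_negated(x):
    # Rewritten form: NaN is neither < 0 nor > Inf, so NaN is accepted.
    return not (x < 0) and not (x > INF)

nan = float("nan")
print(in_range_direct(nan), in_range_negated(nan))
```

For ordinary numbers the two forms agree; they differ only on NaN.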

 - and minAbsoluteValue in
 - http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999OctDec/0024.html.

Which is why I proposed having a root floating-point datatype which
could be used by the 99% of people who just don't care.  They could even
further specify the number of significant digits to describe the stored
data as fully as printf/scanf allows.  I hadn't thought of purely lexical
comparisons on the root type; that's a good idea if comparisons are used
at all.  I'm now of the belief that they shouldn't be.  See the part
below on abstract types v. physical types.
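
To illustrate what the number of significant digits buys you, here is a
small sketch (Python, with a hypothetical helper name) that prints a value
with printf-style %g formatting and reads it back.  It assumes the stored
data are IEEE doubles, for which 17 significant digits are enough to
recover the value exactly; fewer digits may not be.

```python
def roundtrips(x, digits):
    # Format with the given number of significant digits (like printf "%.Ng"),
    # then parse the text back (like scanf) and compare with the original.
    text = format(x, ".%dg" % digits)
    return float(text) == x

third = 1.0 / 3.0
print(roundtrips(third, 6), roundtrips(third, 17))
```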

 - All the negative scenarios that I outline in the minAbsoluteValue and in the
 - help file would also occur with the bitsExponent and bitsMantissa facets.

Snipped from that message:
> The schema author naively writes a minAbsoluteValue facet of 1.40239846e-45
> for a datatype since he wants to support applications that use IEEE single
> precision.

Then the naive schema author does not understand the difference between
normalized and denormalized numbers.  Simply supplying a range is not
sufficient to specify any IEEE precision.  The sample points of the
underlying actual space (the real numbers) are not spaced uniformly.  
Proper IEEE arithmetic supports gradual underflow and denormalized 
numbers, which adds a little cluster around zero.  See Dr. Kahan's 
explanation in the reference I cited.
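
A rough sketch of the point, in Python and assuming the usual IEEE 754
single-precision bit layout (the helper names are invented for this
example).  It builds the smallest denormalized and smallest normalized
singles from their bit patterns and measures the gap to the next
representable value at a few points, showing that the spacing is not
uniform and that the quoted facet value falls in the denormalized range.

```python
import struct

def f32_from_bits(bits):
    # Reinterpret a 32-bit pattern as an IEEE single, widened to a Python float.
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def f32_round(x):
    # Round a Python float (a double) to the nearest single.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_gap_above(x):
    # Distance from a non-negative single x to the next larger single.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return f32_from_bits(bits + 1) - x

SMALLEST_DENORMAL = f32_from_bits(0x00000001)  # 2**-149
SMALLEST_NORMAL = f32_from_bits(0x00800000)    # 2**-126

print(SMALLEST_DENORMAL, SMALLEST_NORMAL)
print(f32_gap_above(0.0), f32_gap_above(1.0), f32_gap_above(1024.0))
print(f32_round(0.1) == 0.1)  # in range, yet not itself a single
```

A value such as 0.1 lies well inside the single-precision range but is not
a single-precision number, which is why a range alone does not pin down
the type.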

 - If the bitsMantissa type facets were supported, then they should be advisory
 - but not used to try to mimic the floating point system on the sender.  

Please look at the use proposed in SOAP.  They are trying to use XML 
Schemas to describe object method signatures.  My view of XML schemas
was perhaps tainted by that use.  The floating-point support in the
current draft is insufficient for me to use SOAP in my application
domain in a direct, concise manner.  If the SOAP usage doesn't jibe 
with the intent behind XML schemas, then...

 - In my perfect world "real" would be primitive, "decimal" would be derived
 - from real (simply excludes the E+nnn fragment) and "integer" would derive
 - from "decimal".

If XML schemas are to be purely lexical, then this sounds great.  The 
current draft leads me to think that the intent is really to specify 
universal types at a very low level, however.  If that is the true 
intent, I must still bug people about my proposal.  I'll still be able 
to use XML schemas and systems based on them (like SOAP), but I'll have 
to revert to the same old tricks.  The probability of convincing other 
people to use a more verbose format with absolutely no clarity 
advantages over older ways is about zero.

Personally, now that you've pointed out the option, I agree that XML
schemas should be lexical.  Leave the interpretation up to the higher
application levels.  Let application domains standardize their 
interpretations.  Any attempt to completely standardize all the types'
interpretations will lead to the same problems being experienced in
Java (http://www.cs/~wkahan/JAVAhurt.pdf).  For some people, these
are critically serious problems.  I admit that my attempt will probably
lead to related problems, but mostly for other people.  ;)

In fact, taking this to the extreme and removing even intervals may
be more reasonable than it first appears.  Any attempt to specify a 
universal `type' system will run into the difference between abstract 
types and physical types.  What if I wanted to specify an XML schema 
type that contained all even integers?  What's the difference between 
this and specifying an interval of integers?  Not really the amount of 
computation involved...  Is it used less?  Well, what if I wanted to
specify that an address is on one particular side of a street?  Ok, so
include a way to specify even or odd numbers.  Now what about primes?
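
A sketch of the split I have in mind, in Python.  The regular expression
is only an assumed lexical form for integers, and all of the names are
hypothetical.  The schema level would guarantee nothing beyond the lexical
check; intervals, evenness, and primality are all just predicates that an
application domain layers on top, and none is obviously more fundamental
than the others.

```python
import re

# Assumed lexical form: optional sign followed by decimal digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def is_lexically_integer(text):
    # The only guarantee a purely lexical schema would make.
    return INTEGER_LEXICAL.fullmatch(text) is not None

def in_interval(n, low, high):
    return low <= n <= high

def is_even(n):
    return n % 2 == 0

def is_prime(n):
    # Plain trial division; fine for a sketch.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def accept(text, predicates):
    # Lexical check first, then whatever the application domain requires.
    if not is_lexically_integer(text):
        return False
    n = int(text)
    return all(p(n) for p in predicates)

print(accept("42", [is_even, lambda n: in_interval(n, 1, 100)]))
```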

The extreme path of providing nothing beyond lexical guarantees is
fairly reasonable for a standard if that standard intends to be 
applicable to all application areas.  Are intervals and constraining
facets really appropriate in the base schema DTD, or should they be
put into a recommended DTD?  (There is a way to compose DTDs, right?
It's been a few years since I've worked with SGML.)

 - If you really wanted to conform to the behavior of a specific floating point
 - system (which I would discourage), [...]

As a side note, which is only tangentially related to XML schemas:

In my area, namely scientific and numerical computing, you occasionally
must rely on specific floating point systems.  Not doing so leads to the 
hell of supporting Cray arithmetic.  It really does make a difference.  
You can see the difference in the eigenvalue routines supported in the 
latest version of LAPACK.  A new, optimal routine requires the IEEE logic
behind infinities (iirc, maybe NaNs as well).  Supporting Cray arithmetic
(or others that differ significantly from IEEE) will render the algorithm
highly non-optimal.  I don't know if an optimal algorithm is even possible
with Cray arithmetic; I strongly doubt it.

Some people _need_ to conform to a specific system (or family of systems).

 - 3) If you really feel compelled to enable constraining reals to the ranges
 - of IEEE double and float, then add the following lexicals to the lexical
 - space for real
 - 
 - +Double.MAX_VALUE
[...]

To reiterate:

Constraining reals to the IEEE fp _ranges_ is not sufficient to 
constrain them to the IEEE fp _types_.  Also, you seem to ignore
the difference between +0 and -0 (which is correctly addressed in
the current draft).
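
A short Python sketch of why the zeros matter (variable names are
illustrative only).  The two zeros compare equal, so no range test can
tell them apart, yet they carry different signs and give different
results in functions with a branch cut.

```python
import math

pos_zero = 0.0
neg_zero = -0.0

# Equal under ==, so both pass or fail any range check together.
zeros_compare_equal = pos_zero == neg_zero

# The sign bit is still there.
sign_of_pos = math.copysign(1.0, pos_zero)
sign_of_neg = math.copysign(1.0, neg_zero)

# And it changes results: atan2 lands on opposite ends of its range.
angle_pos = math.atan2(0.0, pos_zero)
angle_neg = math.atan2(0.0, neg_zero)

print(zeros_compare_equal, sign_of_pos, sign_of_neg, angle_pos, angle_neg)
```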

 - <!--  finally we restrict this to a specific implementation (which is
 - totally silly)  but if you feel you must -->

It's totally silly until your Ariane 5 explodes...  (Well, ok, that 
was overflow, and I'm primarily pointing at the interesting aspects 
of gradual underflow, but the altitude example leads immediately to 
the Ariane 5 example.)

Jason
Received on Monday, 17 January 2000 01:58:12 UTC