Re: Floating point proposal from left field... [long] from petsa@us.ibm.com on 2000-01-13 (www-xml-schema-comments@w3.org from January to March 2000)

From: <petsa@us.ibm.com>
Date: Thu, 13 Jan 2000 16:18:21 -0500
To: "Arnold, Curt" <Curt.Arnold@hyprotech.com>
cc: "'ejr@CS.Berkeley.EDU'" <ejr@CS.Berkeley.EDU>, "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
Message-ID: <85256865.00750636.00@D51MTA03.pok.ibm.com>
Curt:
We have changed the spec substantially in this area.  The "real"
datatype is gone.  There are 2 new primitive datatypes corresponding
to IEEE float and double.  I think you will like this much better.
The 12/17 public draft includes these changes.

All the best, Ashok


"Arnold, Curt" <Curt.Arnold@hyprotech.com>@w3.org on 01/13/2000 02:55:53 PM

Sent by:  www-xml-schema-comments-request@w3.org


To:   "'ejr@CS.Berkeley.EDU'" <ejr@CS.Berkeley.EDU>
cc:   "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
Subject:  Re: Floating point proposal from left field... [long]



Please see my comments in the XML Schema help file available at
http://www.software.aeat.com/xml/resources.htm and previous notes on NaN
(http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000JanMar/0026.html)
and minAbsoluteValue
(http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999OctDec/0024.html).

I have been lobbying hard for the removal of any facet that would require
an interpreting system to try to mimic the floating point system of the
sender.  All the negative scenarios that I outline in the minAbsoluteValue
note and in the help file would also occur with the bitsExponent and
bitsMantissa facets.  If facets of the bitsMantissa type were supported,
they should be advisory and not used to try to mimic the floating point
system of the sender.  The Schema space should only be concerned with
lexical validation; details about specific implementation types and
behavior on overflow and underflow should only be introduced in a
type-aware DOM.

In my perfect world, "real" would be primitive, "decimal" would be derived
from "real" (it simply excludes the E+nnn fragment) and "integer" would
derive from "decimal".  Each of these would hint to a type-aware DOM that
it should be bound to a specific datatype; however, I suggested a provision
for an explicit "appType" hint attribute to suggest what data type a
type-aware DOM should use.

In the help file and on the Xerces-dev list, I've been lobbying that the
comparisons behind minInclusive, etc., should be done lexically.  This
should be faster than conversion to native floating point and should be
consistent on all platforms, whereas use of native floating point could
cause inconsistent validation results due to the rounding issues that you
were describing.
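
As a rough illustration of the rounding issue, the sketch below checks a
minExclusive bound two ways.  It assumes Python's decimal module as a
stand-in for a lexical comparison (it compares the decimal strings exactly);
the function names are hypothetical and not part of any schema processor.

```python
from decimal import Decimal

def exceeds_min_exclusive_exact(lexical: str, bound: str) -> bool:
    """Compare the two decimal strings exactly, with no binary rounding."""
    return Decimal(lexical) > Decimal(bound)

def exceeds_min_exclusive_native(lexical: str, bound: str) -> bool:
    """Compare after converting both strings to native IEEE doubles."""
    return float(lexical) > float(bound)

value, bound = "0.1000000000000000000001", "0.1"
print(exceeds_min_exclusive_exact(value, bound))   # decision on the exact values
print(exceeds_min_exclusive_native(value, bound))  # decision after rounding to double
```

The two checks disagree for this input because both strings round to the
same double, which is the kind of platform-dependent result a lexical
comparison avoids.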

If you really wanted to conform to the behavior of a specific floating
point system (which I would discourage), you can do that with existing
facets (okay, you have to add some of the logical facets that I suggested
in the help file).  There are three ways that mimicking a particular
floating point representation would modify constraints on a floating point
number:

1)  Imposition of a maximum magnitude.

If I say that a "real" must be able to convert to an IEEE single
precision number without overflow, that implies a min and max bound which
could be enforced using lexical comparison.

<datatype name="float" source="real">
     <minInclusive>-1.xxxxxxxxxxxE38</minInclusive>
     <maxInclusive>1.xxxxxxxxxxxE38</maxInclusive>
</datatype>
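
One way to obtain the digits for such a bound is sketched below.  It
assumes Python, builds the largest finite IEEE single from its bit pattern
and prints its exact decimal expansion; the helper names are hypothetical.

```python
import struct
from decimal import Decimal

def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single, widened to a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# 0x7F7FFFFF is the largest finite single-precision bit pattern.
# Decimal(float) converts exactly, so no digits are lost here.
FLOAT_MAX = Decimal(float32_from_bits(0x7F7FFFFF))

def fits_single_range(lexical: str) -> bool:
    """True if the finite decimal string has magnitude no greater than FLOAT_MAX."""
    # copy_abs() is used instead of abs() because abs() rounds to the context precision.
    return Decimal(lexical).copy_abs() <= FLOAT_MAX

print(FLOAT_MAX)                      # exact decimal expansion of the bound
print(fits_single_range("3.4E38"))
print(fits_single_range("3.5E38"))
```

The printed expansion is what would be typed into the minInclusive and
maxInclusive facets if the bound were spelled out in full.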

2) Loss of precision

A lexical real could express more precision than binary floating point
types; however, except at boundaries this should not be significant.  If
the source knows a value to more precision than I do, then let it express
all it knows and I'll keep all I can handle.

3) Loss of precision in bounds checks

If I wanted to prevent divide-by-zeros, I might put in a <minExclusive
value="0"/>.  However, with a lexical comparison that would allow a value
like "1e-500" that would cause a divide by zero when converted to an IEEE
single or double precision number.  I could avoid that by placing my bound
at the minimum non-zero value representable in whatever target binary
system I desired.
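
The underflow case can be sketched as follows, assuming Python (whose float
is an IEEE double) and the decimal module for the exact comparison.  The
function names are hypothetical; DOUBLE_MIN here is the smallest positive
subnormal double, the value Java exposes as Double.MIN_VALUE.

```python
from decimal import Decimal

# Smallest positive (subnormal) IEEE double, 2**-1074, expanded exactly.
DOUBLE_MIN = Decimal(2.0 ** -1074)

def passes_min_exclusive_zero(lexical: str) -> bool:
    """The naive bound: strictly greater than zero, compared exactly."""
    return Decimal(lexical) > 0

def passes_min_inclusive_double_min(lexical: str) -> bool:
    """The tighter bound: at least the smallest non-zero double."""
    return Decimal(lexical) >= DOUBLE_MIN

tiny = "1e-500"
print(passes_min_exclusive_zero(tiny))        # naive bound accepts the value
print(float(tiny) == 0.0)                     # but it underflows to zero as a double
print(passes_min_inclusive_double_min(tiny))  # tighter bound rejects it
```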

My recommendations would be

1) Reestablish the real datatype as an unlimited precision and range
floating point number where bound comparisons are done lexically.

2) Derive decimal from real and integer from decimal.

3) If you really feel compelled to enable constraining reals to the ranges
of IEEE double and float, then add the following lexicals to the lexical
space for real

+Double.MAX_VALUE
+Double.MIN_VALUE
-Double.MAX_VALUE
-Double.MIN_VALUE
+Float.MAX_VALUE
+Float.MIN_VALUE
-Float.MAX_VALUE
-Float.MIN_VALUE

These are equivalent to the full precision lexical representations of the
IEEE boundaries.  The lexicals are only a convenience: if you wanted to
constrain to another type, all you would have to do is type out the
lexical representation of those values in your bounds.

Then to restrict a datatype to the range of double (and still allow
Infinity and NaN), you would do something like

<datatype name="double" source="real">
     <!-- note: might be better to replace not with nor and nand  -->
     <not>
          <!-- this clause would only be true for finite values outside
               the range of double -->
          <or>
               <and>
                    <minExclusive value="-Infinity"/>
                    <maxExclusive value="-Double.MAX_VALUE"/>
               </and>
               <and>
                    <minExclusive value="+Double.MAX_VALUE"/>
                    <maxExclusive value="+Infinity"/>
               </and>
          </or>
     </not>
</datatype>
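
The logic of that restriction can be sketched as a predicate.  This is an
illustration only, assuming Python and its decimal module for the exact
comparison; in_double_range is a hypothetical name, not a facet or API from
the draft.

```python
import sys
from decimal import Decimal

# Exact decimal expansion of the largest finite IEEE double.
DOUBLE_MAX = Decimal(sys.float_info.max)

def in_double_range(lexical: str) -> bool:
    """Accept NaN, the infinities, and any finite value within +/- DOUBLE_MAX."""
    d = Decimal(lexical)
    # NaN and the infinities are allowed through unchanged; NaN must be tested
    # first because ordering comparisons on a Decimal NaN raise an exception.
    if d.is_nan() or d.is_infinite():
        return True
    # Reject only finite values whose magnitude exceeds DOUBLE_MAX.
    return d.copy_abs() <= DOUBLE_MAX

for text in ("1.5E300", "1E309", "-1E309", "-Infinity", "NaN"):
    print(text, in_double_range(text))
```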

If you wanted to restrict a value to be greater than zero even after
rounding (and allowing +NaN), it could be enforced by

<datatype name="positive-double" source="real">
     <not>
          <maxExclusive value="+Double.MIN_VALUE"/>
     </not>
</datatype>


I would really prefer you not make double and float a generated class
in the schema for schemas, but you could put in an appendix that shows how
you can constrain a datatype to a particular application datatype.  One
reason that I'd avoid putting implementation-specific types that low in the
hierarchy is that I envision that you will want to use the inheritance
hierarchy to classify the interpretation of the type, not its machine
implementation.  For example,

<datatype name="length" source="real">
     <annotation><info type="description">Length in
meters.</info></annotation>
</datatype>

<datatype name="altitude" source="length">
     <annotation><info type="description">Altitude in
meters.</info></annotation>
</datatype>

<datatype name="altitudeRelativeToGround" source="altitude">
     <annotation><info type="description">Altitude in meters relative to
the ground.</info></annotation>
     <minInclusive value="0"/>
</datatype>

<!--  finally we restrict this to a specific implementation (which is
totally silly)  but if you feel you must -->
<datatype name="floatAltitudeRelativeToGround"
source="altitudeRelativeToGround">
     <not>
          <or>
               <and>
                    <minExclusive value="+Float.MAX_VALUE"/>
                    <maxExclusive value="+Infinity"/>
               </and>
               <and>
                    <minExclusive value="-Infinity"/>
                    <maxExclusive value="-Float.MAX_VALUE"/>
               </and>
          </or>
     </not>
</datatype>
Received on Thursday, 13 January 2000 16:20:49 UTC