RE: Primitive Datatypes of XML Schema (boolean, float, double) from Arnold, Curt on 2000-02-14 (www-xml-schema-comments@w3.org from January to March 2000)

From: Arnold, Curt <Curt.Arnold@hyprotech.com>
Date: Mon, 14 Feb 2000 12:00:11 -0700
To: "'Dan Connolly'" <connolly@w3.org>
Cc: "'www-xml-schema-comments@w3.org'" <www-xml-schema-comments@w3.org>
Message-ID: <00E567D938B9D311ACEC00A0C9B468730C7583@THOR>
Using decimal as a fallback position is possible; however, losing the E+/- idiom costs legibility when dealing with large or small numbers.  It is easier for a human to comprehend,
compare or detect an error when a number is represented as 6.023E23 instead of 602300000000000000000000.  If you are using long doubles, you could be stuck with using almost 5000 characters to
represent 19 digits of precision.
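
To make the size difference concrete, here is a minimal Python sketch. It assumes "long double" means the 80-bit x87 extended format, whose largest finite value is roughly 1.189731495357231765E4932; that value and the variable names are illustrative assumptions, not anything taken from the drafts.

```python
from decimal import Decimal

# The example from the text: the same value with and without the exponent.
with_exponent = "6.023E23"
without_exponent = format(Decimal(with_exponent), "f")

# Assumption: approximate largest finite 80-bit x87 extended value,
# written with 19 significant digits.
long_double_max = "1.189731495357231765E4932"
long_double_plain = format(Decimal(long_double_max), "f")

print(len(with_exponent), len(without_exponent))
print(len(long_double_max), len(long_double_plain))
```

Under that assumption the plain-decimal spelling is several thousand characters long, nearly all of them zeros that carry no precision.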

When I was using "real", I was using it in the sense of the earlier drafts.  That is basically decimal with the E+/- idiom.

I agree that trying to support a wide variety of floating point platforms is not desirable.  Making all platforms try to understand all other numeric platforms is definitely more challenging than
making all platforms try to understand just one numeric format.  It is likely that any non-IEEE system would be exchanging information with an IEEE system, so it is easier to force the non-IEEE
system to mimic IEEE semantics.

I don't have a problem with double and float being defined datatypes and I agree the ranges and rounding behavior can't be precisely replicated lexically.  I would just like this to be in addition to
a "real" datatype.

However, when the schema author does not wish to constrain the schema to IEEE float or double ranges and does not want to take the performance hit to precisely replicate their rounding behavior on
evaluating min/max constraints, then he shouldn't be forced to take that hit.  

I need to build some benchmarks on the relative speed of lexical value comparison vs conversion to double/float and comparison, but conversion to double/float is typically a very expensive operation
and could severely impact the performance of applications dealing with a lot of numeric data.
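
As a rough illustration of what "lexical value comparison" could look like, here is a Python sketch that orders two decimal numerals by inspecting their text only. The function names are hypothetical, it assumes well-formed input without the E+/- idiom, and it makes no claim about relative speed, which is what the benchmarks would have to show.

```python
def _split(numeral):
    """Split a decimal numeral into (sign, integer digits, fraction digits)."""
    sign = 1
    if numeral[0] in "+-":
        sign = -1 if numeral[0] == "-" else 1
        numeral = numeral[1:]
    integer, _, fraction = numeral.partition(".")
    integer = integer.lstrip("0")    # leading zeros carry no value
    fraction = fraction.rstrip("0")  # neither do trailing fraction zeros
    if not integer and not fraction:
        sign = 0                     # every spelling of zero compares equal
    return sign, integer, fraction


def compare_lexical(a, b):
    """Return -1, 0 or 1 for a < b, a == b, a > b, using text only."""
    sign_a, int_a, frac_a = _split(a)
    sign_b, int_b, frac_b = _split(b)
    if sign_a != sign_b:
        return -1 if sign_a < sign_b else 1
    if sign_a == 0:
        return 0
    # Same nonzero sign: a longer integer part wins, then digits left to right.
    key_a = (len(int_a), int_a, frac_a)
    key_b = (len(int_b), int_b, frac_b)
    magnitude = (key_a > key_b) - (key_a < key_b)
    return sign_a * magnitude


def satisfies_min_exclusive(value, bound):
    """Check a minExclusive-style constraint without numeric conversion."""
    return compare_lexical(value, bound) > 0
```

For example, satisfies_min_exclusive("10", "9.99") holds, and compare_lexical("1.50", "1.5") reports equality. Note that this orders the exact decimal values, so it does not reproduce IEEE rounding behavior.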



-----Original Message-----
From: Dan Connolly [mailto:connolly@w3.org]
Sent: Monday, February 14, 2000 12:10 PM
To: www-xml-schema-comments@w3.org; Arnold, Curt
Cc: Mark Reinhold
Subject: Re: Primitive Datatypes of XML Schema (boolean, float, double)


[oops... I accidentally sent an unfinished version of this message...
please disregard my message of
Mon, 14 Feb 2000 12:06:27 -0600]

I read this and some of your earlier comments on this topic; e.g.:

>> Here is the pocket version of my feelings toward the 12/17 Datatypes Draft
>> 
>> 1) "real" (an unlimited range and precision floating point) must come back.
-- Sat, 15 Jan 2000 11:32:21 -0600
http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000JanMar/0055.html

I'm not sure what to make of that. The decimal type provides
unlimited range and precision, so I'm not sure why you
say it 'must come back'. It never went anywhere.

I've never heard the term 'floating point'
used to describe a datatype with unlimited range/precision, so
I don't know what you mean for those words to contribute.

As to 'real'... the real numbers don't have a convenient lexical
representation
(e.g. sqrt(2) and pi are infinitely long non-repeating numerals).
'real' was (unfortunately!) popularized as a name for floating-point
types in FORTRAN, but those were always fixed range/precision. What
did you mean by 'real'?

In your message of Thu, 13 Jan 2000 12:55:53 -0700
http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000JanMar/0050.html
you write:

>> In my perfect world "real" would be primitive, "decimal" would be derived
>> from real (simply excludes the E+nnn fragment) and "integer" would derive
>> from "decimal".

Well... integer does derive from decimal. We don't have an
arbitrary-precision
datatype with the E+nnn idiom in its lexical representation; is
that what you meant by "real"? If you can show, by way of use cases,
that it's important/essential to be able to use the E+nnn idiom when
writing decimals, I suppose we could consider it. But writing lots
of zeros doesn't seem to be a critical problem, as far as I can see.

Your message goes on to say...

>> If you really wanted to conform to the behavior of a specific floating point
>> system (which I would discourage),

it was considered essential by the WG; cf. scenarios
"3.Supervisory control and data acquisition. "
and
"6.Open and uniform transfer of data between applications, including
databases "
in our requirements document
http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215


>>  you can do that with existing facets

no, you cannot specify IEEE float/double semantics in terms of
arbitrary precision decimals (or rationals in any radix).
Floating point comparison (and other operations) are just
not the same as rational comparisons. I'll try to dig up details on
this...
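
A few concrete cases of that difference, as a Python sketch. It assumes Python's float is an IEEE double (true on common platforms, though not guaranteed by the language), and to_float32 is a hypothetical helper that rounds through a double first, which is harmless for the value used here but is not a general decimal-to-single conversion.

```python
import math
import struct
from decimal import Decimal


def to_float32(text):
    """Round a numeral to an IEEE single by packing it into 4 bytes."""
    return struct.unpack("<f", struct.pack("<f", float(text)))[0]


# Distinct as decimals, identical once rounded to a double.
bound = "0.1"
value = "0.1000000000000000001"
greater_as_decimal = Decimal(value) > Decimal(bound)
greater_as_double = float(value) > float(bound)

# 2**24 + 1 is not representable as a single; it rounds down to 2**24.
single_bound = to_float32("16777216")
single_value = to_float32("16777217")

# NaN is unordered and unequal to itself; +0 and -0 compare equal
# even though their sign bits differ.
nan = float("nan")
zeros_equal = 0.0 == -0.0
zero_signs_differ = math.copysign(1.0, 0.0) != math.copysign(1.0, -0.0)

print(greater_as_decimal, greater_as_double)
print(single_value == single_bound, nan == nan, zeros_equal, zero_signs_differ)
```

So a value that satisfies a minExclusive bound when read as a decimal can fail it after rounding to float or double, and NaN and signed zero have no counterpart in decimal comparison.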

> The elimination of real and the introduction of float and double types in the last draft do several very negative things:
> 
> 1) they make it very, very difficult to write applications that generate valid XML on platforms whose native floating points are not IEEE when there are minExclusive or maxExclusive constraints.
> Basically, you have to try to mimic the rounding characteristics of IEEE to make sure that a value that is less than or greater than the bound on your platform is still less than or greater than after
> rounding on IEEE.

We considered more generalized designs, including the HTTP-NG floating
point design:

======
excerpt from
http://www.w3.org/TR/1998/WD-HTTP-NG-architecture-19980710/

3.5.2. Floating-point Types 

Floating-point types are specified with eight parameters: 

     the size in bits of the significand, 
     the base of the exponent, 
     the maximum exponent value, 
     the minimum exponent value, 
     whether they support a distinguished value for `Not-A-Number', 
     whether they support a distinguished value for `Infinity', 
     whether denormalized values are allowed, and 
     whether the zero value is signed (whether they can have both +0 and -0). 
======

but we decided that the burden on implementors outweighed the benefits.

We decided the cost of implementing the IEEE semantics on platforms
where you can't count on the infrastructure to provide IEEE semantics
was acceptable. We expect it's more cost-effective for everybody
to converge on one floating point platform than to proliferate
a variety of them.

> 
> 2) it makes it very difficult to write validating parsers in languages that do not impose IEEE numerics (e.g. C++) that will validate consistently on different platforms.  Basically, it means that you
> have to write your own atof() and floating point comparison routines (on top of long or quadword types) since you cannot depend on the native float and double to behave consistently with IEEE.
> 
> 3) it requires conversion from text to a float/double type for validation.  With the abstract real type, you could do constraint checking lexically which should be substantially faster than
> conversion to a floating point and then comparison.

What do you mean by an 'abstract real type'? The way it was specified
in earlier drafts didn't make sense to the WG, upon examination.

>  Numeric conversion can very easily dwarf both parsing and DOM creation in time.  I've been meaning to develop and publish some benchmarks for
> this.

If you don't expect the receiver to convert to an IEEE floating point
representation, I suggest you use the decimal type (or some type
derived from it) rather than float/double.


> 4) It doesn't support more precise numeric representations.
> 
> There have been several threads on floating point issues on the schema comments list, the last significant thread was http://lists.w3.org/Archives/Public/www-xml-schema-comments/2000JanMar/0043.html
> 
> The following messages consider the deleted minAbsoluteValue facet which was the first move toward binding real to a specific implementation.
> 
> http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999OctDec/0024.html
> http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999JulSep/0052.html


-- 
Dan Connolly, W3C
http://www.w3.org/People/Connolly/
Received on Monday, 14 February 2000 14:02:55 UTC