- From: <MFC@uk.ibm.com>
- Date: Wed, 8 Nov 2000 10:48:31 +0000
- To: www-xml-schema-comments@w3.org
XML Schema Part 2: Datatypes
W3C Candidate Recommendation 24 October 2000
Comment on decimal datatype [section 3.2.5]
============================================
IBM would like to request two small but critical changes to the XML
Schema decimal datatype description:
1. (Essential)
Currently the scale of decimal numbers is restricted to be zero or
positive. It is requested that this restriction be removed (that is,
in 2.4.2.11 the value of scale must be an integer, not a
nonNegativeInteger) for the following reasons:
a) The current specification allows the representation of very small
numbers (for example 1E-100) but does not permit the efficient
representation of even moderately large numbers (for example 13
billion, or 13E+9), even though such numbers are common in
commerce. Allowing positive exponents (negative scales) will
correct the specification so both large and small numbers can be
represented equally efficiently.
b) The current specification is only suitable for representing
limited-range, fixed-point decimal numbers. Removing the
restriction will make the representation general, and allow
practical floating-point operations on XML Schema decimal numbers.
c) Removing the restriction will make conversions between the binary
floating-point datatypes and the decimal datatype more efficient and
less likely to raise exceptions. For example, a binary floating-point
number approximates a number such as 1E+100 in a few bytes; when
this is converted to XML Schema decimal it would require 101
characters, which could exceed implementation limits. However,
an exact representation (1E+100) requires only six characters, with
only one digit of precision needed, which is within the capabilities
of any implementation.
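As an illustration of point 1, the sketch below uses Python's standard `decimal` module as a stand-in for a decimal type whose scale may be negative. It is not XML Schema machinery, and the helper names `scale_of` and `plain_form` are hypothetical. Python stores an exponent rather than a scale, so the scale in the sense used above is assumed here to be the negated exponent.

```python
from decimal import Decimal


def scale_of(d: Decimal) -> int:
    """Scale as used in the comment: digits after the decimal point.

    Python's decimal module stores an exponent instead; the scale is its
    negation, so a negative scale corresponds to a positive exponent.
    """
    return -d.as_tuple().exponent


def plain_form(d: Decimal) -> str:
    """The value written out with no exponent (zero-padded as needed)."""
    return format(d, "f")


for text in ["13E+9", "1E+100", "1E-100"]:
    d = Decimal(text)
    print(text, "scale:", scale_of(d),
          "digits:", len(d.as_tuple().digits),
          "plain length:", len(plain_form(d)))
```

With a negative scale, 1E+100 needs a single coefficient digit and prints as the six-character `1E+100`; the exponent-free form runs to 101 characters.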
2. (Highly desirable)
The lexical representation of decimal numbers (3.2.5.1) is currently
restricted to be a subset of that of binary numbers. It is proposed
that the representation of decimal numbers be made the same as for
binary numbers (3.2.3.1 for float, and 3.2.4.1 for double), for the
following reasons:
a) The current proposal has different lexical rules for binary and
decimal numbers. This distinction is an unnecessary complication
that imposes a confusing and artificial grammar; a single syntax
would simplify the Schema and reduce implementation costs.
b) At present, if a number is naturally expressed with an exponent
(for example, 1.3 billion might be written as 1.3E+9), the Schema
requires that it be expanded to a plain integer form (1300000000),
which is less efficient, harder to read, and error-prone. This
transformation also loses any conventional indication of
significance.
c) Similarly, small numbers (such as 1E-100) can be represented
efficiently in the value space, yet to be written out they have to
be expanded with leading zeros (one hundred, in this case). This
is inefficient, hard to read, and error-prone.
d) It is usual to define a conversion between binary and decimal
numbers as a conversion from binary to string form, and then from
that string form to decimal (or vice versa). If decimal numbers
cannot be expressed using exponential form, these 'round trips'
are impossible in one direction and impractical in the other.
As specified, one lexical syntax of numbers defined by the XML
Schema is incompatible with another.
e) At present, exponential notation can only be used for binary
floating point numbers. Since these can only approximate many
decimal fractions, the Schema has no mechanism for handling
numbers in exponential notation precisely, other than as character
strings.
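Points a) to e) can be illustrated with a small sketch, again assuming Python's `decimal` module as a stand-in for an exact decimal type. The two regular expressions are hypothetical, simplified approximations of the current decimal lexical form and of a float/double-style form with an optional exponent; they omit special values such as INF and NaN and are not the patterns from the specification. The helper names are likewise illustrative only.

```python
import re
from decimal import Decimal

# Hypothetical, simplified lexical patterns (not copied from the specification):
# PLAIN_DECIMAL approximates the current decimal form (no exponent allowed);
# EXPONENTIAL approximates the float/double form (optional E or e exponent).
# Special values such as INF and NaN are deliberately left out.
PLAIN_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")
EXPONENTIAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def parse_decimal(text: str) -> Decimal:
    """Parse a numeral in the float/double-style syntax into an exact decimal."""
    if not EXPONENTIAL.fullmatch(text):
        raise ValueError(f"not a finite numeral: {text!r}")
    return Decimal(text)


def significant_digits(d: Decimal) -> int:
    """Number of coefficient digits, i.e. the significance the numeral conveys."""
    return len(d.as_tuple().digits)


def binary_to_decimal(x: float) -> Decimal:
    """Binary -> string -> decimal, the usual conversion path described in d).

    repr() of a finite float may use exponential notation (e.g. '1e+100'),
    so this path only works if the decimal syntax accepts exponents.
    """
    return parse_decimal(repr(x))


print(PLAIN_DECIMAL.fullmatch("1.3E+9") is None)        # True: rejected today
print(significant_digits(parse_decimal("1.3E+9")))       # 2
print(significant_digits(parse_decimal("1300000000")))   # 10
print(float(binary_to_decimal(1e100)) == 1e100)          # True: round trip
```

The exponent form keeps the two significant digits of 1.3E+9 that the expanded integer loses, and the string path lets a binary value travel to decimal and back unchanged.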
Supporting information:
1. The Java class library has a BigDecimal class with restrictions
similar to those in the proposed XML Schema. This has proved to be so
disadvantageous that numerous companies have supported IBM's request
to remove the restriction. For details, see the Java Specification
Request JSR-13, at
http://www2.hursley.ibm.com/decimalj/jsr-decimal.html
which lists a representative selection of those companies, and
expands on the rationale. This JSR has been approved by Sun and is
expected to result in an improved BigDecimal class in due course.
2. The W3C XForms working group would very much welcome full support for
decimal datatypes in XML Schema. Numbers are entered in decimal,
displayed in decimal, and increasingly are stored in decimal. Many
users are confused when calculations (based upon binary arithmetic)
deliver results different from those they were taught to expect at school.
The performance degradation incurred by using decimal instead of
binary arithmetic is expected to be insignificant for forms-based
applications, where much processing takes place in the client.
3. Programming languages and their libraries increasingly support
floating point or wide-range decimal numbers. These include Java
(see above), COBOL, the Rexx family, C#, application libraries for C,
C++, and Ada, and many scripting languages.
4. Decimal data is predominant in commercial databases, and arithmetic
on these data often requires wide ranges, especially for large
numbers. One survey (partially reported in IBM Technical Report TR
03.413 by A. Tsang & M. Olschanowsky) analyzed the column datatypes
of databases owned by 51 major organizations. These databases
covered a wide range of applications, including Airline systems,
Banking, Financial Analysis, Insurance, Inventory control, Management
reporting, Marketing services, Order entry, Order processing,
Pharmaceutical applications, and Retail sales.
Of the columns surveyed, 41.8% contained identifiably numeric data;
among these, the breakdown by datatype was:
Type     | Columns | Percent
---------+---------+--------
Decimal  |  251038 |    55.0
SmallInt |  120464 |    26.4
Integer  |   78842 |    17.3
Float    |    6180 |     1.4
Since both SmallInt and Integer could have been represented by
Decimal type numbers without loss, 98.6% of the numeric columns in
the sample could have used a decimal representation.
5. For additional information on the direction and significance of
decimal data and arithmetic, see http://www2.hursley.ibm.com/decimal
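The percentages in the survey table under point 4 can be rechecked from the column counts alone; this short Python sketch does so (the variable names are illustrative only).

```python
# Column counts as reported in the survey table under point 4.
counts = {"Decimal": 251038, "SmallInt": 120464, "Integer": 78842, "Float": 6180}
total = sum(counts.values())

# Share of numeric columns per datatype, rounded to one decimal place.
percent = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Every non-Float column could have used a decimal representation.
decimal_capable = round(100 * (total - counts["Float"]) / total, 1)

print(total, percent, decimal_capable)
```

The rounded percentages sum to 100.1 only because of rounding; the 98.6% figure is simply every numeric column except the Float ones.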
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Mike Cowlishaw FREng, IBM Fellow
mailto:mfc@uk.ibm.com -- http://www2.hursley.ibm.com/mfcsumm.htm
Received on Wednesday, 8 November 2000 05:50:18 UTC