From: <MFC@uk.ibm.com>

Date: Wed, 8 Nov 2000 10:48:31 +0000

To: www-xml-schema-comments@w3.org

Message-ID: <80256991.003B7747.00@d06mta10.portsmouth.uk.ibm.com>

XML Schema Part 2: Datatypes
W3C Candidate Recommendation 24 October 2000
Comment on decimal datatype [section 3.2.5]
============================================

IBM would like to request two small, but critical, changes to the XML Schema decimal datatype description.

1. (Essential) Currently the scale of decimal numbers is restricted to be zero or positive. It is requested that this restriction be removed (that is, in 2.4.2.11 the value of scale must be an integer, not a nonNegativeInteger) for the following reasons:

   a) The current specification allows the representation of very small numbers (for example 1E-100) but does not permit the efficient representation of even moderately large numbers (for example 13 billion, or 13E+9), even though such numbers are common in commerce. Allowing positive exponents (negative scales) will correct the specification so that both large and small numbers can be represented equally efficiently.

   b) The current specification is only suitable for representing limited-range, fixed-point decimal numbers. Removing the restriction will make the representation general, and allow practical floating-point operations on XML Schema decimal numbers.

   c) Removing the restriction will make conversions between the binary floating-point datatypes and the decimal datatype more efficient and less likely to raise exceptions. For example, a binary floating-point number approximates a number such as 1E+100 in a few bytes; when this is converted to XML Schema decimal it would require 101 characters (a 1 followed by 100 zeros), which could exceed implementation limits. However, an exact representation in exponential form (1E+100) requires only six characters, with only one digit of precision being needed, which would be within the capabilities of any implementation.

2. (Highly desirable) The lexical representation of decimal numbers (3.2.5.1) is currently restricted to be a subset of that of binary numbers.
   It is proposed that the representation of decimal numbers be made the same as for binary numbers (3.2.3.1 for float, and 3.2.4.1 for double), for the following reasons:

   a) The current proposal has different lexical rules for binary and decimal numbers. This distinction is an unnecessary complication which imposes a confusing and artificial grammar; a single syntax will simplify the Schema and reduce implementation costs.

   b) At present, if a number is naturally expressed with an exponent (for example, 1.3 billion might be written as 1.3E+9), the Schema requires that it be expanded to plain integer form (1300000000), which is less efficient, hard to read, and error-prone. This transformation also loses any conventional indication of significance.

   c) Similarly, small numbers (such as 1E-100) can be represented efficiently in the Schema, yet to be written out they have to be expanded with leading zeros (one hundred, in this case). This is inefficient, hard to read, and error-prone.

   d) It is usual to define a conversion between binary and decimal numbers as a conversion from binary to string form, and then from that string form to decimal (or vice versa). If decimal numbers cannot be expressed in exponential form, these 'round trips' are impossible in one direction and impractical in the other. As specified, one lexical syntax for numbers defined by XML Schema is incompatible with another.

   e) At present, exponential notation can be used only for binary floating-point numbers. Since these can only approximate many decimal fractions, the Schema has no mechanism for handling numbers in exponential notation precisely, other than as character strings.

Supporting information:

1. The Java class library has a BigDecimal class with restrictions similar to those of the proposed XML Schema. These have proved so disadvantageous that numerous companies have supported IBM's request to remove the restriction.
   For details, see Java Specification Request JSR-13, at http://www2.hursley.ibm.com/decimalj/jsr-decimal.html, which lists a representative selection of those companies and expands on the rationale. This JSR has been approved by Sun and is expected to result in an improved BigDecimal class in due course.

2. The W3C XForms working group would very much welcome full support for decimal datatypes in XML Schema. Numbers are entered in decimal, displayed in decimal, and increasingly stored in decimal. Many users are confused when calculations based on binary arithmetic deliver results that differ from the arithmetic they were taught at school. The performance degradation incurred by using decimal instead of binary arithmetic is expected to be insignificant for forms-based applications, where much of the processing takes place in the client.

3. Programming languages and their libraries increasingly support floating-point or wide-range decimal numbers. These include Java (see above), COBOL, the Rexx family, C#, application libraries for C, C++, and Ada, and many scripting languages.

4. Decimal data is predominant in commercial databases, and arithmetic on these data often requires wide ranges, especially for large numbers. One survey (partially reported in IBM Technical Report TR 03.413 by A. Tsang & M. Olschanowsky) analyzed the column datatypes of databases owned by 51 major organizations. These databases covered a wide range of applications, including airline systems, banking, financial analysis, insurance, inventory control, management reporting, marketing services, order entry, order processing, pharmaceutical applications, and retail sales.
   Of these columns, 41.8% contained identifiably numeric data; in these, the breakdown by datatype was:

     Type     | Columns  | Percent
     ---------+----------+--------
     Decimal  |  251038  |  55.0
     SmallInt |  120464  |  26.4
     Integer  |   78842  |  17.3
     Float    |    6180  |   1.4

   Since both SmallInt and Integer could have been represented by Decimal type numbers without loss, 98.6% of the numeric columns in the sample could have used a decimal representation.

5. For additional information on the direction and significance of decimal data and arithmetic, see http://www2.hursley.ibm.com/decimal

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Mike Cowlishaw FREng, IBM Fellow
mailto:mfc@uk.ibm.com -- http://www2.hursley.ibm.com/mfcsumm.htm

Received on Wednesday, 8 November 2000 05:50:18 UTC
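The scale and lexical points in items 1 and 2 of the comment above can be illustrated with a minimal sketch. It assumes Python's standard decimal module as a stand-in for a decimal type that permits negative scales and exponential notation; the helper names scale, plain and scientific are hypothetical, and scale is taken to be the negated exponent, so 13E+9 has scale -9.

```python
from decimal import Decimal


def scale(d: Decimal) -> int:
    """Hypothetical helper: digits to the right of the decimal point.

    This is the negated exponent, so a value with a positive exponent
    (such as 13E+9) has a negative scale, which item 1 asks to allow.
    """
    exponent = d.as_tuple().exponent
    if not isinstance(exponent, int):
        raise ValueError("NaN and infinities have no scale")
    return -exponent


def plain(d: Decimal) -> str:
    """Hypothetical helper: fixed-point text with no exponent.

    format(d, "f") with no precision keeps every coefficient digit and
    pads with zeros, as an exponent-free lexical form must.
    """
    return format(d, "f")


def scientific(d: Decimal) -> str:
    """Hypothetical helper: exponential text, as used for float/double."""
    return str(d)


for text in ["13E+9", "1.3E+9", "1E+100", "1E-100"]:
    d = Decimal(text)
    print(f"{text:>7}: scale={scale(d):>4}, "
          f"plain form {len(plain(d)):>3} chars, "
          f"exponential form {scientific(d)!r}")

# Supporting information 2: binary arithmetic can surprise users,
# while decimal arithmetic gives the schoolbook answer.
print("binary  0.1 + 0.2 == 0.3:", 0.1 + 0.2 == 0.3)
print("decimal 0.1 + 0.2 == 0.3:",
      Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

Run as-is, it prints one line per sample value: 1E+100 needs 101 characters in plain form but only six in exponential form, and 1E-100 needs 100 zeros before its single significant digit.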
