- From: <MFC@uk.ibm.com>
- Date: Wed, 13 Dec 2000 11:40:05 +0000
- To: www-xml-schema-comments@w3.org
The proposed canonical representation would introduce a serious problem into the definition for decimal.

Recall that the value space of decimal is the set of the values i * 10^-n (where i and n are integers, * means multiply, ^ means raise to power, and currently n must be non-negative).

In today's usage, in both databases and programming languages, decimal numbers are almost always represented in precisely that manner, that is, as an integer (i) and a scale (n). The scale may be implicit or explicit, depending on the language or database.

As defined, and in actual representations, therefore, the value space of decimal can contain multiple distinct values which are numerically equal. This is an important attribute of the representation, especially useful in financial and engineering contexts. For example, the two values:

  1 * 10^-0
  100 * 10^-2

are distinct (the first is normally written as '1', the second as '1.00'). They have different integer and scale parts.

Now, the canonical lexical representation should be a set of literals such that there is a one-to-one mapping between literals in the canonical lexical representation and values in the value space. The proposed new wording does not satisfy this definition; it would show both the distinct values above as the same literal ('1.0'), whereas in fact the only value that can correctly be shown in this way is:

  10 * 10^-1

In other words, the proposed canonical representation would lose information: if a decimal number (such as the second example, 100 with a scale of 2) were encoded using the proposed canonical representation, then its original form could not be recovered, as there is not a one-to-one mapping.

- - - - -

One unambiguous canonical representation would be to use an exponential notation matching the value space (that is, for the three examples above: 1, 100E-2, and 10E-1). However, the current draft prefers plain numbers (as do most people and programming languages when the range is small).
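To make the distinction above concrete, here is a minimal sketch in Python. It assumes Python's standard decimal module purely as an illustration of an integer-plus-scale representation; the module is not mentioned in the draft, and the helper names integer_and_scale and exponential_form are hypothetical.

```python
from decimal import Decimal


def integer_and_scale(literal):
    """Return (i, n) such that the plain literal denotes i * 10^-n.

    Assumes a finite literal without an exponent part, so the
    exponent reported by Decimal is an integer <= 0.
    """
    sign, digits, exponent = Decimal(literal).as_tuple()
    i = int("".join(str(d) for d in digits))
    if sign:
        i = -i
    return i, -exponent


def exponential_form(i, n):
    """Exponential notation matching the value space: 1, 100E-2, 10E-1."""
    return str(i) if n == 0 else f"{i}E-{n}"


for literal in ("1", "1.00", "1.0"):
    i, n = integer_and_scale(literal)
    print(literal, "->", (i, n), "->", exponential_form(i, n))

# Numerically equal, yet the integer and scale parts differ.
print(Decimal("1") == Decimal("1.00"))
print(integer_and_scale("1") == integer_and_scale("1.00"))
```

The three literals map to three different (i, n) pairs, so writing all of them as '1.0' would discard the scale of two of them.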
For plain numbers, the words used are typically something like (using XML-schema naming):

  The absolute value of the integer (i) is first converted to a string in base ten using the characters '0' through '9' with no leading zeros (except if its value is zero, in which case a single '0' character is used).

  If the scale (n) is zero then no decimal point is added.

  Otherwise (the scale is positive), a decimal point will be inserted into the converted integer with the value of the scale specifying the number of characters to the right of the decimal point. '0' characters are added, to the left of the converted integer, if necessary to allow this insertion. If no character precedes the decimal point after the insertion then a conventional '0' character is prefixed.

  Finally, if the integer (i) was less than 0 then the entire string is prefixed by a minus sign character.

This definition preserves the one-to-one mapping and also meets the requirements for lexical space (section 2.3), notably that the literals should correspond to those found in common programming languages and libraries.

I would suggest that the canonical representation should follow this definition (it's also perhaps more understandable if a positive description of the tighter definition is given, rather than trying to derive a tight definition by prohibiting aspects of a vague definition).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Mike Cowlishaw, IBM Fellow
mailto:mfc@uk.ibm.com -- http://www2.hursley.ibm.com/decimal
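As an illustration, the plain-number wording quoted in the message above could be sketched in Python roughly as follows. The function name plain_literal and its (i, n) parameters are hypothetical, chosen to match the integer and scale of the value space; this is one possible reading of the wording, not a definitive implementation.

```python
def plain_literal(i, n):
    """Render the decimal value i * 10^-n as a plain literal.

    i is the integer part of the value and n its scale, which is
    assumed to be non-negative as in the current draft.
    """
    if n < 0:
        raise ValueError("scale must be non-negative")

    # Base-ten digits of the absolute value; str() never emits
    # leading zeros and yields a single '0' for zero.
    digits = str(abs(i))

    if n > 0:
        # Add '0' characters on the left if needed so that n
        # characters can sit to the right of the decimal point.
        digits = digits.rjust(n, "0")
        whole, fraction = digits[:-n], digits[-n:]
        # Prefix a conventional '0' if nothing precedes the point.
        digits = (whole or "0") + "." + fraction

    # Prefix a minus sign only when the integer is less than 0.
    return "-" + digits if i < 0 else digits


for i, n in [(1, 0), (100, 2), (10, 1), (5, 3), (-5, 3)]:
    print((i, n), "->", plain_literal(i, n))
```

Under this reading, distinct (i, n) pairs always produce distinct strings, because the digits after the point record the scale and the digits overall record the integer.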
Received on Wednesday, 13 December 2000 07:20:16 UTC