XML schema -- numeric values

Folks,

I would like to argue for a different approach to defining numeric 
values.  Specifically, rather than defining three primitive numeric types 
based on IEEE(?) 'float', 'double' and decimal character sequences, I argue 
for defining a _single_ primitive numeric type, and defining the existing 
types in terms of that.

I think the primitive numeric type should be the rational numbers;  i.e. 
the set of numbers that can be expressed as n/m, where n is an integer, and 
m is an integer greater than zero.  The textual representation could be 
"n/m" or "(n,m)" using decimal radix representation.  A canonical 
representation would have numerator and denominator reduced to lowest 
terms, or m=1 if n=0, and all leading zeros suppressed.
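To make the canonical form concrete, here is a minimal sketch in Python. It assumes the two lexical forms above ("n/m" and "(n,m)"), with no embedded whitespace and an unsigned denominator. The names parse_rational and canonical are hypothetical and not part of any schema draft. Python's fractions.Fraction handles the reduction to lowest terms.

```python
import re
from fractions import Fraction

# Hypothetical lexical forms: "n/m" or "(n,m)", decimal radix, m > 0.
_SLASH = re.compile(r"^([+-]?[0-9]+)/([0-9]+)$")
_PAIR = re.compile(r"^\(([+-]?[0-9]+),([0-9]+)\)$")

def parse_rational(text: str) -> Fraction:
    """Map a lexical form onto its rational value."""
    s = text.strip()
    match = _SLASH.match(s) or _PAIR.match(s)
    if match is None:
        raise ValueError(f"not a rational literal: {text!r}")
    num, den = int(match.group(1)), int(match.group(2))
    if den == 0:
        raise ValueError("denominator must be greater than zero")
    # Fraction stores numerator/denominator in lowest terms with den > 0.
    return Fraction(num, den)

def canonical(value: Fraction) -> str:
    """Canonical form: lowest terms, m = 1 when n = 0, no leading zeros."""
    # Fraction(0, k) normalises to 0/1, and int -> str never emits
    # leading zeros, so formatting the stored parts is sufficient.
    return f"{value.numerator}/{value.denominator}"
```

For example, "(006,4)" and "-10/25" come out as "3/2" and "-2/5", and "0/7" comes out as "0/1".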

The current numeric types would retain their current syntax and semantics, 
except that their values would be defined by a projection from the space of 
rational values.  (I am getting this comment out in a bit of a rush, so I 
apologize for not being more specific at this time.)

Why change?
-----------

I have come to this issue from my work in the CC/PP working group, which is 
defining a format for describing client capabilities and preferences, using 
RDF as a base.  As part of this work, we are looking at other systems that 
perform similar functions with a view to designing a system that does not 
have semantics gratuitously incompatible with those systems.

One such effort is the IETF CONNEG format [RFC 2533] (of which I happen to 
be an editor).  This uses rational number values.  These have proven to be 
especially useful for handling millimetre values expressed in inches, 
etc.  (e.g. RFC 2531 and <draft-ietf-fax-T30-mapping-xx.txt>.)

The plan (as I understand it) is that CC/PP, through RDF, will inherit the 
XML schema system for (at least) simple types.

I understand that rational numbers can be represented in XML schema by 
using a pair of integers, but that such an approach would not provide 
comparison (<=, >=, etc.) for such values.
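For what it's worth, providing the ordering is cheap once the pair is treated as a single value. Because both denominators are positive, n1/m1 <= n2/m2 holds exactly when n1*m2 <= n2*m1, so only exact integer arithmetic is needed. Here is a sketch in Python (compare_rationals is a hypothetical name):

```python
def compare_rationals(n1: int, m1: int, n2: int, m2: int) -> int:
    """Return -1, 0 or 1 as n1/m1 is less than, equal to, or greater than n2/m2.

    Assumes m1 > 0 and m2 > 0, as in the proposal: multiplying both sides
    by the positive quantity m1*m2 preserves the ordering, so comparing the
    cross products is exact and needs no division.
    """
    if m1 <= 0 or m2 <= 0:
        raise ValueError("denominators must be greater than zero")
    lhs, rhs = n1 * m2, n2 * m1
    return (lhs > rhs) - (lhs < rhs)
```

compare_rationals(1, 3, 2, 5) returns -1, and compare_rationals(254, 10, 127, 5) returns 0 (25.4 millimetres per inch, written two ways).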

I happen to believe that rational numbers are the natural underlying set 
for numbers processed by a computer system:  every such number (integer, 
float, double, etc.) belongs to some subset of the rational numbers.

So I would suggest the following reasons for a different approach to 
numeric values, using rational values as the basis for all numbers, with 
value restrictions for decimal, float, etc.
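One way to read "value restrictions" is as predicates over the rationals. The following is a sketch under that assumption, with hypothetical names. A value is a finite decimal exactly when its reduced denominator has no prime factors other than 2 and 5. A value is a double exactly when it survives a round trip through Python's float, which is an IEEE 754 double on the usual platforms. Non-finite IEEE values such as infinity and NaN have no rational counterpart and are left out of this sketch.

```python
from fractions import Fraction

def is_decimal(q: Fraction) -> bool:
    """True if q has a finite decimal expansion: its reduced denominator
    has no prime factors other than 2 and 5."""
    d = q.denominator
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1

def is_double(q: Fraction) -> bool:
    """True if q is exactly representable as a Python float."""
    try:
        # float(q) rounds to the nearest double; converting back is exact.
        return Fraction(float(q)) == q
    except OverflowError:
        # Too large for any finite double.
        return False
```

Under these assumptions, 1/3 fails both restrictions and 127/5 passes as a decimal. The double nearest 0.1 turns out to be the rational 3602879701896397/36028797018963968, which is itself a (long) decimal but is not 1/10.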

(i) representation and comparison of arbitrary rational values is made 
possible -- e.g.  (1/3 < 2/5).

(ii) a fundamentally simpler type system, with a single underlying 
primitive type where now there are three.

(iii) easier to define functions that work across all numeric types.

(iv) a cleaner model for numbers in RDF statements (see below).

(v) as a data type, rational numbers are better behaved and relatively 
well-understood;  e.g. '+', '-', '*' are closed and invertible functions 
over rationals (for '*', given a non-zero operand);  this isn't true of 
'float', 'double'.  With the exception of 'x/0', '/' is closed over 
rationals.  This isn't true of 'decimal'.  Etc.
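A small Python illustration of point (v). It uses fractions.Fraction for the rationals and the built-in float for doubles. It also uses the decimal module's Decimal, which is a fixed-precision decimal type and not XML Schema's 'decimal'. Decimal stands in here only to show that no finite decimal expansion equals 1/3. The helper name add_then_subtract is hypothetical.

```python
from decimal import Decimal
from fractions import Fraction

def add_then_subtract(a, b):
    """Compute (a + b) - b; if '+' is invertible this gives back a."""
    return (a + b) - b

# IEEE double: 1.0 is lost when 1e16 + 1.0 is rounded, so the result is 0.0.
float_result = add_then_subtract(1.0, 1e16)

# Rationals: the same computation is exact and returns 1.
exact_result = add_then_subtract(Fraction(1), Fraction(10**16))

# Division: 1/3 is an ordinary rational, and multiplying back by 3 recovers 1.
third = Fraction(1) / Fraction(3)
third_round_trip_ok = (third * 3 == 1)            # True

# A finite-precision decimal (28 significant digits in the decimal module's
# default context) can only approximate 1/3, so the round trip fails.
decimal_third = Decimal(1) / Decimal(3)
decimal_round_trip_ok = (decimal_third * 3 == 1)  # False
```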

I believe that the definition of the value set should be separated from its 
lexical representation in a character string.  I think this should apply 
_even_ in the XML schema environment where all attribute values must be 
represented as Unicode character sequences.
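To illustrate the separation of value space and lexical space, here is a sketch that maps several lexical forms onto one rational value. The function to_value and its type tags are hypothetical. Values are compared only after mapping, never as strings.

```python
from fractions import Fraction

def to_value(literal: str, lexical_type: str) -> Fraction:
    """Map a (literal, type) pair onto the shared rational value space.

    lexical_type is a hypothetical tag: "decimal" for strings like "0.50",
    "rational" for "n/m" or "(n,m)".
    """
    s = literal.strip()
    if lexical_type == "decimal":
        # Fraction parses decimal strings exactly. It also accepts forms
        # such as "1e3" that a strict decimal lexical space would reject,
        # so a real implementation would validate the string first.
        return Fraction(s)
    if lexical_type == "rational":
        if s.startswith("(") and s.endswith(")"):
            n, m = s[1:-1].split(",")
        else:
            n, m = s.split("/")
        if int(m) <= 0:
            raise ValueError("denominator must be greater than zero")
        return Fraction(int(n), int(m))
    raise ValueError(f"unknown lexical type: {lexical_type}")
```

Here "0.5" and "0.500" as decimals, and "1/2" and "(2,4)" as rationals, are four different character strings that all denote the single value 1/2.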

I understand it is intended that future RDF developments will inherit the 
XML schema type system, and also that the current RDF approach of having 
resources and literals as distinct values is under consideration.  I don't 
think that it is appropriate that the RDF model, as a generic metadata 
representation format, should limit itself to dealing in values that are 
literal strings.  In particular, I don't think it is right that RDF should 
end up using numeric types that are based on specific textual 
representations rather than a well-understood framework of values and 
associated operators (such as the rationals).

#g
------------
Graham Klyne
(GK@ACM.ORG)

Received on Saturday, 20 May 2000 13:10:49 UTC