- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Fri, 4 Jul 2008 09:10:52 -0600
- To: Alan Ruttenberg <alanruttenberg@gmail.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>, Dave Peterson <davep@iit.edu>, www-xml-schema-comments@w3.org
On 2 Jul 2008, at 23:27 , Alan Ruttenberg wrote:

> On Jul 2, 2008, at 5:25 PM, Dave Peterson wrote:
>
>> ... But I think what you probably
>> want is to derive float and double from decimal.
> ...
>
>> The problem with that is that such a derivation would violate a
>> fundamental property that we wanted derivation to have: If a value
>> is removed from the value space during a derivation, that
>> automatically removes its lexical representations from the lexical
>> space. However, float and double require that strings that exactly
>> represent a decimal value not in the float or double value space be
>> mapped to the nearest value that is in the value space.
>>
>> Rather than remove that fundamental property of derivation, we
>> decided to leave float and double as separate primitives.
>
> Perhaps this is a stupid question, but why is this a fundamental
> property of derivation? One generally thinks of types in terms of
> subset relations.

Yes, indeed. In XSD, datatypes can be viewed extensionally as a mapping from lexical space to value space. (In fact, "datatype" is the term used in the XSD spec precisely for the extensional view, and "simple type" or "simple type definition" for the intensional view.) For the primitive and ordinary datatypes (i.e. for all datatypes except the special datatypes anyType, anySimpleType, and anyAtomicType), the lexical space is the domain and the value space the range of the lexical mapping relation.

Restriction involves taking a subset of the base type: lexical facets specify a subset of the lexical space and cause corresponding subsets of the mapping and value space to be generated, while value facets specify a subset of the value space and cause corresponding subsets of the mapping and lexical space to be generated.
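The filtering view can be illustrated in a few lines of Python. This is only a sketch, assuming a datatype is modelled as a small finite sample of (literal, value) pairs; the names `lexical_facet`, `value_facet` and `SAMPLE` are hypothetical, with a regular expression standing in for a pattern facet and a predicate standing in for minInclusive.

```python
import re
from decimal import Decimal
from typing import Callable, FrozenSet, Tuple

# Extensional view: (a finite sample of) a lexical mapping as (literal, value) pairs.
LexicalMapping = FrozenSet[Tuple[str, Decimal]]


def lexical_space(m: LexicalMapping) -> FrozenSet[str]:
    return frozenset(lit for lit, _ in m)


def value_space(m: LexicalMapping) -> FrozenSet[Decimal]:
    return frozenset(v for _, v in m)


def lexical_facet(m: LexicalMapping, keep: Callable[[str], bool]) -> LexicalMapping:
    # Filter by literal; the value space shrinks as a side effect.
    return frozenset((lit, v) for lit, v in m if keep(lit))


def value_facet(m: LexicalMapping, keep: Callable[[Decimal], bool]) -> LexicalMapping:
    # Filter by value; every literal for a removed value disappears too.
    return frozenset((lit, v) for lit, v in m if keep(v))


# A tiny sample of decimal's mapping: "1", "1.0" and "01" all denote the same value.
SAMPLE: LexicalMapping = frozenset(
    (lit, Decimal(lit)) for lit in ["1", "1.0", "01", "2.5", "-3"]
)

# Pattern-style (lexical) facet: integer literals without leading zeros.
no_leading_zeros = lexical_facet(
    SAMPLE, lambda s: re.fullmatch(r"-?[1-9][0-9]*|0", s) is not None
)
# minInclusive-style (value) facet: values >= 0.
non_negative = value_facet(SAMPLE, lambda v: v >= 0)

print(sorted(value_space(no_leading_zeros)))  # [Decimal('-3'), Decimal('1')]
print(sorted(lexical_space(non_negative)))    # ['01', '1', '1.0', '2.5']
# Either kind of facet yields a subset of the base mapping.
print(no_leading_zeros <= SAMPLE and non_negative <= SAMPLE)  # True
```

Both facets are plain set filters over the same relation, which is why "restriction is a subset" holds no matter which space a facet talks about.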
The most economical way to think about it, although not the most economical way to describe it so that people can actually use the derivation mechanisms, is to consider all facets as filtering the mapping relation: m' = l <: m (the domain restriction of m to l) for a lexical facet specifying a subset l of the lexical space, or m' = m :> v (the range restriction of m to v) for a value facet specifying a subset v of the value space, if the operators <: and :> mean anything to you.

Since the lexical mappings of float and double map literals to the nearest value, while the lexical mappings for decimal and for the real type present in early drafts map literals to an exact value, neither mapping can plausibly be a subset of the other.

No doubt other stories could be devised about how the lexical mapping of a restriction relates to the lexical mapping of the base type. But, as you say, the story that says "it's a subset" is simple and appeals to fundamental intuitions about restrictions. So XSD has chosen that story. There might be some way to tell that story and still get all the numeric datatypes into a single derivation hierarchy, but I don't know how to do that.

Another issue arose in early drafts that attempted to derive float and double from a real-number datatype: the facets one must define in order to describe the relation seem arbitrary and ad hoc, lacking in any mathematical motivation, nearly incomprehensible in fact, unless one asks what mathematical properties one must exploit in order to represent approximations of real numbers in a binary floating-point format designed for convenient representation inside electronic devices. It seemed simpler and more straightforward to say that float and double are intended to match IEEE numbers than to say that they are a particular subset of the reals defined by application of particular facets.
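To make the "neither mapping is a subset of the other" point concrete, here is a minimal Python sketch. It is not taken from the spec: the function names and the `radix`/`precision` parameters are hypothetical stand-ins for the kind of facets early drafts considered, Python's `float()` is assumed as the IEEE 754 binary64 round-to-nearest mapping, and exponent limits are ignored.

```python
from decimal import Decimal
from fractions import Fraction
from math import gcd


def decimal_lexical_map(literal: str) -> Fraction:
    # decimal-style mapping: a literal denotes exactly the number it spells out
    return Fraction(Decimal(literal))


def double_lexical_map(literal: str) -> Fraction:
    # double-style mapping: float() rounds to the nearest binary64 value;
    # Fraction(float) recovers that binary value exactly
    return Fraction(float(literal))


def representable(x: Fraction, radix: int, precision: int) -> bool:
    """Hypothetical 'facet' test: can x be written as m * radix**e with an
    integer m of at most `precision` radix digits? Exponent limits are ignored."""
    if x == 0:
        return True
    m = abs(x)
    # Scale by the radix until m is an integer, or show that it never will be.
    while m.denominator != 1:
        if gcd(m.denominator, radix) == 1:
            return False  # the denominator has a prime factor the radix lacks
        m *= radix
    n = m.numerator
    # Trailing zero digits in this radix can be absorbed into the exponent.
    while n % radix == 0:
        n //= radix
    return n < radix ** precision


literal = "0.1"
exact = decimal_lexical_map(literal)    # Fraction(1, 10)
nearest = double_lexical_map(literal)   # the binary64 value nearest to 1/10
print(representable(exact, radix=2, precision=53))    # False: 1/10 is not a double
print(representable(nearest, radix=2, precision=53))  # True
print(exact == nearest)                               # False
```

The pair ("0.1", 1/10) belongs to the decimal mapping and ("0.1", nearest) to the double mapping, so neither relation contains the other. With radix=2 and precision=53 the test approximates the finite binary64 values (exactly so only once exponent limits are added), yet nothing stops a caller from passing radix=11 and precision=17.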
Then, too, deriving float and double by specifying 2 as a base and particular sizes for exponent and mantissa seems to suggest that the same facets might be given different values, so that schema authors could define a set of numbers corresponding to base-11 numbers with 17 digits of mantissa and 16 digits of exponent. (Or substitute any positive integers of your choice for 17 and 16 here, and any integer greater than 1 for 11.)

The three designs available seemed to boil down to:

 - an abstract numeric type with facets to allow definition of floating- and fixed-point numbers with arbitrary bases and capacities -- aka Implementors' Nightmare

 - an abstract numeric type with facets for defining IEEE float and double, which however schema authors are forbidden to use, so the generality of the facet mechanism is purely illusory: for all intents and purposes, the IEEE types are defined by magic, and the 'facets' are a fig leaf

 - primitives for the types actually to be supported, with provisions for type coercion in the languages which use them (as, for example, in the XPath Functions and Operators spec)

None of these seems so beautiful and obviously right that everyone would greet it as the one true solution, but on the whole I think the third approach, taken by XSD 1.0, is more honest and straightforward, at least for the problems of validation that XSD must solve. As the XSD spec says, the mapping from XSD types to types in a programming language or other system is not fixed, and there is no requirement that XSD primitives map to primitives in the other system, or vice versa.

> 2) That you inadvertently make the comparison emphasizes the point
> that floats and decimals *are* comparable. When I said above that I
> worry that the theory is not coherent, it is the absence of any
> explanation within the specification of how such a comparison could
> be made that forms part of such a concern.
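Under the third design, how a float and a decimal get compared is left to the systems that use XSD types. A minimal Python sketch of two policies such a system might adopt: promoting the decimal operand to double (roughly in the spirit of XPath 2.0's numeric type promotion), versus comparing the exact numbers denoted. The function names are hypothetical, and Python's `float` is assumed as the binary64 stand-in.

```python
from decimal import Decimal
from fractions import Fraction


def _sign(a, b) -> int:
    # Three-way comparison result: -1, 0 or 1.
    return (a > b) - (a < b)


def compare_promoting(d: Decimal, x: float) -> int:
    # Promote the decimal to the nearest double, then compare doubles.
    return _sign(float(d), x)


def compare_exact(d: Decimal, x: float) -> int:
    # Compare the exact rational numbers the two values denote.
    return _sign(Fraction(d), Fraction(x))


d, x = Decimal("0.1"), 0.1
print(compare_promoting(d, x))  # 0: both end up as the same double
print(compare_exact(d, x))      # -1: 1/10 is slightly below the double nearest to it
```

Neither policy is defined by XSD itself, and they disagree on this input.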
Personally, I thought the spec was fairly clear that the disjointness of the primitives is a given for purposes of XSD, and is not intended as a constraint on other systems, which will of course wish to compare values across primitive types.

> ps. Please consider this a formal comment on the specification. If
> desired I can submit it to the bug tracker.

Yes, please do. When you do, it would be helpful if you clarified whether the gist of your comment is

 (a) please reorganize your type hierarchy for numerics from the ground up

 (b) please say more explicitly whether it makes sense for applications and systems not performing XSD schema-validity assessment to compare values with different primitive types

 (c) multiple primitive numerics? blecch! yuck!

Speaking only for myself, I think (b) or something similar might be plausible, but (a) is not likely to happen in a point release (or, for that matter, in any spec claiming to define a version of XSD), and (c) will elicit either a shrug or a sympathetic sigh, but probably not a change to the spec.

As Michael Kay has said in this thread, there are a lot of interesting issues here, and no one right answer.

--
C. M. Sperberg-McQueen
World Wide Web Consortium
Received on Friday, 4 July 2008 15:11:30 UTC