- From: Tim Bray <tbray@textuality.com>
- Date: Wed, 21 May 1997 13:00:46 -0700
- To: w3c-sgml-wg@w3.org
I have revised my paper on data typing at [1] http://www.textuality.com/xml/typing.html to take into account some of the issues raised here. The following changes have been made:

1. scientific notation is supported for FLOAT data types
2. the DATE/TIME/TIMESTAMP types conform to ISO 8601:1988 (thanks to Peter M-R for digging this up)
3. rather than overload everything on XML-SQLSIZE, there are several independent data type parameterization attributes
4. used the nice explanatory technique from [2] below of providing element-like declarations for each type, to illustrate the use of the various parameterizing attributes
5. more examples

Another relevant note is [2], Jon Bosak's legible version of Jon Paoli's posting dated 16/05/97, which embodies a Microsoft proposal and is entitled "SD3 - Data Types [fmt]".

The following issues need to be decided by the ERB. I think most of them have been sufficiently discussed, but it can't hurt to lay them out.

DT-1. Should we propose a data typing mechanism as part of the XML work?
Pro: the whole SGML world has been screaming for this for years, and XML needs it even more.
Con: less is more - the world, despite its screams, has limped along without it all this time. Also, we have other important work to do.

DT-2. Should the data typing mechanism be a separate paper in the WD-xml series rather than part of XML-lang?
Pro: Keep XML-lang simple. SGML (& maybe HTML) can use it too.
Con: The usefulness of XML-lang may be impaired if typing is not guaranteed to be built in.

DT-3. Should the data typing be a universal/extensible regexp-based thing (as proposed by Gavin Nicol and others) rather than a simple subset of the SQL types as proposed in [1]?
Pro: extensibility is good - the usages of SGML and XML are unpredictable; SQL types were designed for boring COBOL applications.
Con: we already have extensibility with the SGML extended facilities' lextypes; the SQL types are proven in commercial practice, and are presented at the right level for the people who build real applications.

DT-4. Should data typing be provided for attribute values, not just for element content as proposed in [1]?
Pro: the minimal typing provided by SGML is for attributes; they are typically a good place to put atomic values anyway.
Con: for element content, you can do it with just one or two typing attributes - if you want to do attributes, the mapping machinery gets bigger and more complicated. Once again, less is more - if we have it for elements, do we really need it for attributes?

DT-5. Should data typing for the content of elements be applicable to mixed content as proposed in [2], as well as to pure character data content as proposed in [1]?
Pro: There's no real architectural problem with doing mixed content; you just pretend the child elements aren't there.
Con: With mixed content you can get into whitespace problems; also, it "feels like" the data typing should apply to atomic items, and mixed content doesn't "feel" atomic.

DT-6. Should the primary attribute name be XML-TYPE as proposed in [2] rather than XML-SQLTYPE as proposed in [1]?
Pro: Shorter is better; having all these attributes with SQL in front of them makes them much less readable.
Con: These are not pulled out of the air, but rely heavily on SQL; it may be desirable to have other typing mechanisms introduced later; these attributes would mostly be predeclared in internal subsets; and terseness is not supposed to be a big deal per our design goals.

DT-7. Should the typing proposal include data value ranges as proposed in [1] (omitted in [2])?
Pro: Databases use them; this will make it pretty easy to construct an input/authoring system that produces instances loadable into a database with somewhat higher confidence.
Con: This really amounts to validation rather than typing, which is another domain; the approach in [1] is violently incompatible with SQL, which does it with real SQL queries that can also be extended to involve the values of other queries - in fact, anything that an SQL query can do.

DT-8. Should the CHAR datatype use both SIZE and MAXSIZE parameters as in [2] (in [1], it only had SIZE and was conceived of as a fixed-size field)?
Pro/Con: I don't understand [2] in this regard - I was following SQL in assuming that CHAR fields are fixed-size and thus need only one parameter.

DT-9. Should the numeric data types DECIMAL, INTEGER, and FLOAT use parameters designed to control the maximum datum size, as in [2]'s XML-DIGITS and XML-DIGITS-R (both for DECIMAL and INTEGER) and XML-BITS (for FLOAT)? ([1] provides only SCALE, for DECIMAL, and nothing else.)
Question: what is DIGITS-R, anyhow? It doesn't show up in the example and is not otherwise explained.
Pro: Better control over the storage requirements, and another useful validation step.
Con: More machinery - perhaps beyond XML's complexity cost/benefit ratio.

DT-10. For the FLOAT datatype, should we prescribe that these values internally correspond to IEEE floats (as proposed by someone whose name I forget)?
Pro: This would make comparison and sorting deterministic, and ensure that the string representation corresponds to a particular bit pattern.
Con: Overspecification - comparison/sorting of the strings is deterministic anyhow, as presumably is that of the underlying bit patterns; anybody who compares a string against a bit pattern has rocks in their head anyhow.

- Tim
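The following is a minimal Python sketch of what the DT-10 Pro argument amounts to. It is illustrative only and rests on several assumptions. The IEEE format is assumed to be binary64 (double). The pattern FLOAT_LEXICAL and the function float_bits are hypothetical names, not taken from [1] or [2]. The exact FLOAT grammar in [1] may differ, although this pattern does accept the scientific notation from change 1. The sketch shows that several distinct FLOAT strings map to a single IEEE 754 bit pattern.

```python
import re
import struct

# Hypothetical lexical pattern for FLOAT content, with optional scientific
# notation; an assumption for illustration, not the grammar defined in [1].
FLOAT_LEXICAL = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$")


def float_bits(text: str) -> str:
    """Return the IEEE 754 binary64 bit pattern of a FLOAT string, as hex."""
    if not FLOAT_LEXICAL.match(text):
        raise ValueError(f"not a FLOAT lexical form: {text!r}")
    # struct's '>d' format packs the value as a big-endian IEEE 754 double.
    return struct.pack(">d", float(text)).hex()


forms = ["100", "100.0", "1E2", "1.0e+02", "+1e2"]
for f in forms:
    # each line ends in 4059000000000000
    print(f"{f:>8} -> {float_bits(f)}")

# Distinct strings, one bit pattern.
print(len(set(forms)), "strings,", len({float_bits(f) for f in forms}), "bit pattern")
# prints: 5 strings, 1 bit pattern
```

Under this assumption, equality of the decoded values is well defined even when the strings differ. Comparing the strings themselves remains deterministic, as the Con argument says, but it treats "100" and "1E2" as different values.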
Received on Wednesday, 21 May 1997 07:01:17 UTC