- From: Tim Bray <tbray@textuality.com>
- Date: Wed, 21 May 1997 13:00:46 -0700
- To: w3c-sgml-wg@w3.org
I have revised my paper on data typing at
[1] http://www.textuality.com/xml/typing.html
to take into account some of the issues raised here. The following
changes have been made:
1. scientific notation is supported for FLOAT data types
2. the DATE/TIME/TIMESTAMP types conform to ISO 8601:1988 (thanks to
Peter M-R for digging this up)
3. rather than overload everything on XML-SQLSIZE, there are several
independent data type parameterization attributes
4. used the nice explanatory technique from [2] below of
providing element-like declarations for each type, to illustrate
the use of the various parameterizing attributes
5. more examples
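To make changes 1 and 2 concrete, here is a minimal sketch of lexical checks for a FLOAT with scientific notation and for an ISO 8601 calendar date. The regular expression and the function names are assumptions made for illustration; the exact grammar in [1] may differ.

```python
import re
from datetime import date

# Assumed lexical form for FLOAT: optional sign, digits with an optional
# fraction, and an optional exponent. Not taken from [1].
FLOAT_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

def is_float(text):
    """Return True if text looks like a FLOAT, scientific notation allowed."""
    return FLOAT_RE.fullmatch(text.strip()) is not None

def parse_iso_date(text):
    """Parse an ISO 8601 extended calendar date (YYYY-MM-DD)."""
    return date.fromisoformat(text.strip())
```

Under these assumptions, is_float("6.02e23") is accepted and is_float("1e") is rejected, while parse_iso_date("1997-05-21") yields a date object.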
Another note that is relevant is
[2] Jon Bosak's legible version of Jean Paoli's posting, dated 16/05/97,
embodying a Microsoft proposal, entitled "SD3 - Data Types [fmt]"
The following issues need to be decided by the ERB - most of them have
been sufficiently discussed, I think, but it can't hurt to lay them out.
DT-1. Should we propose a data typing mechanism as part of the XML work?
Pro: the whole SGML world has been screaming for this for years, and
XML needs it even more.
Con: less is more - the world, despite its screams, has limped along
without it all this time. Also we have other important work to do.
DT-2. Should the data typing mechanism be a separate paper in the WD-xml
series rather than part of XML-lang?
Pro: Keep XML-lang simple. SGML (& maybe HTML) can use it too.
Con: The usefulness of XML-lang may be impaired if it doesn't have
the typing guaranteed to be built-in.
DT-3. Should the data typing be a universal/extensible regexp-based thing,
(as proposed by Gavin Nicol and others) rather than a simple subset
of the SQL types as proposed in [1]?
Pro: extensibility is good - the usages of SGML and XML are unpredictable;
SQL types were designed for boring COBOL applications.
Con: we already have extensibility with SGML extended facilities lextypes;
the SQL types are proven in commercial practice, and are presented
at the right level for the people who build real applications.
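The two options in DT-3 can be contrasted in a short sketch. The type names follow SQL, but the patterns and both function names are assumptions made for illustration, not taken from [1] or from the regexp proposal.

```python
import re

# Option A: a closed set of predeclared types. The patterns are assumed.
BUILTIN = {
    "INTEGER": re.compile(r"[+-]?\d+"),
    "DECIMAL": re.compile(r"[+-]?\d+(\.\d+)?"),
}

def check_builtin(type_name, text):
    """Validate text against one of the fixed, predeclared types."""
    return BUILTIN[type_name].fullmatch(text) is not None

# Option B: the document author supplies an arbitrary pattern.
def check_pattern(pattern, text):
    """Validate text against a user-defined regular expression."""
    return re.fullmatch(pattern, text) is not None
```

Option A keeps the vocabulary small and predictable; Option B accepts forms such as check_pattern(r"[A-Z]{2}\d{4}", "AB1234") that no fixed list anticipates, at the cost of every processor carrying a regexp engine.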
DT-4. Should data typing be provided for attribute values, not just
content as proposed in [1]?
Pro: the minimal typing provided by SGML is for attributes; they are
typically a good place to put atomic values anyway.
Con: for element content, you can do it with just one or two typing
attributes - if you want to do attributes, the mapping machinery
gets bigger and more complicated. Once again, less is more - if
we have it for elements, do we really need it for attributes?
DT-5. Should data typing for the content of elements be applicable to
mixed content as proposed in [2], as well as pure character data content
as proposed in [1]?
Pro: There's no real architectural problem with doing mixed content, you
just pretend the child elements aren't there.
Con: With mixed content you can get into whitespace problems; also, it
"feels like" the data typing should apply to atomic items, and
mixed content doesn't "feel" atomic.
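A small sketch of the DT-5 trade-off, reading "pretend the child elements aren't there" as "drop the tags and keep the character data" (an assumption; it could also mean skipping the children's content entirely). The element names and the typed_text helper are hypothetical.

```python
import xml.etree.ElementTree as ET

def typed_text(element):
    """Concatenate all character data under element, ignoring child tags."""
    return "".join(element.itertext())

# Mixed content whose character data still forms a clean number.
price = ET.fromstring("<price>12<sep/>.50</price>")

# The same value, but markup and indentation introduce stray whitespace.
spaced = ET.fromstring("<price>\n  12 <b>.50</b>\n</price>")
```

Here typed_text(price) gives "12.50", which converts to a number without trouble. typed_text(spaced) keeps the interior space and newlines, so a naive float() conversion raises ValueError; that is the whitespace problem named in the Con.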
DT-6. Should the primary attribute name be XML-TYPE as proposed in [2]
rather than XML-SQLTYPE as proposed in [1]?
Pro: Shorter is better; having all these attributes with SQL in front
of them makes them much less readable.
Con: These are not pulled out of the air, but rely heavily on SQL; it
may be desirable to have other typing mechanisms introduced
later; they would mostly be predeclared in internal subsets;
terseness is not supposed to be a big deal per our design goals.
DT-7. Should the typing proposal include data value ranges as proposed
in [1] (omitted in [2])?
Pro: Databases use them; this will make it pretty easy to construct an
input/authoring system that will produce examples that are loadable
into a database with somewhat higher confidence.
Con: This really amounts to validation rather than typing, which is
another domain; the approach in [1] is violently incompatible with
SQL, which does it with real SQL queries that can also be
extended to involve the values of other queries - in fact, anything
that an SQL query can do.
DT-8. Should the CHAR datatype use both SIZE and MAXSIZE parameters as
in [2] (in [1], it only had SIZE and was conceived of as a fixed-size
field)?
Pro/Con: I don't understand [2] in this regard - I was following SQL
in assuming that CHAR fields are fixed-size and thus need only
one parm.
DT-9. Should the numeric data types DECIMAL, INTEGER, and FLOAT use
parameters designed to control the maximum datum size as in [2]'s
XML-DIGITS, XML-DIGITS-R (both for DECIMAL and INTEGER) and
XML-BITS (for FLOAT)? ([1] provides only SCALE, for DECIMAL, and
nothing else.)
Question: what is DIGITS-R, anyhow? It doesn't show up in the
example and is not otherwise explained.
Pro: Better control over the storage requirements, and another
useful validation step.
Con: More machinery - perhaps beyond XML's complexity cost/benefit ratio.
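For a sense of the extra machinery in DT-9, here is a sketch of a size check for DECIMAL. It assumes SQL-style semantics: "digits" is the maximum total number of digits and "scale" the maximum number after the decimal point. Whether XML-DIGITS in [2] means exactly this is not confirmed, and the fits function is hypothetical.

```python
from decimal import Decimal

def fits(text, digits, scale):
    """Check a decimal string against assumed SQL-style DECIMAL(digits, scale)."""
    value = Decimal(text)
    if not value.is_finite():
        return False
    _sign, ds, exp = value.as_tuple()
    frac_digits = max(0, -exp)              # digits right of the point
    int_digits = max(0, len(ds) + exp)      # digits left of the point
    if not any(ds):                         # zero needs no integer digits
        int_digits = 0
    return frac_digits <= scale and int_digits <= digits - scale
```

With digits=5 and scale=2, "123.45" fits, while "1234.5" (too many integer digits) and "1.234" (too many fraction digits) do not.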
DT-10. For the FLOAT datatype, should we prescribe that these internally
correspond to IEEE floats (as proposed by someone, I forget who)?
Pro: This would make comparison and sorting deterministic, and ensure
that the string representation corresponds to a particular bit
pattern.
Con: Overspecification - comparison/sorting of the strings is
deterministic anyhow, and presumably so is that of the underlying bit
patterns; anybody who compares a string against a bit
pattern has rocks in their head anyhow.
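The DT-10 arguments can be illustrated with a short sketch, assuming an IEEE 754 double as the internal form. The bits helper and the sample values are hypothetical.

```python
import struct

def bits(text):
    """Hex bit pattern of the IEEE 754 double nearest to the decimal string."""
    return struct.pack(">d", float(text)).hex()

# Several distinct strings map to one and the same bit pattern.
forms = ["1.5", "1.50", "15e-1"]
same_bits = len({bits(f) for f in forms}) == 1

# Both orderings are deterministic, but they are not the same ordering.
values = ["10", "9", "2.5"]
as_strings = sorted(values)
as_numbers = sorted(values, key=float)
```

Here same_bits is True, so the mapping from string to bit pattern is many-to-one. as_strings orders "10" before "9" while as_numbers does the reverse, so a specification would still need to say which comparison is meant.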
- Tim
Received on Wednesday, 21 May 1997 07:01:17 UTC