Re: Data types again from Paul Prescod on 1997-05-21 (w3c-sgml-wg@w3.org from May 1997)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Wed, 21 May 1997 15:38:36 -0400 (EDT)
To: tbray@textuality.com (Tim Bray)
Cc: w3c-sgml-wg@w3.org
Message-Id: <199705211938.PAA07602@calum.csclub.uwaterloo.ca>
> DT-2. Should the data typing mechanism be a separate paper in the WD-xml
>  series rather than part of XML-lang?
>  Pro: Keep XML-lang simple.  SGML (& maybe HTML) can use it too.
>  Con: The usefulness of XML-lang may be impaired if it doesn't have
>       the typing guaranteed to be built-in.

I think we should do data types in a fourth deliverable.  I don't see how
XML-Lang is impaired if it isn't built-in. Applications that need it will
specify that both parts of XML are required. Is XML-Lang impaired because
it doesn't have "built-in" linking or stylesheets? 

> DT-3. Should the data typing be a universal/extensible regexp-based thing,
>  (as proposed by Gavin Nicol and others) rather than a simple subset of
>  the SQL types as proposed in [1]?
>  Pro: extensibility is good - the usages of SGML and XML are unpredictable;
>       SQL types were designed for boring COBOL applications.
>  Con: we already have extensibility with SGML extended facilities lextypes;
>       the SQL types are proven in commercial practice, and are presented
>       at the right level for the people who build real applications.

Say no to COBOL!!! I want to be able to use XML for complex numbers and
strings in a variety of "special" formats. We should make it extensible from 
the start. This is another reason to make it a fourth deliverable: those
who aren't willing to make the commitment to extensibility can just
treat all typed values as strings.
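
To make the extensibility argument concrete, here is a minimal sketch in
Python of a regexp-based type registry along the lines of DT-3. It is an
illustration only, not a proposal: the registry, the function names, the
type names and the "a+bi" notation for complex numbers are all assumptions
made up for this example. An application that does not know a type simply
gets the string back.

```python
import re

# Hypothetical registry: type name -> (compiled pattern, converter).
# None of these names come from any XML draft; they are illustrative only.
TYPE_REGISTRY = {}


def register_type(name, pattern, convert):
    """Add a new lexical type, defined by a regexp and a converter function."""
    TYPE_REGISTRY[name] = (re.compile(pattern), convert)


def parse_typed(type_name, text):
    """Convert text according to type_name; unknown types stay plain strings."""
    entry = TYPE_REGISTRY.get(type_name)
    if entry is None:
        # No commitment to this type: treat the value as a string.
        return text
    pattern, convert = entry
    if pattern.fullmatch(text) is None:
        raise ValueError(f"{text!r} is not a valid {type_name}")
    return convert(text)


# A type in the spirit of the SQL subset.
register_type("integer", r"[+-]?[0-9]+", int)

# A user-defined type: complex numbers written as "a+bi" (assumed notation).
register_type(
    "complex",
    r"[+-]?[0-9]+(\.[0-9]+)?[+-][0-9]+(\.[0-9]+)?i",
    lambda s: complex(s.replace("i", "j")),
)
```

With this sketch, parse_typed("complex", "3+4i") yields a Python complex
value, while parse_typed("some-unknown-type", "3+4i") hands back the
original string untouched.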

> DT-4. Should data typing be provided for attribute values, not just
>  content as proposed in [1]?
>  Pro: the minimal typing provided by SGML is for attributes; they are
>       typically a good place to put atomic values anyway.
>  Con: for element content, you can do it with just one or two typing
>       attributes - if you want to do attributes, the mapping machinery
>       gets bigger and more complicated.  Once again, less is more - if
>       we have it for elements, do we really need it for attributes?

I think that we should start to unify and rationalize element content and
attributes. SGML/XML has enough special cases and un-unified concepts as it 
is. 

I personally feel uncomfortable with the notion that attributes are only
in the language because they are "convenient". If that were the case then
why do we treat them so differently from content in the grove model, in our
stylesheet languages, in our query languages, in our SGML editors, etc.? In
my mind, attributes are *named data roles*, like "members" in an OOP
language, or "properties" in the grove model. As in the grove model, 
content is a special attribute, the attribute that describes the spanning
tree that we call the "document hierarchy". Thus I consider moves to make
attributes and elements even more different a step backwards.
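
As a rough illustration of that view (a sketch only, in Python; the class,
the method and the property name "content" are hypothetical and are not
taken from the grove model itself), an element can be modelled as a bag of
named properties in which content is just one more property, so the same
access path serves both:

```python
# Hypothetical name for the property that holds an element's children.
CONTENT = "content"


class Element:
    """An element as a set of named data roles; content is one of them."""

    def __init__(self, gi, properties=None, content=None):
        self.gi = gi  # generic identifier (the element type name)
        self.properties = dict(properties or {})
        # Content is stored like any other property, not as a special case.
        self.properties[CONTENT] = list(content or [])

    def get(self, name):
        """Look up any property, attribute or content, in the same way."""
        return self.properties[name]


price = Element("price", {"currency": "CAD"}, ["19.95"])
item = Element("item", {"id": "a1"}, [price])
```

Here item.get("id") and item.get(CONTENT) go through the same machinery,
so a typing mechanism written against properties would cover attributes
and content alike.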

> DT-6. Should the primary attribute name be XML-TYPE as proposed in [2]
>  rather than XML-SQLTYPE as proposed in [1]?
>  Pro: Shorter is better; having all these attributes with SQL in front
>       of them makes them much less readable.
>  Con: These are not pulled out of the air, but rely heavily on SQL; it
>       may be desirable to have other typing mechanisms introduced
>       later; they would mostly be predeclared in internal subsets; 
>       terseness is not supposed to be a big deal per our design goals.

If the mechanism is extensible then XML-TYPE is appropriate. If the mechanism
is tied to SQL, then XML-SQLTYPE is appropriate. Maybe types should live
in namespaces, in which case the answer will depend on what we do
with namespaces.

 Paul Prescod
Received on Wednesday, 21 May 1997 15:38:45 UTC