Datatypes for DTDs

A recent Note submitted to the W3C has a proposal aimed at legacy
systems:

   http://www.w3.org/TR/dt4dtd

First, a nitpick: the example in the Note is incomplete, in that the
'public' and 'binding' attributes are undeclared - unless it's the
intent of the value of the 'a-dtype' attribute, as a map, to be taken
also as an implicit declaration of these attributes.  However, this
interpretation is belied by the statement:

  The attrNames are the names of attributes declared in the same
  attribute-list declaration for the element type. 

Second, the 'e-dtype' is unnecessary, because it duplicates the
function of a NOTATION declared value, which the legacy systems in
question presumably already support.  If, however, SGML-geekery like
NOTATION declared values is for some reason verboten, the same effect
could be achieved through the 'a-dtype' attribute alone.  Since the
purpose of this attribute is to record a map of declared notations,
it suffices to represent the referential intent of the 'e-dtype'
attribute with a conventional name like '#CONTENT', viz.,

  <!ATTLIST person
     birthdate CDATA   #IMPLIED
     height    CDATA   #IMPLIED
     public    CDATA   #IMPLIED
     binding   CDATA   #IMPLIED
     a-dtype   CDATA   #FIXED
        "#CONTENT social-security-number
         public date
         binding length">
 
in which case the 'a-dtype' attribute would perhaps be better named
'dtype-map'.  (That is, purely as a matter of significant names, it is
as much a burden to recognize 'e-dtype' as '#CONTENT'.)
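
To make the point concrete, here is a minimal sketch (in Python, and
purely illustrative) of what an application would do with such a map
value.  The function name 'parse_dtype_map' is hypothetical, and the
pairing rule (alternating name/notation tokens, with '#CONTENT'
standing for the element content) is my reading of the scheme above,
not anything stated in the Note.

```python
def parse_dtype_map(value):
    """Split a map value into {attribute name or '#CONTENT': notation name}.

    Assumes whitespace-separated tokens that alternate name, notation.
    """
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected name/notation pairs, got: %r" % value)
    return dict(zip(tokens[0::2], tokens[1::2]))


# The fixed value of the map attribute, as the application would see it.
mapping = parse_dtype_map("""#CONTENT social-security-number
                             public date
                             binding length""")

# The role played by 'e-dtype' is recovered from the conventional name.
content_notation = mapping.get("#CONTENT")
```

One lookup on a conventional key replaces a second attribute, which
is all the 'e-dtype' attribute buys.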

Third, the 'a-dtype' attribute has the makings of a kludge that won't
go away.  I suppose there is some voodoo with parameter entities that
could achieve the Right Thing, but the basic purpose of the 'a-dtype'
attribute has already been addressed by K.4.4.2 in the WebSGML TC

  http://www.ornl.gov/sgml/wg8/document/1955.htm

which allows direct association of notations with attribute values:

  <!ATTLIST person
     birthdate CDATA        #IMPLIED
     height    CDATA        #IMPLIED
     public    DATA  date   #IMPLIED
     binding   DATA  length #IMPLIED>

The problem, of course, is that legacy systems don't support this new
category of declared value.  (And hiding this via a PE, switchable
between 'CDATA' and 'DATA foo', clearly defeats the purpose of making
the notation name available in the relevant - 'CDATA' - case.)  
Still, there's no reason to make transition strategies to this form
(which would take a simple modification of production [57] in the XML
spec to legitimize) more difficult with cumbersomely different syntax.
In any case, I think the Note is incomplete without a mention of the
facility enabled by K.4.4.2 of the WebSGML TC.

Fourth, it's not clear that the proposal actually solves the problem
for all legacy systems!  There is an assumption that the application,
on parsing the contents of the 'a-dtype' attribute, still has the
ability to query the parser interface for notation information.  The
ESIS output of nsgmls, for instance, does not report notations which
are not referenced, so a consumer of ESIS data would have no way of
knowing to what notations such names as 'date' and 'length' actually
referred.  

This example is only symptomatic of the real problem, which is the
lack of an explicit connection between names like 'date' or 'length'
and notation declarations.  The only workaround seems to be to "pull
in" the notations via suitably declared (albeit bogus) external
entities, which could lead to a sneaky scheme like this:

   <!ENTITY bogus.date   PUBLIC "foo//bar"   NDATA date >
   <!ENTITY bogus.length PUBLIC "baz//blort" NDATA length >
   ...
   <!ATTLIST person
         ...
         notations ENTITIES #FIXED "bogus.date bogus.length"
         dtype-map CDATA    #FIXED "notations public binding" 
         ...>

(where the first name in 'dtype-map' identifies the 'notations'
attribute as the "hook" to the notation declarations.)  Of course, all
this is even further away from the straightforward syntax of K.4.4.2!
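
For what it's worth, a consumer of ESIS data could then resolve the
names along the following lines.  This is a sketch in Python under
several assumptions: the sample lines are hand-written (not captured
nsgmls output) and abbreviated, following my understanding of the
nsgmls command letters ('N' notation, 'E' external entity, 'A'
attribute, '(' start-tag); names are shown without case folding; and
the positional pairing of the names in 'dtype-map' with the entities
in 'notations' is my own reading of the scheme.  The function name
'resolve_dtype_map' is hypothetical.

```python
def resolve_dtype_map(esis_lines, map_attr="dtype-map"):
    """Return {element: {attribute: notation}} using the 'hook' attribute."""
    entity_notation = {}   # entity name -> notation name, from E commands
    attrs = {}             # attributes pending for the next start-tag
    result = {}
    for line in esis_lines:
        cmd, rest = line[:1], line[1:]
        if cmd == "E":                      # Eename typ nname
            ename, _typ, nname = rest.split()
            entity_notation[ename] = nname
        elif cmd == "A":                    # Aname type value-tokens...
            name, kind, *value = rest.split()
            attrs[name] = (kind, value)
        elif cmd == "(":                    # start-tag consumes pending attributes
            if map_attr in attrs:
                # First name is the hook; the rest pair up with its entities.
                hook, *names = attrs[map_attr][1]
                entities = attrs[hook][1]
                result[rest] = {n: entity_notation[e]
                                for n, e in zip(names, entities)}
            attrs = {}
    return result


# Hand-written, abbreviated sample in the style of nsgmls output.
SAMPLE = """\
Ndate
Ebogus.date NDATA date
Nlength
Ebogus.length NDATA length
Anotations ENTITY bogus.date bogus.length
Adtype-map CDATA notations public binding
Apublic CDATA 2000-02-04
Abinding CDATA 24cm
(person
)person
""".splitlines()

resolved = resolve_dtype_map(SAMPLE)
```

The point is only that the notation names now arrive by way of the
entity definitions, which the parser does report, rather than as bare
tokens inside a CDATA value.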

Fifth - well, it seems to have been covered already, in a way.  The
Note comes with a link to a Comment from the W3C Staff:

   http://www.w3.org/Submission/2000/01/Comment

While the basic criticism of the Comment - apparent reliance on a
global convention regarding the names 'e-dtype' and 'a-dtype' - is
well taken, it's somewhat surprising that this should serve as yet
another occasion to flog the favorite maguffin of the W3C.  Presuming
that everyone simply has to jump on the Namespace bandwagon has no
relevance to the technical merits of the proposal in the Note.  If
anything, the modified example (using colonized names) is worse than
the original kludge.  And it *is* a kludge, all but admitted, since
the intent is to give legacy systems something to work with.  In fact,
the Note has this:

  NOTE: The "dtype" attributes are based on notations instead of XML
  Namespaces because they are meaningful in the DTD and not in the
  document instance. [...]

That is, introducing namespace prefixes into names that are visible in
a DTD is an utterly bogus approach, because such prefixes are strictly
tactical in any given instance.  The alpha-renaming requirement of the
namespace bogosity is not "solved" by rewriting DTDs all the time, nor
does poisoning DTDs with the same slicing-and-dicing rigamarole of
otherwise atomic names serve any constructive purpose for legacy
systems.

Moreover, rushing the deus ex machina du jour to the rescue actually
obscures the real issue at hand: the global convention.  The important
thing here is not its existence, but its annunciation, i.e., *declaring*
the fact that a convention is in force (global or no).  That is, it's
not necessary for a (legacy) application to bind itself to 'a-dtype'
or 'dtype-map' as the *fixed* name of the notation mapping attribute.
A processing instruction (which retains the advantage of also being
passed to ESIS consumers) does the trick:

  <?xml:notation-map map-name="dtype-map" ?>

(or elaborations thereof, to cater to things like different names for
different element types, etc.)
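
A minimal sketch of the receiving end, in Python: the application
looks for the announcement among the processing instructions it is
handed and takes the attribute name from there instead of
hard-wiring it.  The function name 'map_name_from_pi' is
hypothetical, the PI target and 'map-name' pseudo-attribute are
simply those of the example above, and the input is assumed to be the
PI data as passed through by the parser, without the delimiters.

```python
import re

PI_TARGET = "xml:notation-map"   # target used in the example above


def map_name_from_pi(pi_data):
    """Return the announced map attribute name, or None for other PIs."""
    parts = pi_data.split(None, 1)
    if not parts or parts[0] != PI_TARGET:
        return None
    rest = parts[1] if len(parts) > 1 else ""
    match = re.search(r'map-name\s*=\s*"([^"]*)"', rest)
    return match.group(1) if match else None


# PI data as an application might receive it.
announced = map_name_from_pi('xml:notation-map map-name="dtype-map" ')
```

An application that finds no such PI can fall back to whatever
default it likes; the convention is then declared, not assumed.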

I'm aware that the W3C's horror of things like declaration syntax and
DTDs is matched only by a visceral distaste for PIs, but we shouldn't
lose sight of the fact that the proposal is aimed at legacy systems.


Arjun

Received on Friday, 4 February 2000 10:00:56 UTC