Summary of crypto-DTD discussion

At the risk of missing something, (I have read each message on this
list at least once, but there's an AWFUL lot of them :-), against the
background of our experience with using an SGML-subset for
inter-program communication (see NSGML in Michael's system feature
summary), I'm going to attempt to summarise the emerging consensus in
the general area of what kinds of declaration information is required
for what tasks.  In terms of the preceding discussion, this is a
modest elaboration on the 'Cooked' approach.

My basic take is that we are committed to saying that "[the full
version] of a conforming XML document is a conforming SGML document".
Furthermore, for some tasks such as validation or editing/authoring,
only the full version is appropriate.  HOWEVER, for another common
class of tasks, roughly speaking those we might call read-only tasks,
an abbreviated version is adequate.  There has been considerable
convergence between postings on this list, and our own implementation
experience in Edinburgh, regarding what aspects of the DTD are needed
in the abbreviated DTD, which together with the XML document instance
would constitute the abbreviated version:

1) For each element:
 1a) Whether its content model is EMPTY or not;
 1b) For each of its attributes:
     Their value type and #FIXED or defaulted value;
2) For each SDATA and external NDATA entity, their
    definitions/notations respectively [following Martin's suggestion
    that most entity references in RCDATA are expanded already].

Our experience is that this information suffices for a simple
non-validating parser to parse very fast.  HOW this information is
encoded for transmission to/exploitation by read-only applications
such as viewers is a separate question -- we use a highly optimised
relocatable hashtable which is clearly NOT what is wanted for XML, I
personally like the idea of using e.g. 
<!ELEMENT cryptoDTD - - (elementDecl+,entityDecl*).
etc. as has been suggested.

ht

[Note:  I hope this can be taken as my vote[s] on a number of issues,
I'm actually cut off from net browsing until after the deadline passes
(local holiday, low bandwidth, etc.)]

Received on Sunday, 15 September 1996 22:49:36 UTC