- From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
- Date: Mon, 16 Sep 96 03:49:18 BST
- To: w3c-sgml-wg@w3.org
At the risk of missing something, (I have read each message on this list at least once, but there's an AWFUL lot of them :-), against the background of our experience with using an SGML-subset for inter-program communication (see NSGML in Michael's system feature summary), I'm going to attempt to summarise the emerging consensus in the general area of what kinds of declaration information is required for what tasks. In terms of the preceding discussion, this is a modest elaboration on the 'Cooked' approach. My basic take is that we are committed to saying that "[the full version] of a conforming XML document is a conforming SGML document". Furthermore, for some tasks such as validation or editing/authoring, only the full version is appropriate. HOWEVER, for another common class of tasks, roughly speaking those we might call read-only tasks, an abbreviated version is adequate. There has been considerable convergence between postings on this list, and our own implementation experience in Edinburgh, regarding what aspects of the DTD are needed in the abbreviated DTD, which together with the XML document instance would constitute the abbreviated version: 1) For each element: 1a) Whether its content model is EMPTY or not; 1b) For each of its attributes: Their value type and #FIXED or defaulted value; 2) For each SDATA and external NDATA entity, their definitions/notations respectively [following Martin's suggestion that most entity references in RCDATA are expanded already]. Our experience is that this information suffices for a simple non-validating parser to parse very fast. HOW this information is encoded for transmission to/exploitation by read-only applications such as viewers is a separate question -- we use a highly optimised relocatable hashtable which is clearly NOT what is wanted for XML, I personally like the idea of using e.g. <!ELEMENT cryptoDTD - - (elementDecl+,entityDecl*). etc. as has been suggested. ht [Note: I hope this can be taken as my vote[s] on a number of issues, I'm actually cut off from net browsing until after the deadline passes (local holiday, low bandwidth, etc.)]
Received on Sunday, 15 September 1996 22:49:36 UTC