- From: Eve L. Maler <elm@arbortext.com>
- Date: Tue, 03 Jun 1997 19:42:57 -0400
- To: w3c-sgml-wg@w3.org
- Cc: elm@arbortext.com
Here's the writeup I mentioned earlier. I tried to be thorough, but I may have missed some things. Comments are welcome, of course.

Eve

* * *

I've already been asked many times by clients whether their existing DTDs should "conform to XML." In most cases, I believe the answer is no, simply because the focus is on delivery of SGML over the Web (the primary goal of the XML effort in the first place), rather than validation of SGML over the Web. At the same time, I've already written a couple of new XML-conforming DTDs because the client felt it was simpler this way!

It seems to me that you could use a two-tiered approach to conformance that depends on the circumstances of creation and delivery.

(Note that I'm not addressing the scenario of ad hoc tagging, where there's no DTD in the picture, and the creator is likely using XML-conforming instance syntax from the beginning. I've also pretty much ignored declarations involved in the SHORTREF and LINK features.)

---------------------------------------------------------------------------

Instance and Internal Subset Conform to XML Constraints

Why: Web delivery of instances, where any characteristics of the DTD worth transmitting (such as architectural forms-type attributes and entity declarations) are put into the internal subset as part of XML delivery.

The following list assumes that for any one instance, a portion of the DTD might need to be sent in the internal subset. Below, "transformation" refers to automatable preparation of such portions before they are extracted and placed in the internal subset (precise details on which declarations must be extracted aren't given here; maybe I'll get around to that later).

What:

1. The instance has to be well-formed: special empty-element and PI syntax, normalization, etc.
2. Either element type declarations can't use CDATA or RCDATA declared content, or the elements' content in the instance must be transformed to escape the appropriate characters that look like markup
3. The DTD should avoid attribute value defaulting if you want to minimize the need to put attribute list declarations in the internal subset (use #IMPLIED plus a style sheet instead); if default values are supplied, they must be quoted
4. Attribute declared values can't be NAME[S], NUMBER[S], or NUTOKEN[S] (probably use NMTOKEN[S] instead, but also possibly CDATA)
5. Attribute default values can't use #CURRENT (no good substitute)
6. Attribute default values can't use #CONREF (use #IMPLIED plus a style sheet instead)
7. Either SDATA entities can't be referenced, or SDATA entity references must be replaced with decimal or hexadecimal character references (or whatever substitute is appropriate) in the instance
8. Either CDATA entities can't be referenced, or the entity type must be changed and the contents transformed to escape characters that look like markup
9. Bracketed entities can't be referenced (in general, these make ill-formed entities because they contain only half of a markup construct)
10. SUBDOC entities can't be referenced (it might take quite a bit of work to extricate and transform any uses of SUBDOC entities)
11. Entity declarations must not have data attributes specified
12. External entity declarations must conform to PUBLIC/SYSTEM syntax requirements
13. DTD marked sections must be either transformed to remove any spaces around status keywords, or resolved; the TEMP keyword can't be used
14. Parameter entities either conform to whatever ends up being allowed, or are transformed or resolved
15. DTD comments within markup declarations are either removed or are transformed to be moved outside and turned into full comment declarations

---------------------------------------------------------------------------

Instance, Internal Subset, and External Subset Conform to XML Constraints

Why: Validation of a document using XML tools that are not also validating SGML parsers. I consider this an unlikely scenario, given the clamor for many kinds of validation that SGML can't do today and given the desire to do ad hoc tagging even when there's a DTD present.

The following list assumes that it's desirable to use the same DTD for SGML and XML applications, without transformation.

What:

1. As in the above scenario, the instance has to be well-formed: special empty-element and PI syntax, normalization, etc.
2. Either element type declarations must contain no omitted-tag minimization specifications, or the specifications must be parameterized (according to the current XML-Lang spec) and resolve to null strings in the XML version
3. Element type declarations can't use content model exceptions
4. Element type declarations can't use AND (&) content models
5. Element type declarations can't use CDATA or RCDATA declared content (use CDATA sections in the instance instead)
6. Unlike the above scenario, the DTD can freely use attribute value defaulting; the default values must be quoted
7. As in the above scenario, attribute declared values can't be NAME[S], NUMBER[S], or NUTOKEN[S] (probably use NMTOKEN[S] instead, but also possibly CDATA)
8. Attribute default values can't use #CURRENT (no good substitute)
9. As in the above scenario, attribute default values can't use #CONREF (use #IMPLIED plus a style sheet instead)
10. SDATA entities can't be declared or referenced
11. CDATA entities can't be declared or referenced (use CDATA sections instead)
12. Bracketed entities can't be declared or referenced
13. SUBDOC entities can't be declared or referenced
14. As in the above scenario, entity declarations must not have data attributes specified
15. Notation declarations must not have data attribute list declarations
16. As in the above scenario, external entity declarations must conform to PUBLIC/SYSTEM syntax requirements
17. DTD marked sections must have no spaces around status keywords; the TEMP keyword can't be used
18. Parameter entities must conform to whatever ends up being allowed
19. DTD comments must be in full comment declarations, outside other markup declarations

---------------------------------------------------------------------------

Additional XML-Related DTD Design Considerations

Whether your SGML tools have support for the TC version of SGML...
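
Looking back at the two checklists, a few of the DTD-level items (declared content, AND groups, exceptions, attribute declared values, #CURRENT/#CONREF, and entity types) lend themselves to a quick automated scan. What follows is only a minimal sketch in Python, assuming a DTD held in a string; the names `find_xml_issues`, `CHECKS`, and `SAMPLE_DTD` are hypothetical, and the regular expressions are rough heuristics rather than a real SGML parse. It does not resolve parameter entities, and it does not look at omitted-tag minimization, marked sections, or comments.

```python
import re

# Rough heuristic only: a declaration is taken to run from "<!KEYWORD" to the
# next ">", so a ">" inside a literal or comment would cut it short.
DECL = re.compile(r"<!(ELEMENT|ATTLIST|ENTITY)\b(.*?)>", re.DOTALL)

# Hypothetical rule table, keyed by declaration keyword. Keywords are matched
# in upper case only, so an attribute literally named NAME is a false positive.
CHECKS = {
    "ELEMENT": [
        (r"\b(?:CDATA|RCDATA)\b", "CDATA/RCDATA declared content"),
        (r"&", "AND (&) content model"),
        (r"\s[-+]\(", "content model exception"),
    ],
    "ATTLIST": [
        (r"\b(?:NAMES?|NUMBERS?|NUTOKENS?)\b",
         "NAME[S]/NUMBER[S]/NUTOKEN[S] declared value"),
        (r"#CURRENT\b", "#CURRENT default"),
        (r"#CONREF\b", "#CONREF default"),
    ],
    "ENTITY": [
        (r"\b(?:SDATA|CDATA|SUBDOC)\b", "SDATA/CDATA/SUBDOC entity"),
    ],
}


def find_xml_issues(dtd_text):
    """Return (keyword, problem) pairs for constructs flagged by CHECKS."""
    issues = []
    for match in DECL.finditer(dtd_text):
        keyword, body = match.group(1), match.group(2)
        for pattern, problem in CHECKS[keyword]:
            if re.search(pattern, body):
                issues.append((keyword, problem))
    return issues


# Made-up SGML DTD fragment used purely for illustration.
SAMPLE_DTD = """
<!ELEMENT doc   - - (title, (p | note)+) +(footnote)>
<!ELEMENT title - O (#PCDATA)>
<!ELEMENT code  - - CDATA>
<!ATTLIST doc   status NAME #CURRENT
                id     ID   #IMPLIED>
<!ENTITY  copy  SDATA "[copy  ]">
"""

if __name__ == "__main__":
    for keyword, problem in find_xml_issues(SAMPLE_DTD):
        print(f"{keyword}: {problem}")
```

A scan like this can only point at declarations worth a second look; deciding whether to transform, resolve, or redesign them is still a per-DTD judgment along the lines of the two lists.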
Received on Tuesday, 3 June 1997 19:40:36 UTC