- From: Eve L. Maler <elm@arbortext.com>
- Date: Tue, 03 Jun 1997 19:42:57 -0400
- To: w3c-sgml-wg@w3.org
- Cc: elm@arbortext.com
Here's the writeup I mentioned earlier. I tried to be thorough, but I may
have missed some things. Comments are welcome, of course.
Eve
* * *
I've already been asked many times by clients whether their existing DTDs
should "conform to XML." In most cases, I believe the answer is no,
simply because the focus is on delivery of SGML over the Web (the primary
goal of the XML effort in the first place), rather than validation of SGML
over the Web. At the same time, I've already written a couple of new
XML-conforming DTDs because the client felt it was simpler this way!
It seems to me that you could use a two-tiered approach to conformance that
depends on the circumstances of creation and delivery.
(Note that I'm not addressing the scenario of ad hoc tagging, where there's
no DTD in the picture, and the creator is likely using XML-conforming
instance syntax from the beginning. I've also pretty much ignored
declarations involved in the SHORTREF and LINK features.)
---------------------------------------------------------------------------
Instance and Internal Subset Conform to XML Constraints
Why: Web delivery of instances, where any characteristics of the DTD worth
transmitting (such as architectural forms-type attributes and entity
declarations) are put into the internal subset as part of XML delivery.
The following list assumes that for any one instance, a portion of the DTD
might need to be sent in the internal subset. Below, "transformation"
refers to automatable preparation of such portions before they are
extracted and placed in the internal subset (precise details on which
declarations must be extracted aren't given here; maybe I'll get around to
that later).
What:
1. The instance has to be well-formed: special empty-element and PI
syntax, normalization, etc.
2. Either element type declarations can't use CDATA or RCDATA declared
content, or the elements' content in the instance must be transformed
to escape the appropriate characters that look like markup
3. The DTD should avoid attribute value defaulting if you want to
minimize the need to put attribute list declarations in the internal
subset (use #IMPLIED plus a style sheet instead); if default values
are supplied, they must be quoted
4. Attribute declared values can't be NAME[S], NUMBER[S], or NUTOKEN[S]
(probably use NMTOKEN[S] instead, but also possibly CDATA)
5. Attribute default values can't use #CURRENT (no good substitute)
6. Attribute default values can't use #CONREF (use #IMPLIED plus a style
sheet instead)
7. Either SDATA entities can't be referenced, or SDATA entity references
must be replaced with decimal or hexadecimal character references (or
whatever substitute is appropriate) in the instance
8. Either CDATA entities can't be referenced, or the entity type must be
changed and the contents transformed to escape characters that look
like markup
9. Bracketed entities can't be referenced (in general, these make
ill-formed entities because they contain only half of a markup
construct)
10. SUBDOC entities can't be referenced (it might take quite a bit of work
to extricate and transform any uses of SUBDOC entities)
11. Entity declarations must not have data attributes specified
12. External entity declarations must conform to PUBLIC/SYSTEM syntax
requirements
13. DTD marked sections must be either transformed to remove any spaces
around status keywords, or resolved; the TEMP keyword can't be used
14. Parameter entities either conform to whatever ends up being allowed,
or are transformed or resolved
15. DTD comments within markup declarations are either removed or are
transformed to be moved outside and turned into full comment
declarations
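Several of the transformations listed above are mechanical. As a rough illustration only, here is a Python sketch of three of them: escaping markup-like characters in former CDATA/RCDATA content (items 2 and 8), replacing SDATA references with hexadecimal character references (item 7), and removing spaces around marked-section status keywords (item 13). The function names and the `sdata_map` table are hypothetical, and a real tool would work from a parsed DTD and instance rather than regular expressions.

```python
import re
from html import escape

# Sketch of three automatable transformations from the list above.
# Function names and the sdata_map table are hypothetical.


def escape_markup(text):
    """Items 2 and 8: escape '&', '<' and '>' in content that was
    declared CDATA/RCDATA so it cannot be mistaken for markup."""
    return escape(text, quote=False)


def replace_sdata_refs(text, sdata_map):
    """Item 7: replace references to SDATA entities with hexadecimal
    character references. sdata_map (hypothetical) maps entity names to
    Unicode code points; references to other entities are left alone.
    Only the ';'-terminated reference form is handled."""
    def sub(match):
        code_point = sdata_map.get(match.group(1))
        if code_point is None:
            return match.group(0)
        return f"&#x{code_point:X};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9.-]*);", sub, text)


def normalize_marked_sections(dtd):
    """Item 13: remove spaces around a single status keyword (or parameter
    entity reference) in DTD marked sections; reject TEMP."""
    def sub(match):
        keyword = match.group(1)
        if keyword.upper() == "TEMP":
            raise ValueError("TEMP keyword can't be used (item 13)")
        return f"<![{keyword}["
    return re.sub(r"<!\[\s*(%?[A-Za-z0-9.;-]+)\s*\[", sub, dtd)
```

For instance, `replace_sdata_refs("a &mdash; b", {"mdash": 0x2014})` yields `a &#x2014; b`. Status keyword specifications with more than one keyword are not handled by this sketch.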
---------------------------------------------------------------------------
Instance, Internal Subset, and External Subset Conform to XML Constraints
Why: Validation of a document using XML tools that are not also validating
SGML parsers. I consider this an unlikely scenario, given the clamor for
many kinds of validation that SGML can't do today and given the desire to
do ad hoc tagging even when there's a DTD present.
The following list assumes that it's desirable to use the same DTD for SGML
and XML applications, without transformation.
What:
1. As in the above scenario, the instance has to be well-formed: special
empty-element and PI syntax, normalization, etc.
2. Either element type declarations must contain no omitted-tag
minimization specifications, or the specifications must be
parameterized (according to the current XML-Lang spec) and resolve to
null strings in the XML version
3. Element type declarations can't use content model exceptions
4. Element type declarations can't use AND (&) content models
5. Element type declarations can't use CDATA or RCDATA declared content
(use CDATA sections in the instance instead)
6. Unlike the above scenario, the DTD can freely use attribute value
defaulting; the default values must be quoted
7. As in the above scenario, attribute declared values can't be NAME[S],
NUMBER[S], or NUTOKEN[S] (probably use NMTOKEN[S] instead, but also
possibly CDATA)
8. Attribute default values can't use #CURRENT (no good substitute)
9. As in the above scenario, attribute default values can't use #CONREF
(use #IMPLIED plus a style sheet instead)
10. SDATA entities can't be declared or referenced
11. CDATA entities can't be declared or referenced (use CDATA sections
instead)
12. Bracketed entities can't be declared or referenced
13. SUBDOC entities can't be declared or referenced
14. As in the above scenario, entity declarations must not have data
attributes specified
15. Notation declarations must not have data attribute list declarations
16. As in the above scenario, external entity declarations must conform to
PUBLIC/SYSTEM syntax requirements
17. DTD marked sections must have no spaces around status keywords; the
TEMP keyword can't be used
18. Parameter entities must conform to whatever ends up being allowed
19. DTD comments must be in full comment declarations, outside other
markup declarations
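For this second scenario, a first-pass check of an existing DTD might look like the following Python sketch. It is a heuristic lint, not a conforming SGML or XML parser: it assumes parameter entities have already been resolved and that no `>` appears inside quoted literals or comments. `lint_dtd` and its helpers are hypothetical names. It covers most of the declaration-level items above, but not instance well-formedness (item 1), PUBLIC/SYSTEM syntax (item 16), or parameter entity rules (item 18).

```python
import re

# Hypothetical first-pass lint for a DTD meant to serve both SGML and XML
# without transformation. Regex heuristics only, not an SGML parser: it
# assumes parameter entities are resolved and that no '>' appears inside
# quoted literals or comments.
DECL = re.compile(r"<!(ELEMENT|ATTLIST|ENTITY)\s(.*?)>", re.DOTALL | re.IGNORECASE)
MARKED = re.compile(r"<!\[(\s*)([^\[]*?)(\s*)\[")
QUOTED = re.compile(r"\"[^\"]*\"|'[^']*'")
TOKEN = re.compile(r"\"[^\"]*\"|'[^']*'|\([^)]*\)|[^\s()\"']+")

BAD_DECLARED = {"NAME", "NAMES", "NUMBER", "NUMBERS", "NUTOKEN", "NUTOKENS"}
BAD_ENTITY_TYPES = {"SDATA", "CDATA", "SUBDOC", "STARTTAG", "ENDTAG", "MS", "MD"}


def check_element(body):
    problems = []
    if re.match(r"\s*\S+\s+[-O]\s+[-O]\s", body, re.IGNORECASE):
        problems.append("omitted-tag minimization (item 2)")
    if re.search(r"[+-]\(", body):
        problems.append("content model exception (item 3)")
    if "&" in body:
        problems.append("AND (&) content model (item 4)")
    if re.search(r"\bR?CDATA\s*$", body, re.IGNORECASE):
        problems.append("CDATA/RCDATA declared content (item 5)")
    return problems


def check_attlist(body):
    tokens = TOKEN.findall(body)
    problems = []
    start = 1  # tokens[0] is the associated element type
    if tokens and tokens[0].upper() == "#NOTATION":
        problems.append("attribute list for a notation (item 15)")
        start = 2
    it = iter(tokens[start:])
    for attr in it:  # each definition: name, declared value, default value
        declared = next(it, "").upper()
        if declared == "NOTATION":
            next(it, "")  # skip the notation name group
        default = next(it, "")
        if default.upper() == "#FIXED":
            default = next(it, "")
        if declared in BAD_DECLARED:
            problems.append(f"{attr}: declared value {declared} (item 7)")
        if default.upper() in ("#CURRENT", "#CONREF"):
            problems.append(f"{attr}: default {default.upper()} (items 8-9)")
        elif default and default[0] not in "#\"'":
            problems.append(f"{attr}: unquoted default value (item 6)")
    return problems


def check_entity(body):
    unquoted = QUOTED.sub('""', body)
    tokens = unquoted.split()
    if tokens and tokens[0] == "%":
        tokens = tokens[1:]  # parameter entity declaration
    problems = [f"{t.upper()} entity (items 10-13)"
                for t in tokens[1:] if t.upper() in BAD_ENTITY_TYPES]
    if "[" in unquoted:
        problems.append("data attributes specified (item 14)")
    return problems


CHECKERS = {"ELEMENT": check_element, "ATTLIST": check_attlist,
            "ENTITY": check_entity}


def lint_dtd(dtd):
    problems = []
    for keyword, body in DECL.findall(dtd):
        keyword = keyword.upper()
        if "--" in QUOTED.sub('""', body):
            problems.append(f"{keyword}: comment inside declaration (item 19)")
        problems.extend(f"{keyword}: {p}" for p in CHECKERS[keyword](body))
    for lead, keywords, trail in MARKED.findall(dtd):
        if lead or trail:
            problems.append("marked section: spaces around status keyword (item 17)")
        if "TEMP" in keywords.upper().split():
            problems.append("marked section: TEMP keyword (item 17)")
    return problems
```

Run over the DTD text, `lint_dtd` returns one message per suspect construct, tagged with the list item it relates to. An empty result means only that none of these patterns were found, not that the DTD conforms to XML.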
---------------------------------------------------------------------------
Additional XML-Related DTD Design Considerations
Whether your SGML tools have support for the TC version of SGML...
Received on Tuesday, 3 June 1997 19:40:36 UTC