Re: DTDs and XML conformance

Lee-- Many thanks for the detailed comments!  You've caught several
subtleties I missed.  A couple of comments below.

At 11:40 PM 6/3/97 EDT, lee@sq.com wrote:
>Eve's list is very helpful.
>
>I've made some detailed comments, because I think it's useful enough to
>be added to the XML FAQ.
>
>Lee
>
>
>
>
>>   1. The instance has to be well-formed: special empty-element and PI
>>      syntax, normalization, etc.
>Yes.  Perhaps this is actually where OMITTAG & SHORTTAG fit.
>
>>   2. Either element type declarations can't use CDATA or RCDATA declared
>>      content, or the elements' content in the instance must be transformed
>>      to escape the appropriate characters that look like markup
>Yes.
>
>>   3. The DTD should avoid attribute value defaulting if you want to
>>      minimize the need to put attribute list declarations in the internal
>>      subset (use #IMPLIED plus a style sheet instead); if default values
>>      are supplied, they must be quoted
>Yes and yes, although the quoting is already covered by [1].
>
>>   4. Attribute declared values can't be NAME[S], NUMBER[S], or NUTOKEN[S]
>>      (probably use NMTOKEN[S] instead, but also possibly CDATA)
>Yes.
>
>>   5. Attribute default values can't use #CURRENT (no good substitute)
>Well, arguably there are few good reasons to use #CURRENT in the first
>place :-)  This would be a good place to mention RANK, I think, if anyone
>actually uses it.

Omigosh, I totally forgot about RANK.  (What a surprise. :-)  You're right,
this should be given a list item.

>>   6. Attribute default values can't use #CONREF (use #IMPLIED plus a style
>>      sheet instead)
>No.  Interestingly, it would be very cheap to implement CONREF in XML,
>because the tag would look different:
>	<gi x="y"/>
>vs.
>	<gi>stuff</gi>
>but you would have to have a way of knowing which was the conref attribute.
>I don't think it's worth adding the feature.

I see.  What I was shooting for here is that the ATTLIST can't mention
CONREF because this default value isn't valid in XML, and you might need to
extract this att specification for your internal subset, but in fact you
don't need to; the fact that it's CONREF doesn't make it
internal-subset-worthy.

>>   7. Either SDATA entities can't be referenced, or SDATA entity references
>>      must be replaced with decimal or hexadecimal character references (or
>>      whatever substitute is appropriate) in the instance
>No.  This is the first place where we disagree.  You can retain the
>entity references in the document.  Only the entity definitions need to
>be changed.  For example, change
>    <!Entity eacute SDATA "[eacute]">
>to
>    <!Entity eacute "&#233;">
>in the DTD.
>If you need font changes, you can use
>    <!Entity BoldRedRegisteredSymbol "<B><RED>&#174;</RED></B>">
>instead.  You'd need to define B and RED to pass validation...

Oops, you're right.
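
For anyone who wants to automate Lee's suggestion, here is a rough sketch
in Python of rewriting SDATA entity declarations as ordinary entities that
hold a decimal character reference.  It is an illustration only: the helper
name convert_sdata_entities is hypothetical, it assumes simple declarations
like the one quoted above, and it assumes the entity names appear in
Python's html.entities table (where eacute is code point 233).

```python
import re
from html.entities import name2codepoint

# Matches simple declarations such as: <!ENTITY eacute SDATA "[eacute]">
SDATA_DECL = re.compile(
    r'<!ENTITY\s+(?P<name>[A-Za-z][\w.-]*)\s+SDATA\s+"[^"]*"\s*>',
    re.IGNORECASE,
)

def convert_sdata_entities(dtd_text):
    """Rewrite SDATA entity declarations as entities holding a character reference."""
    def replace(match):
        name = match.group("name")
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: leave the declaration untouched for manual review.
            return match.group(0)
        return '<!ENTITY %s "&#%d;">' % (name, codepoint)
    return SDATA_DECL.sub(replace, dtd_text)

print(convert_sdata_entities('<!ENTITY eacute SDATA "[eacute]">'))
# prints: <!ENTITY eacute "&#233;">
```

Names the table doesn't know are deliberately left alone, since the right
substitute for those has to be chosen by hand.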

>>   8. Either CDATA entities can't be referenced, or the entity type must be
>>      changed and the contents transformed to escape characters that look
>>      like markup
>Yes.
>
>>   9. Bracketed entities can't be referenced (in general, these make
>>      ill-formed entities because they contain only half of a markup
>>      construct)
>I am not sure what a bracketed entity is.  It's not in the glossary to
>ISO 8879, and I don't have the handbook here at home.

A bracketed internal text entity is where you specify, e.g., "STARTTAG" to
get the replacement value to be "bracketed" with the STAGO and TAGC
delimiters.  I have yet to find a juicy use for these.
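
To illustrate the bracketing, here is a small sketch in Python of how the
replacement text gets wrapped.  It reflects my reading of ISO 8879 (the four
keywords STARTTAG, ENDTAG, MS, and MD) and uses the reference concrete
syntax delimiters; the function name expand_bracketed is made up for the
example.

```python
# Delimiter pairs for the bracketed-text keywords (reference concrete syntax).
BRACKETS = {
    "STARTTAG": ("<", ">"),    # STAGO ... TAGC
    "ENDTAG": ("</", ">"),     # ETAGO ... TAGC
    "MS": ("<![", "]]>"),      # marked section start ... marked section end
    "MD": ("<!", ">"),         # MDO ... MDC
}

def expand_bracketed(keyword, text):
    """Return the replacement text of a bracketed internal text entity."""
    open_delim, close_delim = BRACKETS[keyword.upper()]
    return open_delim + text + close_delim

print(expand_bracketed("STARTTAG", "para"))  # prints: <para>
print(expand_bracketed("ENDTAG", "para"))    # prints: </para>
```

Each STARTTAG or ENDTAG result is only one tag of a pair, which is why such
an entity generally can't stand alone as well-formed content.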

>>  10. SUBDOC entities can't be referenced (it might take quite a bit of work
>>      to extricate and transform any uses of SUBDOC entities)
>This is not a problem since there is no way in XML to _declare_ a
>SUBDOC entity in the first place :-)

I was assuming an existing DTD that someone is considering redesigning to
conform to XML.  In this case, there's no harm in declaring it, but if you
actually use it, you'll need to include the declaration in the internal
subset, which will break XML.  Am I missing something?

>>  11. Entity declarations must not have data attributes specified
>Yes.  Actually, SoftQuad Explorer -- and probably Panorama -- uses
>optional entity data attributes for the height and width of images,
>which is a more SGML-like way of doing <IMG.... height=... width=....>.
>But it's not worth a language feature in XML.
>
>>  12. External entity declarations must conform to PUBLIC/SYSTEM syntax
>>      requirements
>Yes.  Is it necessary to state this??

Just trying to be exhaustively complete. :-)

>>  13. DTD marked sections must be either transformed to remove any spaces
>>      around status keywords, or resolved; the TEMP keyword can't be used
>I think not allowing spaces there is a bug in the spec, right?

I'm not sure whether we've agreed to take this up, but I'm certainly
willing to.
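
In the meantime, the "remove any spaces" transformation is easy to sketch.
The Python below is only a rough illustration: tighten_marked_sections is a
hypothetical name, and it assumes each marked section start carries a single
status keyword or a single parameter entity reference.  It does not resolve
the section or do anything about TEMP.

```python
import re

# '<![' + optional space + keyword or parameter entity reference
# + optional space + '['
MS_START = re.compile(r'<!\[\s*(%[\w.-]+;|[A-Za-z]+)\s*\[')

def tighten_marked_sections(dtd_text):
    """Strip whitespace around the status keyword in marked section starts."""
    return MS_START.sub(lambda m: "<![" + m.group(1) + "[", dtd_text)

print(tighten_marked_sections('<![ %draft; [ <!ELEMENT note (#PCDATA)> ]]>'))
# prints: <![%draft;[ <!ELEMENT note (#PCDATA)> ]]>
```

A section start with several status keywords is left as it is, so those
cases would still need to be resolved by hand.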

>>  14. Parameter entities either conform to whatever ends up being allowed,
>>      or are transformed or resolved
>Yes.
>
>>  15. DTD comments within markup declarations are either removed or are
>>      transformed to be moved outside and turned into full comment
>>      declarations
>Yes.  The comment syntax seems to have reverted to <!--......-->, although
>I'd still prefer to use <!--*....*--> and have the extra robustness later
>this year.
>
>> ---------------------------------------------------------------------------
>[...]
>> The following list assumes that it's desirable to use the same DTD for SGML
>> and XML applications, without transformation.
>
>I'll stop here.  I think that this list could, with some minor edits,
>usefully be added to the XML FAQ.
>
>Lee

I never did explicitly say anywhere that the second list equates to
"conforming to XML," but that was the intent.  Note that the first list
does *not* result in an XML-conforming DTD, except as regards the
extracted/transformed DTD fragment that ends up in the internal subset.

	Eve

Received on Wednesday, 4 June 1997 09:12:37 UTC