Re: DTDs and XML conformance

I spoke too soon on a couple of things, and thought of some new items in
the shower (great place to think!).

(By the way, Peter, if you're interested in putting this in your FAQ, I'm
willing to collect items for the next week or so and send you the resulting
lists in HTML form.)

New item for list 1:

o Attribute list declarations either cannot supply multiple GIs in a name
group, or must be transformed by splitting them into several declarations,
one per GI.

New items for list 2:

o Element declarations cannot supply multiple GIs in a name group.

o Attribute list declarations cannot supply multiple GIs in a name group.
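
To make the list 1 transformation concrete, here is a rough Python sketch
of splitting a name-group attribute list declaration into one declaration
per GI.  The function name and the regular expression are my own
illustration, not part of any tool; it assumes no ">" occurs inside the
declaration (for instance in a quoted default value) and ignores parameter
entity references inside the name group.

```python
import re

# Matches an SGML attribute list declaration whose associated element
# type is a parenthesized name group, e.g. <!ATTLIST (para|note) ...>.
# Assumption: no ">" appears inside the declaration itself.
ATTLIST_GROUP = re.compile(
    r"<!ATTLIST\s+\(([^)]*)\)(\s+[^>]*)>",
    re.IGNORECASE,
)


def split_attlist_name_groups(dtd_text):
    """Rewrite each name-group ATTLIST as one declaration per GI."""
    def expand(match):
        # SGML name groups may use "|", "&", or "," as connectors.
        names = [n for n in re.split(r"[|&,\s]+", match.group(1)) if n]
        body = match.group(2).rstrip()
        return "\n".join(f"<!ATTLIST {name}{body}>" for name in names)

    # Declarations that name a single GI do not match and pass through.
    return ATTLIST_GROUP.sub(expand, dtd_text)
```

Given <!ATTLIST (para | note) id ID #IMPLIED>, this produces one
declaration for para and one for note, each with the same attribute
definitions.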

Corrections and additional comments below:

At 11:40 PM 6/3/97 EDT, lee@sq.com wrote:
>Eve's list is very helpful.
>
>I've made some detailed comments, because I think it's useful enough to
>be added to the XML FAQ.
>
>Lee
>
>>   1. The instance has to be well-formed: special empty-element and PI
>>      syntax, normalization, etc.
>Yes.  Perhaps this is actually where OMITTAG & SHORTTAG fit.
>
>>   2. Either element type declarations can't use CDATA or RCDATA declared
>>      content, or the elements' content in the instance must be transformed
>>      to escape the appropriate characters that look like markup
>Yes.
>
>>   3. The DTD should avoid attribute value defaulting if you want to
>>      minimize the need to put attribute list declarations in the internal
>>      subset (use #IMPLIED plus a style sheet instead); if default values
>>      are supplied, they must be quoted
>Yes and yes, although the quoting is already covered by [1].

The quoting isn't covered by [1], except insofar as all XML constraints on
DTDs are covered by well-formedness requirements.  What well-formedness
emphasizes is attribute value quoting in start-tags.

>>   4. Attribute declared values can't be NAME[S], NUMBER[S], or NUTOKEN[S]
>>      (probably use NMTOKEN[S] instead, but also possibly CDATA)
>Yes.
>
>>   5. Attribute default values can't use #CURRENT (no good substitute)
>Well, arguably there are few good reasons to use #CURRENT in the first
>place :-)  This would be a good place to mention RANK, I think, if anyone
>actually uses it.

I think I was wrong about RANK.  You can declare ranked elements, and the
instance will still be well-formed, and the element declarations themselves
aren't internal-subset-worthy.

>>   6. Attribute default values can't use #CONREF (use #IMPLIED plus a style
>>      sheet instead)
>No.  Interestingly, it would be very cheap to implement CONREF in XML,
>because the tag would look different:
>	<gi x="y"/>
>vs.
>	<gi>stuff</gi>
>but you would have to have a way of knowing which was the conref attribute.
>I don't think it's worth adding the feature.
>
>>   7. Either SDATA entities can't be referenced, or SDATA entity references
>>      must be replaced with decimal or hexadecimal character references (or
>>      whatever substitute is appropriate) in the instance
>No.  This is the first place where we disagree.  You can retain the
>entity references in the document.  Only the entity definitions need to
>be changed.  For example, change
>    <!Entity eacute SDATA "[eacute]">
>to
>    <!Entity eacute "&#233;">
>in the DTD.
>If you need font changes, you can use
>    <!Entity BoldRedRegisteredSymbol "<B><RED>&#174;</RED></B>">
>instead.  You'd need to define B and RED to pass validation...
>
>>   8. Either CDATA entities can't be referenced, or the entity type must be
>>      changed and the contents transformed to escape characters that look
>>      like markup
>Yes.
>
>>   9. Bracketed entities can't be referenced (in general, these make
>>      ill-formed entities because they contain only half of a markup
>>      construct)
>I am not sure what a bracketed entity is.  It's not in the glossary to
>ISO 8879, and I don't have the handbook here at home.
>
>>  10. SUBDOC entities can't be referenced (it might take quite a bit of work
>>      to extricate and transform any uses of SUBDOC entities)
>This is not a problem since there is no way in XML to _declare_ a
>SUBDOC entity in the first place :-)
>
>>  11. Entity declarations must not have data attributes specified
>Yes.  Actually, SoftQuad Explorer -- and probably Panorama -- uses
>optional entity data attributes for the height and width of images,
>which is a more SGML-like way of doing <IMG.... height=... width=....>.
>But it's not worth a language feature in XML.
>
>>  12. External entity declarations must conform to PUBLIC/SYSTEM syntax
>>      requirements
>Yes.  Is it necessary to state this??

Now I see why you were asking: It is necessary because XML requires that if
PUBLIC is used, a system ID is also present.  I was being sloppy; really,
the items should look like this:

List 1:

o External entity declarations must either supply a system ID along with
any PUBLIC specifications, or be transformed to add one.

o External entity declarations that supply non-null system IDs must use
valid URLs for them.

List 2:

o External entity declarations must supply a system ID along with any
PUBLIC specifications.

o As in the above scenario, external entity declarations that supply system
IDs must use valid URLs for them.
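
A quick way to find the offending declarations is to scan the DTD for
PUBLIC external identifiers that have no system ID after them.  The
Python below is only a sketch under my own assumptions: the function name
is invented, the pattern expects the public ID as a quoted literal
directly after the PUBLIC keyword, and it makes no attempt to check that
a system ID is a valid URL.

```python
import re

# An entity declaration using PUBLIC, with an optional system literal
# following the public identifier literal.
ENTITY_PUBLIC = re.compile(
    r"""<!ENTITY\s+(?:%\s+)?(?P<name>[^\s>]+)\s+PUBLIC\s+
        (?P<pubid>"[^"]*"|'[^']*')
        (?:\s+(?P<sysid>"[^"]*"|'[^']*'))?""",
    re.IGNORECASE | re.VERBOSE,
)


def public_without_system_id(dtd_text):
    """Return names of entities declared PUBLIC with no system ID."""
    return [
        m.group("name")
        for m in ENTITY_PUBLIC.finditer(dtd_text)
        if m.group("sysid") is None
    ]
```

Each name it returns is a declaration that needs a system ID added before
the DTD can be used with XML.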

>>  13. DTD marked sections must be either transformed to remove any spaces
>>      around status keywords, or resolved; the TEMP keyword can't be used
>I think not allowing spaces there is a bug in the spec, right?
>
>>  14. Parameter entities either conform to whatever ends up being allowed,
>>      or are transformed or resolved
>Yes.
>
>>  15. DTD comments within markup declarations are either removed or are
>>      transformed to be moved outside and turned into full comment
>>      declarations
>Yes.  The comment syntax seems to have reverted to <!--......-->, although
>I'd still prefer to use <!--*....*--> and have the extra robustness later
>this year.

I need to check; can't remember if we ended up excluding multiple comments
in a single declaration, but I think so.  So this would be another item in
both lists.

>> ---------------------------------------------------------------------------
>[...]
>> The following list assumes that it's desirable to use the same DTD for SGML
>> and XML applications, without transformation.
>
>I'll stop here.  I think that this list could, with some minor edits,
>usefully be added to the XML FAQ.
>
>Lee

Whew.  I'm sure there are more...

	Eve

Received on Wednesday, 4 June 1997 09:58:20 UTC