Re: XML and required DTDs

I'll admit to not caring very much for the tagged pseudo elements
proposal for dealing with mixed content when I saw the first
un-minimized examples.  This changed somewhat when I saw the shortref
examples.  I'm still not particularly happy with the added markup
burden, but now think it may be worth the trade-off of remaining ESIS
compatible and SGML compatible.


If I understand Stephen DeRose's objections, they are:

1) Causes the the typist to add too many keystrokes.

2) Forces the typist to learn/remember rules about when to mark
   pseudo elements.

3) Adds complexity to the full SGML -> corresponding XML 
   instance -> full SGML round trip processing, since special 
   case handling will be needed for mixed content.

I have a suggested clarification of Charles' proposal which, while
giving Stephen cause to strengthen his first objection, ought to
eliminate the other two. -- Require the markup for pseudo elements
*everywhere* PCDATA would have been called for.

An example:

<p>Here is some mixed content with an <em>emboldened sub-element </em>
in it</p>

My understanding of the proposal with fixed shortrefs:

<p> "Here is some mixed content with an " 
     <em>emboldened sub-element </em>
     "in it."
  </p>

The suggested clarification also quotes the non-mixed PCDATA:

<p> "Here is some mixed content with an " 
     <em> "emboldened sub-element " </em>
     "in it."
  </p>

Adding stupid NET tricks with angle-brackets:

<<p> "Here is some mixed content with an " 
     <<em> "emboldened sub-element " </em>/>
     "in it."
  </p>/>

Adding minimization:
<<p> "Here is some mixed content with an " 
     <<em> "emboldened sub-element " >
     "in it." >


I see several advantages to this approach:

1) Simplifies the round trip process:  
   full SGML -> XML -- all PCDATA is pushed into the "Pseudo" element.  
   XML -> full SGML -- all "Pseudo" element content is always popped up
                       to the containing element.

2) Simplifies RE/RS handling. 

3) Easy to explain -- "Text is *always* quoted. REs are only significant
   when explitly quoted".

4) Fully compatible with today's ISO-8879. 


I think these advantages and tradeoffs are consistent with the
"Principles of Design".  When used with fixed shortrefs and stupid NET
tricks, this proposal also meets a couple of my rule-of-thumb metrics
for hand-editing friendliness:

1) Trivial to write an emacs helper mode.

2) Easy to write a "pretty printer" for the language.


-Bill

-- 
William D. Lindsey
blindsey@bdmtech.com
+1 (303) 672-8954

Received on Thursday, 19 September 1996 12:47:26 UTC