Compatibility issues and principle #3

Our design principle #3 currently reads, "XML shall be compatible with
SGML."  I'm hoping we're ready to get more specific about what this
means.  (I apologize for starting a thread and then bolting -- I'll be 
largely out of email-touch from tomorrow through Sep 26.)

Here are the questions I think we need to be able to answer:

o Who is the customer/audience for XML -- existing robust-SGML users, 
  existing Web/HTML users who are not SGML-aware, or both?  What 
  "legacy information" (if any) should work with XML with no 

  I don't think we should penalize existing users of SGML for using
  its awkward features before we came along with our "cleanup effort."
  At the same time, many undersupported features of SGML are supported
  *somewhere* by *someone* -- how far do we go?

  For example: As Paul G. mentioned, even HTML uses EMPTY elements
  in a totally natural way.  I don't think it's reasonable to make these
  documents, and millions of pages of other SGML documents, change over 
  to an EMPTY-less model.  At the same time, CDATA and RCDATA elements are 
  pretty widely supported, but little used because of the authoring
  complications they introduce.  Would these be fair to toss?

  I'd rather think of XML as an effort to define a cohesive SGML 
  "application profile" that benefits both tool creators and document
  creators, rather than a set of unrelated cool hacks that make it easier
  to write parsers.  If we're trying to define the intersection of useful,
  understandable, and implementable characteristics, I don't want to
  pull the drawstring too tight.

  Also, I'm wary of playing around with delimiters and shortrefs when 
  there are already widely used methods available for doing something. 
  (E.g., what's wrong with using &lt; instead of a backslash to escape 
  left angle brackets?  It's consistent with treatment of other special
  characters, and it's one thing that HTML hackers have adapted to quite 

o What should happen when existing SGML documents (including valid HTML) 
  are processed by XML tools?  Should a "round trip" between the two 
  forms be possible, or is only XML->SGML or SGML->XML okay?

  If we provide for only XML->SGML, then I think we're setting up a
  situation where XML is like "SGML with macros," which gets expanded
  as soon as it gets into a "real" system.  If we provide only for 
  SGML->XML, then XML may just fill the role of "HTML Heavy" -- good 
  only for static delivery, not for serious work.  But round trips could 
  be problematic, if (e.g.) we remove the ability to have EMPTY elements
  and users expect the ESIS of each form to be equivalent.

  I think a round trip should be weighted as highly desirable, with
  transforms relatively undesirable.
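
To make the earlier point about delimiters concrete, here is a minimal
sketch of escaping special characters with entity references such as
&lt; rather than with a backslash.  It uses Python's standard
xml.sax.saxutils module purely as a stand-in; the sample string and the
variable names are hypothetical, and nothing here is meant as a
statement of how an XML tool must behave.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical character data containing markup-significant characters.
sample = "if a < b & b > c"

# escape() rewrites &, <, and > as the entity references
# &amp;, &lt;, and &gt;, so all three are treated the same way.
escaped = escape(sample)
print(escaped)

# unescape() reverses the substitution, recovering the original text.
restored = unescape(escaped)
print(restored)
```

The left angle bracket gets no special mechanism of its own here: it is
escaped by the same kind of reference as the ampersand, which is the
consistency argued for above.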