Compatibility issues and principle #3
Our design principle #3 currently reads, "XML shall be compatible with
SGML." I'm hoping we're ready to get more specific about what this
means. (I apologize for starting a thread and then bolting -- I'll be
largely out of email-touch from tomorrow through Sep 26.)

Here are the questions I think we need to be able to answer:

o Who is the customer/audience for XML -- existing robust-SGML users,
existing Web/HTML users who are not SGML-aware, or both? What
"legacy information" (if any) should work with XML with no

I don't think we should penalize existing users of SGML for using
its awkward features before we came along with our "cleanup effort."
At the same time, many undersupported features of SGML are supported
*somewhere* by *someone* -- how far do we go?

For example: As Paul G. mentioned, even HTML uses EMPTY elements
in a totally natural way. I don't think it's reasonable to make these
documents, and millions of pages of other SGML documents, change over
to an EMPTY-less model. At the same time, CDATA and RCDATA elements are
pretty widely supported, but little used because of the authoring
complications they introduce. Would these be fair to toss?

I'd rather think of XML as an effort to define a cohesive SGML
"application profile" that benefits both tool creators and document
creators, rather than a set of unrelated cool hacks that make it easier
to write parsers. If we're trying to define the intersection of useful,
understandable, and implementable characteristics, I don't want to
pull the drawstring too tight.

Also, I'm wary of playing around with delimiters and shortrefs when
there are already widely used methods available for doing the same thing.
(E.g., what's wrong with using &lt; instead of a backslash to escape
left angle brackets? It's consistent with treatment of other special
characters, and it's one thing that HTML hackers have adapted to quite

o What should happen when existing SGML documents (including valid HTML)
are processed by XML tools? Should a "round trip" between the two
forms be possible, or is only XML->SGML or SGML->XML okay?

If we provide for only XML->SGML, then I think we're setting up a
situation where XML is like "SGML with macros," which gets expanded
as soon as it gets into a "real" system. If we provide only for
SGML->XML, then XML may just fill the role of "HTML Heavy" -- good
only for static delivery, not for serious work. But round trips could
be problematic, if (e.g.) we remove the ability to have EMPTY elements
and users expect the ESIS of each form to be equivalent.

I think a round trip should be weighted as highly desirable, with
transforms relatively undesirable.
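
To make the ESIS-equivalence concern a bit more concrete, here is a rough sketch in Python. It is only an illustration under assumptions that are not part of this proposal: it uses Python's standard expat bindings as a stand-in parser, a hypothetical helper named event_stream, and a hypothetical empty-element spelling (<br/>). It reduces two spellings of the same document to a simplified, ESIS-like list of events and checks that they match, which is roughly what a user doing a round trip would expect.

```python
import xml.parsers.expat


def event_stream(document: str) -> list:
    """Return a simplified, ESIS-like list of parse events (hypothetical helper)."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # coalesce adjacent character data into one event
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name, dict(attrs)))
    )
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("data", data))
    parser.Parse(document, True)  # True marks this as the final chunk of input
    return events


# Two spellings of the same content: one with an explicit "empty" form,
# one with a start-tag/end-tag pair that has nothing between them.
# Both spellings are assumptions made for this illustration only.
with_empty_tag = "<p>line one<br/>line two</p>"
with_tag_pair = "<p>line one<br></br>line two</p>"

# A round trip is only safe if both forms yield the same event stream.
same = event_stream(with_empty_tag) == event_stream(with_tag_pair)
```

If the two forms ever produced different event streams, a transform in either direction would change what downstream applications see, which is the situation the round-trip requirement is meant to rule out.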