- From: Eve L. Maler <elm@arbortext.com>
- Date: Mon, 16 Sep 1996 17:50:11 -0400
- To: w3c-sgml-wg@w3.org
- Cc: elm@arbortext.com
Our design principle #3 currently reads, "XML shall be compatible with SGML." I'm hoping we're ready to get more specific about what this means. (I apologize for starting a thread and then bolting -- I'll be largely out of email-touch from tomorrow through Sep 26.)

Here are the questions I think we need to be able to answer:

o Who is the customer/audience for XML -- existing robust-SGML users, existing Web/HTML users who are not SGML-aware, or both? What "legacy information" (if any) should work with XML with no transformation?

I don't think we should penalize existing users of SGML for using its awkward features before we came along with our "cleanup effort." At the same time, many undersupported features of SGML are supported *somewhere* by *someone* -- how far do we go?

For example: As Paul G. mentioned, even HTML uses EMPTY elements in a totally natural way. I don't think it's reasonable to make these documents, and millions of pages of other SGML documents, change over to an EMPTY-less model. At the same time, CDATA and RCDATA elements are pretty widely supported, but little used because of the authoring complications they introduce. Would these be fair to toss?

I'd rather think of XML as an effort to define a cohesive SGML "application profile" that benefits both tool creators and document creators, rather than a set of unrelated cool hacks that make it easier to write parsers. If we're trying to define the intersection of useful, understandable, and implementable characteristics, I don't want to pull the drawstring too tight.

Also, I'm wary of playing around with delimiters and shortrefs when there are already widely used methods available for doing something. (E.g., what's wrong with using &lt; instead of a backslash to escape left angle brackets? It's consistent with treatment of other special characters, and it's one thing that HTML hackers have adapted to quite easily.)

o What should happen when existing SGML documents (including valid HTML) are processed by XML tools? Should a "round trip" between the two forms be possible, or is only XML->SGML or SGML->XML okay?

If we provide for only XML->SGML, then I think we're setting up a situation where XML is like "SGML with macros," which gets expanded as soon as it gets into a "real" system. If we provide only for SGML->XML, then XML may just fill the role of "HTML Heavy" -- good only for static delivery, not for serious work. But round trips could be problematic, if (e.g.) we remove the ability to have EMPTY elements and users expect the ESIS of each form to be equivalent.

I think a round trip should be weighted as highly desirable, with transforms relatively undesirable.

Eve

Received on Monday, 16 September 1996 19:10:14 UTC