- From: Charles F. Goldfarb <Charles@SGMLsource.com>
- Date: Sun, 10 Nov 1996 19:47:35 GMT
- To: Tim Bray <tbray@textuality.com>
- Cc: W3C-SGML-WG@w3.org
On Fri, 08 Nov 1996 12:17:15 -0800, Tim Bray <tbray@textuality.com> wrote: >1. What is XML For? > >Jon has pointed out that by W3C statute, the primary goal of the ERB >and WG work is to enable the delivery of SGML over the Web. Which is >true. However, I and several others on both the ERB and the WG >(including I think Charles Goldfarb) have an entirely un-hidden agenda, >namely to construct a lightweight, lean, mean, easy-to-learn on-ramp >for SGML. Not just an on-ramp, in my case, but a genuine subset. That means that a valid XML 1.0 document must be a conforming ISO 8879:86 document. >3. XML and existing SGML tools > >Existing SGML tools can, if they're compliant, read XML today modulo >*only* overlapping enumerated attribute values and perhaps some >mild inconsistencies as to which RE's are where. The former makes XML *not* conforming SGML. The latter could avoid being nonconforming if the spec were written carefully (see below). > XML is a subset of SGML in spirit and in fact, and >XML is designed so that the adjustments required in existing SGML tools >are trivial. The above statement is inconsistent on its face. If XML is a subset of SGML "in fact", why should existing "compliant" SGML tools require any adjustments? One can argue (but not here, please) whether a DTD-less markup language is a subset of SGML in spirit, but to be one in fact means that conforming XML documents must conform to 8879. At present, they don't. However, XML can be made conforming (without changing its present RE handling) as follows: 1. The spec must carefully distinguish the "conforming subset" SGML syntax from the other things that XML defines, but which have nothing to do with generic SGML parsing. So far, those other things are: a. Character encoding rules for storage objects. b. A mandatory unvarying SGML declaration. It declares the XML concrete syntax, feature set, etc. c. A mandatory unvarying "XML component" of every DTD. It consists of some entity declarations, a short reference mapping declaration, some element type declarations for empty element types, and a data content notation for white space handling (discussed below). It is considered to occur at the start of the internal subset, meaning that nothing in it can be preempted. d. Some attribute definitions for use in DTDs (notably those having to do with white space handling). e. Prescribed semantic (i.e., not parsing) behavior of XML processors, such as browsers and editors. This would include rules for processor behavior when an XML document is non-conforming (not valid), but possibly well-formed, such as in the absence of a DTD. 2. XML's white space handling should be defined as a data content notation (XMLWS), applicable to the document element. (Suitable shortref mappings can prevent normal SGML RE handling from taking place.) The behavior of the notation can be dependent on attribute values and/or element types, so existing XML white space behavior and/or other useful behaviors could be supported. (It is possible, incidentally, that full SGML applications might want to use this notation.) 3. XML 1.0 would have to drop enumerated attribute values. As has been pointed out, the uniqueness constraint is unjustifiable in the absence of shorttag minimization. Declare them as CDATA and include a comment constraining the values (perhaps expressed as a regexp). All that is lost is parser-level validation. We've exhibited great restraint in more important areas to maintain SGML conformance; it would be unfortunate to abandon conformance just for this one.
Received on Sunday, 10 November 1996 14:48:09 UTC