- From: Michael Sperberg-McQueen <U35395@UICVM.CC.UIC.EDU>
- Date: Thu, 03 Oct 96 13:02:00 CDT
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
In its meeting on 2 October, the ERB reached consensus on the following issues relating to the equivalence of document instances and of DTDs in XML and SGML. The brief statement of the points of consensus is followed by some discussion and examples.

1 For any XML DTD XD, it will be possible to generate, without human intervention, an SGML DTD SD, such that
   (a) SD will accept all the document instances accepted by XD, and
   (b) SD will produce the same ESIS for them (modulo any exceptions required by the XML handling of white space and record boundaries).

2 If possible, XML will be defined in such a way that for any XML DTD XD, a corresponding SGML DTD SD can be generated, without human intervention, such that in addition to 1(a) and 1(b),
   (a) SD will accept *only* documents which are ESIS-equivalent to some document instance accepted by XD, and
   (b) if SD is translated back into XML, producing a third DTD XD', then XD and XD' will accept an ESIS-equivalent set of documents (i.e. for each document accepted by XD, there is a document accepted by XD' which has the same ESIS, and for each document accepted by XD', there is a document accepted by XD which has the same ESIS).

-------
Discussion

As may be seen, point 2 puts a slightly heavier burden on XML than point 1, requiring in item (a) that if XML DTDs are translated into SGML, the resulting DTD enforces all the constraints of the original XML DTD, and in item (b) that XML DTDs preserve their expressive power and accept equivalent languages even after round-trip conversion into and from full SGML. It's not clear to everyone that this heavier burden can always be met, so point 2 is expressed as a goal, not a hard requirement.

Another way of expressing point 2(a) is that XML will not have greater expressive power than full SGML. This means, for example, that point 2(a) forbids XML to accept arbitrary regular expressions as content models, since some regular expressions cannot be translated into SGML content models.
A hypothetical XML DTD with

   <!ELEMENT x - - ((a,b)*, a?) >
   <!ELEMENT (a,b) - O EMPTY >

could in theory be translated into SGML as

   <!ELEMENT x - - (a,b?)* >
   <!ELEMENT (a,b) - O EMPTY >

which would fulfil the requirements of point 1, since any document satisfying the first declaration also satisfies the second. A document containing <x><a><a></x>, however, would satisfy the SGML DTD without satisfying the XML DTD. Rule 2(a) says XML can't allow that to happen. This has the effect that XML and SGML tools can both preserve the validity of XML documents, assuming that they validate the documents at all.

The view of the ERB is, in short, that suggestions for increasing the expressive power of SGML DTDs -- of which we have several -- will need to go into the WG8 revision work, not into XML.

It should probably be noted that the ERB did not discuss, and so has not achieved consensus on, whether XML DTDs may be *less* expressive than SGML DTDs, i.e. whether rule 2 should also work if one switched the names SGML and XML around in it. That is, the jury is still out on whether eliminating constructs like EMPTY, inclusion and exclusion exceptions, etc., is ruled out in principle or not.

-------
Further discussion for the hard-core set theorists ...

If we consider that each DTD defines (or 'generates') a language, then the set of DTDs possible using some notation generates a set of languages. Call it the LG-set (for 'languages generated set') of the notation. Formally, rule 2(a) requires that the LG-set of XML be a subset of SGML's LG-set. Adding a rule 3 which replaces 'SGML' with 'XML' and vice versa would require that SGML's LG-set also be a subset of XML's, i.e. (taken together with rule 2(a)) that the two LG-sets be identical (using, as always, ESIS-equivalence modulo RE/RS differences as a test of identity of instances).

-CMSMcQ
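
As a rough illustration of the content-model example above, the two declarations for x can be compared by brute force. The sketch below is not part of the ERB's statement. It assumes the content models can be stood in for by ordinary Python regular expressions over one-letter element names, and it only checks child sequences up to a fixed length, so it illustrates the subset relation and does not prove it. The names XML_MODEL, SGML_MODEL, sequences and accepted are made up for the example.

```python
import re
from itertools import product

# Assumed encoding: each child element is one letter, and each content
# model is rewritten as a Python regular expression (not DTD syntax).
XML_MODEL = re.compile(r"(ab)*a?")   # stands in for ((a,b)*, a?)
SGML_MODEL = re.compile(r"(ab?)*")   # stands in for (a,b?)*

def sequences(max_len, alphabet="ab"):
    """Yield every child-element sequence of length 0..max_len as a string."""
    for n in range(max_len + 1):
        for combo in product(alphabet, repeat=n):
            yield "".join(combo)

def accepted(model, max_len):
    """Return the set of sequences up to max_len that the model matches in full."""
    return {s for s in sequences(max_len) if model.fullmatch(s)}

xml_lang = accepted(XML_MODEL, 6)
sgml_lang = accepted(SGML_MODEL, 6)

# Point 1(a): every sequence the XML model accepts is accepted by the SGML model.
print("1(a) holds up to length 6:", xml_lang <= sgml_lang)

# Point 2(a): the SGML model should accept nothing extra; list what it does.
extra = sorted(sgml_lang - xml_lang, key=lambda s: (len(s), s))
print("accepted only by the SGML model:", extra[:3])
```

The sequence "aa" in the second list corresponds to the document <x><a><a></x> mentioned above: the translated declaration accepts it and the original does not, which is exactly what point 2(a) rules out.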
Received on Thursday, 3 October 1996 14:03:35 UTC