Re: SGML and XML from Michael Sperberg-McQueen on 1996-12-01 (w3c-sgml-wg@w3.org from December 1996)

From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
Date: Sun, 01 Dec 96 11:04:32 CST
To: Liam Quin <lee@sq.com>, W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <199612011726.MAA29249@www10.w3.org>

On Thu, 28 Nov 1996 16:17:22 -0500 Liam Quin said:
>If SGML can be changed to disable RS/RE processing, or if Charles'
>kludge with SHORTREF works, perhaps we're OK.  We do have to get the
>same whitespace behaviour with and without a DTD, so element content
>and data content is not a distinction XML can make, I think.

I think I agree that simplifying RE/RS handling would be good, but
in the interests of clarity let me ask:  Do you mean we have to make
the DTD irrelevant to white-space handling

    * because that was decided at some point, or
    * as a direct consequence of some other decision, or
    * because that would be a cleaner design?

I agree that it would be desirable to make rules which allow
processors to skip the DTD and still get a wholly correct parse
tree, er, grove, but I don't think that's possible, since a
processor cannot resolve entity references or provide correct
default attribute values without reading the DTD.

I also believe that if XML makes no distinction between element
content and mixed content, it will be impossible to build correct
XML processors on top of SGML parsers.

It might be worthwhile to distinguish clearly among
  (a) full validation, which checks the instance against the DTD and
provides the correct grove,
  (b) full parsing, which doesn't validate but does provide
character-for-character the correct grove, and which requires at
least some attention to the DTD, and
  (c) minimal or partial parsing, which doesn't require a DTD and
which is guaranteed (in XML) to vary from the correct grove only in
specific ways:
    (i) it will have extra white space in element content,
    (ii) it won't provide correct values for defaulted attributes,
    (iii) it won't have information about attribute types like
ID/IDREF.
    (iv) it won't resolve references to entities not hardcoded into
the processor.

I'm not sure (iii) is part of the standard grove output on which we
want to model the XML processor's output; if it isn't, then that's
one difference fewer.

-C. M. Sperberg-McQueen

Received on Sunday, 1 December 1996 12:26:23 UTC