- From: Steven J. DeRose <sjd@ebt.com>
- Date: Tue, 17 Sep 1996 10:11:54 -0400
- To: Charles@SGMLsource.com
- Cc: Martin Bryan <mtbryan@sgml.u-net.com>, w3c-sgml-wg@w3.org
At 01:02 AM 09/17/96 GMT, Charles F. Goldfarb wrote:
>On Mon, 16 Sep 1996 16:52:44 -0400, "Steven J. DeRose" <sjd@ebt.com> wrote:
>
>>The only things blocking you from parsing one-entity minimal SGML document
>>without a DTD are:
>>
>>   a) Declared content (CDATA/RCDATA/EMPTY elements)
>>   b) The RE-ignoring rules.
>>
>Exactly! Just eliminate declared content and mixed content and you've solved the
>problem. We don't need those constructs for XML, they are just forms of markup
>minimization parading under other names.

I don't yet see what mixed content has to do with it. I do see a need for
keeping mixed content; it's just so messy to have to tag every pseudo-element
as a real element. The only difficulties with mixed content are the whitespace
treatment we've already decided has problems, and a few cases of OMITTAG that
will likely also go away in XML, right? Is there some problem with mixed
content that we aren't already going to address another way?

<digression>I should have added attribute normalization to the list of what
you may need a DTD for (without declarations you can't tell CDATA from
IDREFS). But that doesn't strike me as a problem: when in doubt, a parser keeps
all the whitespace; an application on top typically knows what it cares about:
for example, to strip whitespace and case-fold when looking up any identifier.

This is really application semantics, and many users dislike equating the
*syntax* of being an 8-character [-.a-z0-9] NAME with the *semantics* of being
an ID or IDREF. Separating them is probably a benefit, in which case the parser
doesn't need to know (just as a parser returns extra whitespace in PCDATA, and
many applications know they can trash it except in "verbatim" mode).

All we need do is state that an application may not depend on extra whitespace
in attributes to determine processing semantics: it may not set an element in
bold only when its FOO attribute had double spaces between its name tokens.
Hardly an odious limitation.</digression>
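
To make the division of labor in the digression concrete, here is a minimal Python sketch, not anything from an actual parser. It assumes the parser hands back the attribute value exactly as written, and the application strips whitespace and case-folds before looking up identifiers. The names `normalize_tokens`, `lookup_ids`, and `id_table`, and the choice to fold to lowercase, are all hypothetical and only for illustration.

```python
def normalize_tokens(raw_value: str) -> list[str]:
    """Split a raw attribute value into name tokens.

    Leading, trailing, and repeated whitespace is discarded, and each
    token is case-folded, so the application never depends on either.
    """
    return [token.casefold() for token in raw_value.split()]


def lookup_ids(raw_value: str, id_table: dict[str, str]) -> list[str]:
    """Resolve each token of an IDREFS-like value against a table keyed by folded ID."""
    return [id_table[token] for token in normalize_tokens(raw_value) if token in id_table]


# Assumption: the parser returns the value untouched, extra spaces and all.
raw_from_parser = "  Sec1   FIG2 "

# Hypothetical table of identifiers the application has already collected.
id_table = {"sec1": "Section 1", "fig2": "Figure 2"}

resolved = lookup_ids(raw_from_parser, id_table)
print(resolved)
```

Because normalization happens only at lookup time, the parser needs no declaration telling it whether FOO is CDATA or IDREFS; an application that treats the value as verbatim text simply skips `normalize_tokens`.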
Received on Tuesday, 17 September 1996 10:14:18 UTC