Re: XML and required DTDs from Steven J. DeRose on 1996-09-17 (w3c-sgml-wg@w3.org from September 1996)

From: Steven J. DeRose <sjd@ebt.com>
Date: Tue, 17 Sep 1996 10:11:54 -0400
To: Charles@SGMLsource.com
Cc: Martin Bryan <mtbryan@sgml.u-net.com>, w3c-sgml-wg@w3.org
Message-Id: <2.2.32.19960917141154.006f5d34@kirk.ebt.com>

At 01:02 AM 09/17/96 GMT, Charles F. Goldfarb wrote:
>On Mon, 16 Sep 1996 16:52:44 -0400, "Steven J. DeRose" <sjd@ebt.com> wrote:
>
>>The only things blocking you from parsing one-entity minimal SGML document
>>without a DTD are:
>>
>>   a) Declared content (CDATA/RCDATA/EMPTY elements)
>>   b) The RE-ignoring rules.
>>
>Exactly! Just eliminate declared content and mixed content and you've
solved the
>problem. We don't need those constructs for XML, they are just forms of markup
>minimization parading under other names.

I don't see yet what mixed content has to do with it. I do see need for
keeping mixed content; it's just so messy to have to tag every
pseudo-element as a real element. The only difficulties with mixed content
are the whitespace treatment we've already decided has problems, and a few
cases of OMITTAG that will likely also go away in XML, right? Is there some
problem of mixed content that we aren't going to address another way already?

<digression>I should have addedd attribute normalization to the list of what
you may need a DTD for (without declarations you can't tell CDATA from
IDREFS). But that doesn't strike me as a problem: When in doubt a parser
keeps all the whitespace; an application on top typically knows what it
cares about: for example, to strip whitespace and case-fold when looking up
any identifier. This is really application semantics, and many users dislike
equating the *syntax* of being an 8-character [-.a-z0-9] NAME, with the
*semantics* of being an ID or IDREF. Separating them is probably a benefit,
in which case the parser doesn't need to know (just like a parser returns
extra whitespace in PCDATA, and many applications know they can trash it
except in "verbatim" mode). All we need do is state that an application may
not depend on extra whitespace in attributes to determine processing
semantics: it may not set an element in bold only when its FOO attribute had
double spaces between its name tokens. Hardly an odious limitation.</digression>

Received on Tuesday, 17 September 1996 10:14:18 UTC