Mixed content

Instead of disallowing mixed content in DTDs, XML could
(by application convention) disallow whitespace in element
content.  That way DTD-less parsers wouldn't have to worry
about whether a separator character was significant or
not; separator characters would only be allowed in
significant contexts to begin with.
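
To make the convention concrete, here is a rough sketch of a checker
for it.  It assumes the set of element types with element content is
supplied from somewhere else (for instance by a tool that has read the
DTD); the function name and that parameter are made up for
illustration, and Python's xml.etree.ElementTree stands in for a real
parser.

```python
import xml.etree.ElementTree as ET

def whitespace_violations(xml_text, element_content_names):
    """Return the names of elements that are assumed to have element
    content but contain character data between their children.

    element_content_names is a hypothetical input: the element types
    declared with element content, as known from outside the document.
    """
    root = ET.fromstring(xml_text)
    bad = []
    for elem in root.iter():
        if elem.tag not in element_content_names:
            continue
        # Character data directly inside elem: the text before the first
        # child plus the tail text following each child.
        pieces = [elem.text] + [child.tail for child in elem]
        if any(pieces):  # None and "" both count as "no data"
            bad.append(elem.tag)
    return bad
```

A document that passes this check has no separator characters in
element content, so a parser that never sees the DTD can report every
character it finds as data.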

It may also be possible to finesse the problem with a
suitable grove plan.  In the New Paradigm, separator
characters in element content are not _ignored_ per se;
rather, they turn into 'ssep' nodes which may later be
_filtered out_ by the application's grove plan.  It may be
the case that the distinction between a separator
character that is interpreted as the 'char' property of an
'ssep' node and one that is interpreted as the 'char'
property of a 'ch' node is subtle enough to be ignored in
most cases.  If 'ssep' is included in XML's grove plan,
then it would be possible for a DTD-less parser to create
a grove isomorphic to one created by a "real SGML"
parser.  (The main problem with this approach is that
'ssep' nodes are not part of the (pre-corrigendum) ESIS,
so structure-controlled applications would not be able to
process XML.)
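
The filtering idea can be sketched with a toy model.  This is not the
SGML property set, only an illustration under simplifying assumptions:
a node is a (class, char) pair, a grove plan is just the set of node
classes to keep, and the names Node, build_nodes, apply_plan and chars
are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    cls: str   # 'ch' for data characters, 'ssep' for separators
    char: str

def build_nodes(text, in_element_content):
    """Turn raw characters into nodes.

    Only a parser that has read the DTD can set in_element_content to
    True; a DTD-less parser always passes False.
    """
    nodes = []
    for c in text:
        if in_element_content and c.isspace():
            nodes.append(Node("ssep", c))
        else:
            nodes.append(Node("ch", c))
    return nodes

def apply_plan(nodes, plan):
    """Keep only the nodes whose class is included in the plan."""
    return [n for n in nodes if n.cls in plan]

def chars(nodes):
    """Concatenate the 'char' property of each node."""
    return "".join(n.char for n in nodes)

separators = "\n  "
with_dtd = build_nodes(separators, in_element_content=True)
without_dtd = build_nodes(separators, in_element_content=False)

# With 'ssep' in the plan, both parsers keep the same characters in the
# same positions; only the node class differs.
full_plan = {"ch", "ssep"}
same = chars(apply_plan(with_dtd, full_plan)) == chars(apply_plan(without_dtd, full_plan))

# With 'ssep' left out of the plan, the DTD-aware result is empty.
filtered = apply_plan(with_dtd, {"ch"})
```

If the application is willing to ignore the class name and look only at
the 'char' property, the two results are interchangeable, which is the
sense in which the groves could be treated as isomorphic.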

As I understand the current consensus, there will be some
XML applications that need to interpret the entire DTD
(editors, validators, 5-line Omnimark hacks), others that
only need part of the DTD (e.g., to extract <!ATTLIST ...>
declarations for architectural processing), and others
that don't need any information from the DTD at all
(indexers, 5-line Perl hacks); and, as I understand it,
the requirement that XML be parseable without reference to
the DTD is solely for the benefit of applications in the
last class.

My question is: do we envision any applications for which
no information derivable solely from an SGML DTD is
significant *except* for the distinction between element
content and mixed content?  If not, I would hate to give
up mixed content for the sake of applications that don't
care about it to begin with.  (A similar question holds
for EMPTY declared content.)



--Joe English

  jenglish@crl.com

Received on Thursday, 19 September 1996 17:28:47 UTC