- From: Joe English <jenglish@crl.com>
- Date: Mon, 16 Dec 1996 15:52:21 -0800
- To: w3c-sgml-wg@w3.org
Christopher R. Maden <crm@ebt.com> wrote: > 1) In the absence of a DTD, we must assume mixed content for > everything.[*] With you so far. > 2) This could create whitespace nodes in element content. I don't think so: if all element types have mixed content, then there is no element content in which whitespace nodes could appear! Actually there are two cases to consider: (A) The document creator does not provide a formal document type definition. In this case, the semantics stated in point (1) should apply, and by my reasoning there can be no whitespace in element content to worry about. (B) The document owner provides a formal document type definition, but the document consumer chooses to ignore it. This is the only case where I see a problem, but I don't think it's a very big one. I'll get to that in a moment... > 3) A dichotomy between "DTD-ful" and DTD-less parsing will make any > sibling-based relationship difficult at best; this will affect some > TEI or HyQ based hyperlinks, as well as sibling-based stylistic > decisions. However, it will _not_ cause a problem for a large class of applications; namely those in which extraneous whitespace is irrelevant (full-text indexers, web-walkers, etc.) or is dealt with separately by the application (CSS-based formatters, any formatter based on TeX or TeX-like algorithms, etc.) Extraneous whitespace _can_ cause a problem for DSSSL-based formatters, but I don't think that's a big problem either (please see below). > 4) The only way to avoid the dichotomy is to preserve these whitespace > nodes even when a DTD is present. That's one way to avoid it, but it's not the _only_ way. I propose the following: * If the author fails to provide a formal document type definition, then all whitespace is assumed to be significant (as per (1), above). * If the author does provide a formal document type definition, then applications must examine the DTD to determine where whitespace is significant. I think this will work. A large class of applications simply _do not care_ about the distinction between element and mixed content: they will produce the same results in either case. Applications in that class do not need to examine the DTD. Applications that do care about the distinction will need to examine the DTD. I do not see this as an onerous burden for any of the applications in this class that have been mentioned so far: all the ones listed so far will need to examine the DTD anyway (in order to process general entity declarations), and compared to the effort involved in (say) writing a DSSSL-O engine, parsing content models to see if they contain a #PCDATA token is just not that difficult. Back to case (A): if authors choose not to include a formal document type definition, then it will be up to them to take care of extraneous whitespace. There are several options; they could include whitespace-discarding rules in all relevant stylesheets, format their documents so that no extraneous whitespace appears in the first place, or provide a DTD, to name three. > 5) Since SEPCHAR is thrown away in element content, every element must > be made mixed content, and any element declaration without #PCDATA > is illegal. > > This is clearly unacceptable. Agreed, especially considering the production for "Mixed-content declaration" in section 3.2.1 of the XML spec, which limits content models containing #PCDATA to repeatable OR groups. --Joe English jenglish@crl.com
Received on Monday, 16 December 1996 18:52:10 UTC