Blended Proposal from Paul Prescod on 1996-12-17 (w3c-sgml-wg@w3.org from December 1996)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Tue, 17 Dec 1996 16:45:28 -0500
To: w3c-sgml-wg@w3.org
Message-Id: <1.5.4.32.19961217214528.00990874@csclub.uwaterloo.ca>
I think that the following proposal is a compromise, but a reasonable one.
It is a blend of existing proposals. It puts the ultimate whitespace
responsibility in the author's court without forcing them to use huge,
verbose attributes, or extra markup in the "common cases" of mixed content
and element-content with-no-whitespace.

---

I propose that we make significant whitespace the default, as per the
Grosso, Delenda Est, etc. proposals, and provide a *simple*, *concise*
mechanism for indicating that insignificant whitespace is being inserted (a
variation on Paoli). This is different from my previous proposal where I
required the curly braces to follow the content model of the DTD. Example:

<ULIST>{
    <UITEM><P>Foo</P></UITEM>
    <UITEM><P>Bar</P></UITEM>
    <UITEM><P>Baz</P></UITEM>
}</ULIST>


A non-validating XML parser would have to strip out the whitespace before
handing the data to an application. Note that the UITEMs can have either
mixed or element content, so the tag delimiters are not tied to the content
models of the DTD. If you try to introduce whitespace in an element-content
node, a validator should tell you that you must change the instance to
either remove the whitespace or hide it with curly braces.

Tools to validate adherence to my new proposal could not be written on top
of SGML parsers unless the SGML parser returns "SSEP" nodes (most current
ones probably do not). But they could be written. Parsers also have to be
changed to check the stupid-net-trick for empty tags.

---

The following special case can be removed if people do not think that
changing from element-content to mixed-content should require special support.

<SPECIAL-CASE>
As David (I think) pointed out earlier, sometimes you want to change from
element content to mixed content. I rebutted that this would change the
meaning of your document and should not be done lightly (in either existing
SGML, or XML as I proposed it). In my new proposal, both concerns are addressed:

Although authors would never do this (nor need to know about it), it would
be legal to put the curly braces around mixed content elements *as long as
that content had no non-whitespace data characters*. Both kinds of XML
parsers would remove these whitespace nodes. This is extra work for
SGML-based parsers but not for new XML-based parsers. This is a concession
to make it easier for people who change DTDs, not for authors.

The point of this is that shifting from element content to mixed content
isn't a problem until you want to *modify* the element's content (to add
non-whitespace data). If you forget to change the "whitespace mode
indicator" in the instance, you will get a well-formedness error: "Element
using curly braces may not have data content." This will happen *very
rarely* and is certainly no harder to figure out than the average SGML error
message.

It is obviously legal to use the traditional angle brackets around
element-content elements if they contain no whitespace, and illegal if they
do not. (as described above)
</SPECIAL-CASE>

 Paul Prescod
Received on Tuesday, 17 December 1996 16:42:44 UTC