Re: Newlines in element content (i.e TABLES)

Having finally finished reading all the back mail, I'm ready to try to jump
back into the discussion.
At 8:43 PM 9/25/96, Paul Prescod wrote:
>At 06:14 PM 9/25/96 EDT, lee@sq.com wrote:
>>That's correct.  For what it's worth, SoftQuad Panorama can display SGML
>>tables with newlines between the tags, even if PCDATA is allowed there.
>>It isn't particularly hard to implement, as far as I can see.
>
>But should every SGML application have to implement it over and over again?
>That means that between and within ANY ELEMENT you would have to explicitly
>look out for "meaningless" newlines. Instead of implementing the handling in
>the parser (which code we expect to be used over and over again) you must
>implement it in the application.
>
>Then you have to define in your DTD-documentation that newlines in that
>context are going to be interpreted as "meaningless" which means that we are
>shifting the documentation and education burden to application designers.

The example under consideration was the table:

<TABLE><TR><TD>1</TD><TD>2</TD><TD>3</TD><TD>4</TD></TR><TR><TD>1</TD><TD>2</TD><TD>3</TD><TD>4</TD></TR></TABLE>

and the desire to format it, thus:

<TABLE>
<TR><TD>1</TD><TD>2</TD><TD>3</TD><TD>4</TD></TR>
<TR><TD>1</TD><TD>2</TD><TD>3</TD><TD>4</TD></TR>
</TABLE>

Since in the DTD-less case we don't know that the table is element content,
we are unable to remove the whitespace. Now, we could fix this by having
DTD-less processing differ from DTD-ful processing, but I agree with most
of you (I expect) that this is a bad idea.
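
As a rough illustration of the problem (a minimal sketch, not any real
parser: the TOKEN pattern and the text_runs helper are made up for this
example, and the tables are shortened to two columns), a tokenizer that has
no DTD has to report every character it finds between tags:

```python
import re

# Hypothetical DTD-less tokenizer: a token is either a tag or a run of text.
# With no DTD it cannot tell element content from mixed content, so it
# reports whitespace-only runs along with everything else.
TOKEN = re.compile(r"<[^>]*>|[^<]+")

def text_runs(markup):
    """Return every run of character data found between tags."""
    return [t for t in TOKEN.findall(markup) if not t.startswith("<")]

compact = "<TABLE><TR><TD>1</TD><TD>2</TD></TR></TABLE>"
pretty = "<TABLE>\n<TR><TD>1</TD><TD>2</TD></TR>\n</TABLE>"

print(text_runs(compact))  # ['1', '2']
print(text_runs(pretty))   # ['\n', '1', '2', '\n']
```

The two extra '\n' runs in the second result are exactly the "meaningless"
newlines that the application would otherwise have to filter out itself.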

But, why not format the table like this:

<TABLE><TR
><TD>1</TD><TD>2</TD><TD>3</TD><TD>4</TD></TR
><TR><TD>1</TD><TD>2</TD><TD>3</TD><TD>4</TD></TR
></TABLE>

This looks a little weird, but you already have to do this in Netscape (at
least for TD elements), because in tables leading whitespace is not ignored.
We could also adopt a further "application" rule that whitespace is ignored
according to common convention, except when a stylesheet requests verbatim
processing for an element. Come down to it, I'd rather add a "verbatim"
declaration to the DTD.
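
To make the point about line breaks inside tags concrete, here is a small
sketch under the same kind of assumptions as before (TOKEN, text_runs and
tag_names are hypothetical helpers, not part of any real SGML tool, and the
table is shortened). Because the newline sits before the closing ">", it is
part of the tag rather than part of the content:

```python
import re

# Hypothetical tokenizer: a token is either a tag or a run of text.
# [^>]* also matches newlines, so a line break inside a tag stays in the tag.
TOKEN = re.compile(r"<[^>]*>|[^<]+")

def text_runs(markup):
    """Return every run of character data found between tags."""
    return [t for t in TOKEN.findall(markup) if not t.startswith("<")]

def tag_names(markup):
    """Return tag contents with surrounding whitespace stripped."""
    return [t[1:-1].strip() for t in TOKEN.findall(markup) if t.startswith("<")]

one_line = ("<TABLE><TR><TD>1</TD><TD>2</TD></TR>"
            "<TR><TD>3</TD><TD>4</TD></TR></TABLE>")
broken_in_tags = ("<TABLE><TR\n><TD>1</TD><TD>2</TD></TR\n"
                  "><TR><TD>3</TD><TD>4</TD></TR\n></TABLE>")

# Both forms yield the same tags and the same character data.
print(text_runs(one_line) == text_runs(broken_in_tags))
print(tag_names(one_line) == tag_names(broken_in_tags))
```

No whitespace-only text run is ever produced, so nothing needs to be
stripped, with or without a DTD.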

Incompatibility with 8879 is actually not an issue here, anyway, as the
entity manager is _never_ required to report RS/RE to a parser. If current
entity managers insist on recognizing CR and LF as record boundaries, then
we should live with the incompatibility, and encourage the development of
simpler entity managers that are not so "obliging". The RS/RE stuff in SGML
was supposed to make life easier for taggers. Experience has shown that,
arguments about "true content" or not, the RS/RE rules do not work: even
SGML experts can disagree about what they mean. We are in danger of
making compatibility with SGML's mis-features a millstone around our necks.

>A DTD-less parser doesn't know or care that it is dealing with shortref. It
>would treat '"' as "PCDATA Start" and "PCDATA End".
This is correct.

However, XML would be required to look stupid by quoting things that are
already clearly delimited (by tags), and would be permanently and
completely incompatible with all existing SGML and pseudo-SGML (i.e. HTML)
documents. Using quotes to preserve the SGML newlines is like wearing
glasses to fix your outdated contact-lens prescription. Ignoring
whitespace around markup is an idea that has outlived its usefulness. Let's
bury it.

> Paul Prescod

RE delenda est.

   -- David

--------------------------------------------+--------------------------
David Durand                  dgd@cs.bu.edu | david@dynamicDiagrams.com
Boston University Computer Science          | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/    | http://dynamicDiagrams.com/

Received on Thursday, 26 September 1996 14:05:27 UTC