
Re: RS/RE, again (sorry)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Thu, 12 Dec 1996 16:41:02 -0500
Message-Id: <1.5.4.32.19961212214102.00a81a6c@csclub.uwaterloo.ca>
To: "Christopher R. Maden" <crm@ebt.com>, w3c-sgml-wg@w3.org

At 09:08 PM 12/12/96 GMT, Christopher R. Maden wrote:
>This is already a necessity.  The 8879 rules don't eliminate any
>internal whitespace, and they don't eliminate all leading and trailing
>whitespace.  

They *do* eliminate whitespace in element content. There's a heck of a lot
of that. Most of the newlines in most of my documents are in element content.
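
As a rough illustration of that distinction, here is a minimal sketch in Python. It assumes a hypothetical table, ELEMENT_CONTENT, standing in for what a DTD would tell the parser about which element types have element content. It uses xml.etree.ElementTree purely for convenience (it is not an 8879 parser), and it reduces the 8879 rules to "drop whitespace-only text directly inside element content", which is a simplification.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for DTD knowledge: element types declared
# with element content (children only, no character data).
ELEMENT_CONTENT = {"doc", "section"}

def strip_element_content_whitespace(elem):
    """Drop whitespace-only text directly inside element-content elements."""
    if elem.tag in ELEMENT_CONTENT:
        # Text before the first child.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        # Text after each child, still inside this element.
        for child in elem:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    for child in elem:
        strip_element_content_whitespace(child)
    return elem

SAMPLE = (
    "<doc>\n"
    "  <section>\n"
    "    <para>Some <em>mixed</em>  content </para>\n"
    "  </section>\n"
    "</doc>"
)

root = strip_element_content_whitespace(ET.fromstring(SAMPLE))
result = ET.tostring(root, encoding="unicode")
```

The newlines and indentation between doc, section and para disappear, while every space inside para (mixed content) is left exactly as the author typed it.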

>If I have a DSSSL stylesheet on top of an 8879 parser, I
>am going to have routines to strip and compress whitespace.  

Why? If the author put whitespace in mixed content, they probably want it
there, so I would just leave it. Anyhow, Jade seems to Do The Right Thing in
the backend, which is totally different from having to deal with whitespace
"nodes" in the front end.

>(I wouldn't strip whitespace if I expected the element to be
>preformatted, but in that case, I'd just as soon that the parser give
>me *all* of my whitespace, not *most* of it.)

Again, this is all mixed content stuff. I'm talking about element content.

>If every parser passed all the whitespace, then all parsers would give
>the same parse tree.

Sure, as long as you are willing to a) do away with existing SGML parsers
and b) forgo reliable whitespace removal in element content, which will lead
to c) no whitespace in element content at all, or whitespace in element
content only according to the rules that Microsoft and Netscape dictate.

>As long as we have DTD-less parsing, this is the *only* option that
>will give the same parse tree.  

Not so. We could put information in the instance to differentiate mixed
content from element content.
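
One way this could look, sketched in Python under loud assumptions: the attribute name "content" and its value "element" are hypothetical markers invented for this example, not part of 8879 or of any proposal in this thread. A DTD-less parser that honours the marker can drop whitespace-only text in marked elements and keep everything else, so it builds the same tree a DTD-aware parser would.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance-level marker; the name is an assumption.
MARKER = "content"

def tree(elem):
    """Return (tag, children), dropping whitespace-only text where marked."""
    is_element_content = elem.get(MARKER) == "element"
    children = []

    def add_text(text):
        # Keep text unless it is whitespace-only inside element content.
        if text and not (is_element_content and not text.strip()):
            children.append(text)

    add_text(elem.text)
    for child in elem:
        children.append(tree(child))
        add_text(child.tail)
    return (elem.tag, children)

SAMPLE = (
    '<list content="element">\n'
    "  <item>one <b>two</b> </item>\n"
    "  <item> three</item>\n"
    "</list>"
)

parsed = tree(ET.fromstring(SAMPLE))
```

Here the line breaks between the item elements are discarded because list is marked as element content, while the spaces inside each item survive because item carries no marker and is treated as mixed content.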

 Paul Prescod
Received on Thursday, 12 December 1996 16:38:27 EST
