W3C home > Mailing lists > Public > w3c-sgml-wg@w3.org > December 1996

Re: RS/RE, again (sorry)

From: Christopher R. Maden <crm@ebt.com>
Date: Thu, 12 Dec 1996 21:08:28 GMT
Message-Id: <199612122108.VAA17549@phaser.EBT.COM>
To: w3c-sgml-wg@w3.org
[Paul Prescod]

> It isn't really the legacy issue I'm worried about. It is requiring
> every new DSSSL script to include code to detect and destroy
> whitespace that the author thought (reasonably) was only for
> pretty-printing. It is even more than that...it is the idea of
> having pretty-printing whitespace be something that is handled by
> the application *at all*. That's out of whack with most people's
> understanding of a parser's job. C++ parsers don't return
> "whitespace nodes" between tokens that the C++ "back end" must
> detect and delete.

This is already a necessity.  The 8879 rules don't eliminate any
internal whitespace, and they don't eliminate all leading and trailing
whitespace.  If I have a DSSSL stylesheet on top of an 8879 parser, I
am going to have routines to strip and compress whitespace.  RE
delenda est does not introduce a new pain here.

(I wouldn't strip whitespace if I expected the element to be
preformatted, but in that case, I'd just as soon that the parser give
me *all* of my whitespace, not *most* of it.)

> Well, that's the XML status quo. I hoped that we could do better. I
> think it will be rather annoying to have to know when I code my
> style sheets whether the application uses an XML parser or an SGML
> parser. I might just instruct authors to avoid whitespace between
> elements altogether, since it will not be reliably interpreted
> (which, I guess, is what some people want).

If every parser passed all the whitespace, then all parsers would give
the same parse tree.

As long as we have DTD-less parsing, this is the *only* option that
will give the same parse tree.  Without a DTD, everything must be
assumed to be mixed content, and with a DTD, any heuristic that
distinguishes between element and mixed content will result in a
different parse tree.

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//GCA//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//EBT//NONSGML Christopher R. Maden//EN" SYSTEM
"<URL>http://www.ebt.com <TEL>+1.401.421.9550 <FAX>+1.401.521.2030
<USMAIL>One Richmond Square, Providence, RI 02906 USA" NDATA SGML.Geek>
Received on Thursday, 12 December 1996 16:19:29 EST

This archive was generated by hypermail pre-2.1.9 : Wednesday, 24 September 2003 10:03:48 EDT