Re: RS/RE, again (sorry) from David Durand on 1996-12-12 (w3c-sgml-wg@w3.org from December 1996)

From: David Durand <dgd@cs.bu.edu>
Date: Thu, 12 Dec 1996 16:30:17 -0500 (EST)
To: w3c-sgml-wg@w3.org
Message-Id: <199612122130.QAA16210@csb.bu.edu>

	To: w3c-sgml-wg@w3.org
	From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
	Subject: Re: RS/RE, again (sorry)

	It isn't really the legacy issue I'm worried about. It is requiring every
	new DSSSL script to include code to detect and destroy whitespace that the
	author thought (reasonably) was only for pretty-printing. It is even more
	than that...it is the idea of having pretty-printing whitespace be something
	that is handled by the application *at all*. That's out of whack with most
	people's understanding of a parser's job. C++ parsers don't return
	"whitespace nodes" between tokens that the C++ "back end" must detect and
	delete.
  Yes, but such languages have explicit quoting for contexts where
spaces matter. Things are a little more tricky in a document markup
language where there is a reasonable expectation that the file is
_literal data_ to which a minimum of additional structural
information (markup) has been added. Now the notion of "pretty
printing" is not so straightforward. But this is philosophy... and your
original comment was also philosophy.

	>> Every application that is currently based on an SGML parser will
	>> get a different parse tree from an XML parser.
	>
	>Yes.  And that's fine.

	Well, that's the XML status quo. I hoped that we could do better. I think it
	will be rather annoying to have to know when I code my style sheets whether
	the application uses an XML parser or an SGML parser. I might just instruct
	authors to avoid whitespace between elements altogether, since it will not
	be reliably interpreted (which, I guess, is what some people want).

   Or you will just write stylesheets that don't assume some whitespace
will be automatically deleted (and leave your users alone...). The
problem remains that there is no way to tell element content from mixed
content given only an instance, without the DTD.
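
To make that concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree (a non-validating parser that reads no DTD); the helper name and the two sample instances are hypothetical. Both instances have whitespace next to child elements, and the parser reports it the same way in each, even though in the first it is presumably pretty-printing (element content) and in the second the space is real data (mixed content):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample instances with no DTD attached.
# Presumably pretty-printing whitespace inside element content:
pretty = "<list>\n  <item>a</item>\n  <item>b</item>\n</list>"
# A significant space between two elements inside mixed content:
mixed = "<p><b>bold</b> <i>italic</i></p>"


def text_around_children(xml_text):
    """Hypothetical helper: return the root's leading text followed by
    each child's tail, i.e. the character data around child elements."""
    root = ET.fromstring(xml_text)
    chunks = [root.text]
    chunks.extend(child.tail for child in root)
    return chunks


# The parser keeps the whitespace in both cases; nothing in the
# instance itself says which of these chunks an application should drop.
print(text_around_children(pretty))  # ['\n  ', '\n  ', '\n']
print(text_around_children(mixed))   # [None, ' ', None]
```

Deciding that the first list is disposable and the second is not requires knowing the content model, which is exactly the information an instance alone does not carry.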

   The problem with your delimiter proposal is the same as the problem
with Charles' explicit quoting proposal -- too ugly. Worse, it's not
even easy to explain: "all tags look like this, except that if they
can't contain data, they look this other way instead". And if you change a
DTD to turn element content into mixed content (or, God forbid, have a
parameter entity controlling this), you will have to change a giant mass of
delimiters in all your instances -- very unfriendly...

    -- David

Received on Thursday, 12 December 1996 16:31:03 UTC