- From: David Durand <dgd@cs.bu.edu>
- Date: Thu, 12 Dec 1996 16:30:17 -0500 (EST)
- To: w3c-sgml-wg@w3.org
To: w3c-sgml-wg@w3.org
From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Subject: Re: RS/RE, again (sorry)

It isn't really the legacy issue I'm worried about. It is requiring every new DSSSL script to include code to detect and destroy whitespace that the author thought (reasonably) was only for pretty-printing. It is even more than that...it is the idea of having pretty-printing whitespace be something that is handled by the application *at all*. That's out of whack with most people's understanding of a parser's job. C++ parsers don't return "whitespace nodes" between tokens that the C++ "back end" must detect and delete.

Yes, but such languages have explicit quoting for contexts where spaces matter. Things are a little more tricky in a document markup language, where there is a reasonable expectation that the file is _literal data_ to which a minimum of additional structural information (markup) has been added. Now the notion of "pretty printing" is not so straightforward. But this is philosophy... and your original comment was also philosophy.

>> Every application that is currently based on an SGML parser will
>> get a different parse tree from an XML parser.
>
>Yes. And that's fine.

Well, that's the XML status quo. I hoped that we could do better. I think it will be rather annoying to have to know when I code my style sheets whether the application uses an XML parser or an SGML parser. I might just instruct authors to avoid whitespace between elements altogether, since it will not be reliably interpreted (which, I guess, is what some people want).

Or you will just write stylesheets that don't assume some whitespace will be automatically deleted (and leave your users alone...).

The problem remains that there is no way to tell element content from mixed content given only an instance. The problem with your delimiter proposal is the same as the problem with Charles' explicit quoting proposal -- too ugly. Worse, it's not even easy to explain: "all tags look like this, except that if they can't contain data, they instead look this other way". And if you change a DTD to turn element content into mixed content (or, God forbid, have a parameter entity controlling this), you will have to change a giant mass of delimiters in all your instances -- very unfriendly...

-- David
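A minimal sketch of the claim that an instance alone cannot distinguish element content from mixed content. It uses Python's standard xml.dom.minidom as a stand-in for a parser working without a DTD. The <list> and <para> instances and the helper names are hypothetical, invented for illustration; they do not come from the thread.

```python
from xml.dom import minidom

# Two hypothetical instances. Without a DTD, nothing in the text says that
# <list> was meant to have element content (whitespace is pretty-printing)
# while <para> was meant to have mixed content (whitespace is data).
ELEMENT_CONTENT = "<list>\n  <item>one</item>\n  <item>two</item>\n</list>"
MIXED_CONTENT = "<para>\n  <em>one</em>\n  <em>two</em>\n</para>"


def child_kinds(source):
    """Return the kind ("text" or "element") of each child of the root."""
    root = minidom.parseString(source).documentElement
    kinds = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            kinds.append("text")
        elif node.nodeType == node.ELEMENT_NODE:
            kinds.append("element")
    return kinds


def whitespace_only_nodes(source):
    """Return the whitespace-only text nodes directly under the root."""
    root = minidom.parseString(source).documentElement
    return [
        node.data
        for node in root.childNodes
        if node.nodeType == node.TEXT_NODE and node.data.strip() == ""
    ]


if __name__ == "__main__":
    # Both instances yield the same child structure, whitespace included,
    # so the application has to decide what the whitespace means.
    print(child_kinds(ELEMENT_CONTENT))
    print(child_kinds(MIXED_CONTENT))
```

Both calls report the same sequence of text and element children, so any rule for discarding the whitespace has to come from somewhere other than the instance: a DTD, a delimiter convention, or the stylesheet itself.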
Received on Thursday, 12 December 1996 16:31:03 UTC