Re: RS/RE, again (sorry)
From: Paul Prescod <email@example.com>
Subject: Re: RS/RE, again (sorry)
It isn't really the legacy issue I'm worried about. It is requiring every
new DSSSL script to include code to detect and destroy whitespace that the
author thought (reasonably) was only for pretty-printing. It is even more
than that...it is the idea of having pretty-printing whitespace be something
that is handled by the application *at all*. That's out of whack with most
people's understanding of a parser's job. C++ parsers don't return
"whitespace nodes" between tokens that the C++ "back end" must detect and
Yes, but such languages have explicit quoting for contexts where
spaces matter. Things are a little more tricky in a document markup
language where there is a reasonable expectation that the file is
_literal data_ to which a minimum of additional structural
information (markup) has been added. Now the notion of "pretty
printing" is not so straightforward. But this is philosophy... and your
original comment was also philosophy.
>> Every application that is currently based on an SGML parser will
>> get a different parse tree from an XML parser.
>Yes. And that's fine.
Well, that's the XML status quo. I hoped that we could do better. I think it
will be rather annoying to have to know when I code my style sheets whether
the application uses an XML parser or an SGML parser. I might just instruct
authors to avoid whitespace between elements altogether, since it will not
be reliably interpreted (which, I guess, is what some people want).
Or you will just write stylesheets that don't assume some whitespace
will be automatically deleted (and leave your users alone...). The
problem remains that there is no way to tell element content from mixed
content given only an instance.
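
To make that last point concrete, here is a rough sketch in Python using the standard library's xml.dom.minidom with no DTD supplied. The two sample instances and the helper names (child_summary, strip_whitespace_only_text) are made up for illustration; they come from no DSSSL or XML tool. Both instances have the same layout, so the parser reports the same pattern of whitespace text nodes for each, and nothing in the tree says which one was meant as element content.

```python
import xml.dom.minidom as minidom

# Hypothetical instances: same whitespace layout, no DTD available.
# A DTD might declare <list> as element content and <para> as mixed content,
# but the instances alone do not say so.
ELEMENT_CONTENT = "<list>\n  <item>a</item>\n  <item>b</item>\n</list>"
MIXED_CONTENT = "<para>\n  <em>a</em>\n  <em>b</em>\n</para>"


def child_summary(xml_text):
    """Return a (kind, value) pair for each child of the root element."""
    root = minidom.parseString(xml_text).documentElement
    summary = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            summary.append(("text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            summary.append(("element", node.tagName))
    return summary


def strip_whitespace_only_text(summary):
    """Drop whitespace-only text entries, as an application would have to
    do itself once it decides the whitespace was only pretty-printing."""
    return [
        (kind, value)
        for kind, value in summary
        if not (kind == "text" and value.strip() == "")
    ]


if __name__ == "__main__":
    for instance in (ELEMENT_CONTENT, MIXED_CONTENT):
        print(child_summary(instance))
        print(strip_whitespace_only_text(child_summary(instance)))
```

The stripping step is exactly the bit of code the application ends up carrying, and it is only safe to run if something outside the instance says the element has element content.
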
The problem with your delimiter proposal is the same as the problem
with Charles' explicit quoting proposal -- too ugly. Worse, it's not
even easy to explain: "all tags look like this, except that if they
can't contain data, they instead look this other way". And if you change a
DTD to turn element content into mixed content (or, God forbid, have a
parameter entity controlling this), you will have to change a giant mass of
delimiters in all your instances -- very unfriendly...