RS/RE: basic questions

Here are two questions, both posed as challenges to assumptions that have
been asserted here but never defended.  I am not necessarily saying that
the assumptions fail, only that, to me at least, their validity is not
self-evident and they need some reasoned argument in their support.

1. Why ignore whitespace?

Given really simplistic RS/RE handling, the following two "P" elements
would parse differently.

<p>Listen to my heart beat.</p>
<p>
Listen to my heart beat.
</p>

The position has been advanced by both Charles Goldfarb and James Clark
that this would be A Bad Thing.  Obviously, it would complicate the
problem of achieving compatibility with 8879.  Aside from that, 
WHY IS THIS A PROBLEM?

Simplistic handling makes it idiotically, wonderfully, easy to explain to
programmers *and authors* exactly what is markup and what is data.  It is
also ridiculously easy to implement.  The downside is that it makes
XML/SGML source code slightly less readable.  Do that, and the 8879
issue, constitute all the downside?  (Not saying those aren't real; just
wondering if there are more problems.)
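
To make the difference concrete, here is a minimal Python sketch of
"really simplistic" handling, in which every character between the start
tag and the end tag counts as data.  The function name element_data is
hypothetical, and the regular expression is only an illustration that
assumes tags without attributes or nesting; it is not a real parser.

```python
import re

def element_data(source: str, name: str) -> list[str]:
    """Return the raw content of each <name>...</name> element, untouched."""
    tag = re.escape(name)
    # DOTALL lets "." match line breaks, so they are kept as data.
    pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
    return pattern.findall(source)

compact = "<p>Listen to my heart beat.</p>"
spread = "<p>\nListen to my heart beat.\n</p>"

print(repr(element_data(compact, "p")[0]))  # 'Listen to my heart beat.'
print(repr(element_data(spread, "p")[0]))   # '\nListen to my heart beat.\n'
```

Under this rule the second element carries a leading and a trailing line
break that the first does not, which is exactly the difference at issue.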

2. Why should XML try to solve the record problem?

Personally, I've been writing programs for almost 20 years that routinely
dealt with the fact that there might be NL or CR or CR/NL sequences in the
data.  Maybe my experience is not shared, but this has never been a big
problem.  Is it necessary for XML to abstract the problem away in the way
that SGML tries to do, especially if it's going to be hard to do in XML?

In fact, the practice in UNIX and Microsoft operating systems of storing
text in chunks of 80 bytes or less, separated by artefacts of typewriter
technology, is simply a historical anomaly, and I'm not sure that we should
pander to it, particularly when (and here's the real challenge I guess) it
doesn't seem, in practice, to be a big problem.
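
As a rough illustration of how an application can cope with this on its
own, the Python sketch below treats NL, CR, and CR/NL as the same line
break.  The function name split_records is hypothetical, and this is just
one possible approach, not something proposed in the discussion.

```python
def split_records(text: str) -> list[str]:
    """Split text into lines, accepting CR/NL, bare CR, or bare NL."""
    # Collapse the two-character sequence first so that it is not
    # counted as two separate line breaks.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.split("\n")

print(split_records("one\r\ntwo\rthree\nfour"))
# ['one', 'two', 'three', 'four']
```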

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

Received on Thursday, 19 September 1996 16:33:32 UTC