Re: RS/RE considered confusing?

At 11:41 AM 9/17/96 +0000, James Clark wrote:
>a. An approach similar to HTML.  In HTML (except in PRE), line
>terminators are just white-space, adjacent white-space is collapsed
>and leading/trailing whitespace in an element is stripped.  The
>interesting thing about this is that if you perform this process it
>doesn't matter whether or not you ignored the REs that SGML said you
>should: you'll end up with the same result.  Well, almost: you'll
>still have the problems with REs getting moved past inclusions and
>PIs.  The main problem with this is how to handle verbatim type
>elements.  Can we live without these?

Only if you want to rule out most technical documents. Even the original
HTML DTD had three different tags for text with significant line endings:
XMP, PRE, and LISTING. Code samples, sample terminal dialogs, etc., all
contain significant line endings, as would some notation content.
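
To make the concern concrete, here is a minimal Python sketch of the
HTML-style rule quoted above (line terminators are white-space, adjacent
white-space collapses, leading/trailing white-space is stripped). The
function name normalize_html_style and the sample strings are hypothetical,
chosen only for illustration; this is not any parser's actual behavior.

```python
def normalize_html_style(text: str) -> str:
    """Treat line terminators as white-space, collapse runs, strip the ends."""
    # str.split() with no arguments splits on any run of white-space
    # (newlines included) and discards leading/trailing white-space.
    return " ".join(text.split())


# The same element content, with and without the record ends next to
# the start-tag and end-tag: both normalize to the same string.
with_res = "\nsome   text\non two lines\n"
without_res = "some   text\non two lines"

# A code sample, where line endings and indentation are significant.
code_sample = "main() {\n    puts(\"hi\");\n}\n"

if __name__ == "__main__":
    print(normalize_html_style(with_res))
    print(normalize_html_style(without_res))
    print(normalize_html_style(code_sample))
```

The first two calls agree, which is the attraction of the approach. The
third flattens the code sample onto a single line, which is why a
verbatim-type element is hard to give up.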

One option would be to require that all content with significant line endings
be put into CDATA entities and given a notation. (We haven't yet touched
on how external entities are going to be handled or what the different types
will be.) The biggest drawback is that this forces messy entity management
onto the author, who has to cope with all the little pieces.

Another option would be to force people to create "line" or "line-ending"
elements, e.g.,

  <line>[lots of junk]</line>

or

  main() {<br></br>         [yecchh, a non-empty element to identify a
  [lots of junk]<br></br>    significant line-break!]

but this forces authors to put a lot of markup around things that they
really should be able to just copy into place.
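
As a rough illustration of that burden, here is a small Python sketch of
what a tool (or an author, by hand) would have to do to a pasted sample
under the "line" element option. The function name wrap_lines and the
<line> element are hypothetical; escaping uses the standard library's
xml.sax.saxutils.escape.

```python
from xml.sax.saxutils import escape


def wrap_lines(sample: str) -> str:
    """Wrap every line of a verbatim sample in a hypothetical <line> element."""
    # splitlines() drops the line terminators, so each source line
    # becomes exactly one element; escape() handles &, < and >.
    return "".join(
        f"<line>{escape(line)}</line>" for line in sample.splitlines()
    )


if __name__ == "__main__":
    pasted = "main() {\n  if (a < b) f();\n}\n"
    print(wrap_lines(pasted))
```

A three-line sample turns into three elements plus escaping. Under the
stripping rule quoted earlier, the indentation inside each element would
presumably still be lost unless it is marked up as well.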

Robert Streich				streich@slb.com
Schlumberger				voice: 1 512 331 3318
Austin Research				fax:   1 512 331 3760