
Re: RS/RE: basic questions



At 11:07 AM 9/23/96 +0000, James Clark wrote:
>At 01:50 23/09/96 CDT, Robert Streich wrote:
>>I think they are pretty straightforward:
>>
>> In data content:
>>  1. If an element begins or ends with a newline [not entirely
>>     accurate, but this is what people see], the newline is ignored.
>>  2. Newlines inside markup are ignored.
>>  3. All other newlines are passed on.
>
>According to your statement of the rules all of the three newlines in the
>following would be passed on:
>
><doc><?>
><?>
>data
><?></doc>
>
>In fact none of them are.

Exactly right on both counts. I'm willing to accept this inconsistency
with SGML. An XML/SGML editor could write it out such that it would
parse the same in SGML, i.e.,

<doc><?><?>data<?></doc>

or

<doc
><?
><?
>data<?
></doc>

Someone who was hand coding the doc would have to know how to conceal the
newlines if they were in an element where newlines were considered
"significant." But then, I'm inclined to believe that they would expect
them in a case like this. If the line endings weren't considered
significant, I'd expect a browser to discard them just as it would
any other leading or trailing whitespace.

With this approach, the only place you run into a problem is when you
have significant line breaks. I'd rather have this than get rid of
them altogether. Leave it up to the application to deal with the
whitespace. If we don't have shortrefs then it's no longer important
for the parser to care about what whitespace character it is. Tabs,
newlines, carriage-returns, spaces, all get treated the same in data
content--passed through.
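
A minimal sketch of the three rules as stated above, in Python and purely
for illustration. The function name data_content and the regex tokenizer
are assumptions made for this example; this is not a real SGML or XML
parser, and it treats anything between "<" and ">" as markup.

```python
import re

# Assumed, simplified tokenizer: "<...>" is markup, everything else is data.
TOKEN = re.compile(r"<[^>]*>|[^<]+")


def data_content(text: str) -> str:
    """Return the data an application would see under the three rules."""
    tokens = TOKEN.findall(text)
    out = []
    for i, tok in enumerate(tokens):
        if tok.startswith("<"):
            # Rule 2: markup yields no data, so newlines inside it vanish.
            continue
        prev = tokens[i - 1] if i > 0 else ""
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        # Rule 1: drop a newline directly after a start tag ...
        if prev.startswith("<") and prev[1:2].isalpha() and tok.startswith("\n"):
            tok = tok[1:]
        # ... and a newline directly before an end tag.
        if nxt.startswith("</") and tok.endswith("\n"):
            tok = tok[:-1]
        # Rule 3: everything else, including other whitespace, passes through.
        out.append(tok)
    return "".join(out)


clark = "<doc><?>\n<?>\ndata\n<?></doc>"
concealed = "<doc\n><?\n><?\n>data<?\n></doc>"
```

Under these assumptions, data_content(clark) keeps all three newlines, while
data_content(concealed) yields just "data", because every newline in the
second form sits inside markup.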

bob

Robert Streich				streich@slb.com
Schlumberger				voice: 1 512 331 3318
Austin Research				fax:   1 512 331 3760