Re: RS/RE: basic questions from Robert Streich on 1996-09-22 (w3c-sgml-wg@w3.org from September 1996)

From: Robert Streich <streich@slb.com>
Date: Sun, 22 Sep 96 15:29:03 CDT
To: w3c-sgml-wg@w3.org
Message-Id: <9609222029.AA01109@austin.asc.slb.com>
At 10:53 AM 9/20/96 CDT, Michael Sperberg-McQueen wrote:
>Or even
>
>     <p>Listen to my heart beat.
>     <?DIRECTOR: audio on
>     >And beat and beat and beat.</p>

This is an excellent technique. DynaTag uses it with great success to get
around mixed content problems in the really screwy DTDs that it creates.

At 06:08 AM 9/21/96 +0000, Tim Bray wrote:
>And I agree 100%.  I am proposing (perhaps only as a strawman) that in XML we 
>make it 100% crystal clear that in case [2], the "true information" is, in 
>C notation, 
>   "\nListen to my heart beat\n\nand beat and beat\n"
>or if you typed it in on a windows box
>   "\r\nListen to my heart beat\r\n\r\nand beat and beat\r\n"

I agree, sort of. If possible, I'd like to have the first and last line
breaks discarded. They should be easy to identify and discard, and they
are "logically" insignificant.
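A minimal Python sketch of that rule, assuming the parser hands over an
element's character data as one string, exactly as in Tim's case [2]. The
function name true_information is hypothetical, and treating CR LF the same
as LF is an assumption taken from Tim's Windows example, not something
settled in this thread:

```python
def true_information(data: str) -> str:
    """Hypothetical helper: apply the proposed rule to an element's data.

    Assumptions (not from any spec): CR LF is normalized to LF first, and
    exactly one line break is discarded at each end of the content.
    """
    # Treat a Windows CR LF pair the same as a bare LF.
    text = data.replace("\r\n", "\n")
    # Discard the line break that immediately follows the start tag.
    if text.startswith("\n"):
        text = text[1:]
    # Discard the line break that immediately precedes the end tag.
    if text.endswith("\n"):
        text = text[:-1]
    return text


unix = "\nListen to my heart beat\n\nand beat and beat\n"
windows = "\r\nListen to my heart beat\r\n\r\nand beat and beat\r\n"
print(repr(true_information(unix)))  # 'Listen to my heart beat\n\nand beat and beat'
print(true_information(unix) == true_information(windows))  # True
```

Both of Tim's strings reduce to the same value. The doubled line break in
the middle, left where the markup sat on its own line, is not touched.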

>No ambiguity whatsoever.  The costs are: 
>[a] difficulty in figuring out how to make this 8879-compliant, and 

This is tricky (if not impossible), and I could accept a divergence here,
as its impact is very, very small.

>[b] the fact that you can no longer use whitespace around markup if you are 
>    worried about the application's handling of line breaks in the data, so
>    you might in fact have to use
>
>    <p>Listen to my heart beat<? Director Audio on> and beat and beat.</p>
>
>    which is harder to read, and hard to type in vi, for long paragraphs.

I can easily live with this, since it is the current situation. Applications
decide what a newline sequence means based on the stylesheet. In 
Author/Editor and DynaText, line breaks look like spaces unless I specify
"verbatim" formatting in the stylesheet.

>The advantages are: 
>[a] you can explain quickly, clearly, and precisely exactly what the "true  
>    information" is
>[b] Implementation is very easy

I think the advantages far outweigh the disadvantages, and the rule is easy
to explain to authors.

The only place it becomes an issue is when you have markup in data content
that is not a proper subelement and it's the only thing on the line. Worst
case: if line breaks are significant (to the presentation), you get an
empty new line; if line breaks are not significant, you get an extra space.
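To make that worst case concrete, here is a rough Python sketch of the two
stylesheet behaviours mentioned above: line breaks shown as spaces, or kept
for "verbatim" formatting. The render function is hypothetical and
simplified; it does not model how Author/Editor or DynaText actually work:

```python
def render(text: str, verbatim: bool = False) -> str:
    """Hypothetical renderer for the two stylesheet choices.

    In verbatim mode line breaks are kept; otherwise each line break is
    displayed as a single space (a simplifying assumption).
    """
    if verbatim:
        return text
    return text.replace("\n", " ")


# Data left after the first and last line breaks are discarded, with the
# markup that sat alone on its line removed.
data = "Listen to my heart beat\n\nand beat and beat"
print(repr(render(data)))  # 'Listen to my heart beat  and beat and beat'
print(render(data, verbatim=True).split("\n"))  # ['Listen to my heart beat', '', 'and beat and beat']
```

The non-verbatim result has the extra space; the verbatim result has the
empty line.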

These things are easily picked up during proofing. Good spellcheckers even
flag runs of two or more spaces for you. If I'm in line-break-significant
content, I'm also very unlikely to put in any extraneous markup anyway.
If I need to, I can easily remember to "hide" the line break in the added
markup using either Lee's or Michael's suggestions.

Another advantage is that it makes it very easy for an SGML editor to
"save as XML," at least in this case.

The biggest disadvantage is that in what are probably very, very few cases
where someone wanted the extraneous line break after some markup, an SGML
parser would discard it. I can live with this risk.

At 08:21 PM 9/21/96 GMT, Charles F. Goldfarb wrote:
>Yes, but then you are requiring the author to enforce the rules as well as
>remember them. With smart record handling in the parser, the author only has to
>remember the rules; the parser enforces them.

But I think the rules are much simpler to remember and a lot easier to
digest than having to sometimes "quote" data content. Quoting requires that
the author know what mixed content is and which elements are mixed. This
is a lot more to bite off than the alternative.

bob


Robert Streich				streich@slb.com
Schlumberger				voice: 1 512 331 3318
Austin Research				fax:   1 512 331 3760
Received on Sunday, 22 September 1996 16:29:25 UTC