RE: RS/RE, again (sorry)

From: Jean Paoli <jeanpa@microsoft.com>
Date: Fri, 13 Dec 1996 14:02:21 -0800
Message-ID: <c=US%a=_%p=msft%l=RED-16-MSG-961213220221Z-8610@INET-05-IMC.itg.microsoft.com>
To: "'w3c-sgml-wg@w3.org'" <w3c-sgml-wg@w3.org>

>From: 	dgd@cs.bu.edu[SMTP:dgd@cs.bu.edu]
>Sent: 	Friday, December 13, 1996 3:44 PM
>To: 	w3c-sgml-wg@w3.org
>Subject: 	RE: RS/RE, again (sorry)
>Summary: A few quibbles with Jean, followed by endorsement of Tim's 3-value
>proposal -- which I was busily independently deriving as his mail arrived.
>One last quibble -- we can't forget the -xml- prefix on the values unless
>we're going to revisit the attribute value decision too.
>   -- David
>At 10:30 AM 12/13/96, Jean Paoli wrote:
>>The problem I have with RE Delenda is the one pointed out by Prescod:
>>there is no mechanism provided for having totally meaningless whitespace.
>So far I'm with you.
>>It is a fact that editors or batch applications which read XML,
>>manipulate it and then save it need a way to freely insert CR/LF inside
>>the XML stream in order to break it into lines, because a line has a
>>limited number of characters.
>This is a fact only if we make it so. We are defining the standard. XML
>already requires _not_ breaking lines in mixed content. Any editor
>shenanigans in such places will be application-visible, so the flexibility
>being gained is in fact rather small, for most documents which are composed
>mostly of mixed-content.

Look, an application *has* to cut lines where it can, and if there is a
long stream of text, it is going to cut it.
>>It also needs a place to freely insert whitespace for indentation.
>>It also needs to know at read time that those characters can be removed;
>>if not, the document will continue to grow indefinitely.
>In mixed content and in element content under "RE delenda est", whitespace
>can be added as long as it's added inside markup. So Editors that want to
>try to shorten lines always have an option. At issue is the stickier human
>issue of whether, and to what extent, humans need to add insignificant
>whitespace in documents.
>>This is the tribute we have to pay to let XML documents be text that is
>>human-readable without tools.
>This is the issue. Given the changes in viewing software (browsers),
>editing environments (line-free editors like emacs are relatively
>commonplace), and file management (fixed-record-length filesystems are
>becoming a memory except for limited legacy contexts), the issue of
>ignorable whitespace seems less critical than it did a decade ago.
>>In principle, the only safe place to insert or delete such characters is
>>in element-content
>>but element-content cannot be detected in a DTD-less environment so we
>>have a problem.
>Or in markup. In XML it's always safe to insert any combination of
>whitespace before any occurrence of >.
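David's point, that an editor can always shorten lines by adding whitespace inside markup, can be illustrated with a small line-wrapping sketch. The function name `wrap_in_markup` is hypothetical, and the regex tokenizer is a toy that assumes well-formed input with no comments, processing instructions, CDATA sections, or literal `>` inside attribute values. It inserts a newline only inside tags, so the character data a parser reports is unchanged.

```python
import re

# Toy tokenizer (assumption): a tag is "<...>", everything else is
# character data. Comments, PIs, CDATA and ">" in attribute values
# are NOT handled.
TOKEN = re.compile(r"<[^>]*>|[^<]+")


def wrap_in_markup(doc: str, width: int = 60) -> str:
    """Shorten lines by adding a newline inside tags, never in text."""
    out = []
    col = 0  # current column on the output line
    for tok in TOKEN.findall(doc):
        if tok.startswith("<") and col + len(tok) > width:
            # Break just before the closing ">"; in an XML empty-element
            # tag the whitespace must come before "/>" instead.
            cut = len(tok) - 2 if tok.endswith("/>") else len(tok) - 1
            tok = tok[:cut] + "\n" + tok[cut:]
        out.append(tok)
        # Recompute the column from the last newline in this token, if any.
        nl = tok.rfind("\n")
        col = len(tok) - nl - 1 if nl >= 0 else col + len(tok)
    return "".join(out)
```

Long runs of character data still yield long lines, because the sketch never breaks text; that is the limitation Jean raises about long streams of text.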
>>I propose:
>>1/ XML parser output: use RE Delenda Est in order to respect data
>>integrity (for example, to allow full-text indexers to know where
>>everything is by byte offset).
>>2/ Change the current meaning of -XML-SPACE: instead of having -XML-SPACE
>>control the output of the parser, let us define -XML-SPACE as information
>>for the application (basically XML is an application profile of SGML, so
>>hey, here we go!). We define a single value, -XML-SPACE=PRESERVE, to
>>indicate to the application that it should not mess *arbitrarily* with
>>the content of the element for pretty-printing or line-breaking purposes.
>Does this mean that the application is not allowed to change the space
>characters, or is allowed to change them? I don't understand the
>qualification "arbitrary".

This is the issue: anywhere inside an element marked -xml-preserve,
whitespace is significant, so the application is not allowed to change
it for pretty-printing or line breaking. (The term "significant" still
has to be defined, but the long stream of mail on this precise subject
proves that at least the concept is real.)
Anywhere else, the application can do whatever it decides.
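The rule just described can be sketched as an application-side pretty-printer over a toy tree. Everything here is hypothetical: the `Element` class, its boolean `preserve` flag standing in for -XML-SPACE=PRESERVE, and the functions `verbatim` and `pretty`; attributes are omitted. Inside a PRESERVE element, descendants included, the content is copied exactly; elsewhere the printer freely re-indents and collapses whitespace.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """Toy tree node; attributes other than the space flag are omitted."""
    name: str
    children: List["Node"] = field(default_factory=list)
    preserve: bool = False  # hypothetical stand-in for -XML-SPACE=PRESERVE


Node = Union[Element, str]


def verbatim(node: Node) -> str:
    """Serialize a subtree exactly as stored, whitespace included."""
    if isinstance(node, str):
        return node
    inner = "".join(verbatim(c) for c in node.children)
    return f"<{node.name}>{inner}</{node.name}>"


def pretty(el: Element, depth: int = 0) -> str:
    """Re-indent freely, except inside elements marked PRESERVE."""
    pad = "  " * depth
    if el.preserve:
        # Significant whitespace: copy this element and all of its
        # descendants exactly as they are.
        return pad + verbatim(el) + "\n"
    out = [f"{pad}<{el.name}>\n"]
    for child in el.children:
        if isinstance(child, str):
            # Outside PRESERVE the application may reflow at will; this
            # sketch collapses runs and drops whitespace-only text.
            text = " ".join(child.split())
            if text:
                out.append(f"{pad}  {text}\n")
        else:
            out.append(pretty(child, depth + 1))
    out.append(f"{pad}</{el.name}>\n")
    return "".join(out)
```

For example, a `pre` element marked `preserve=True` keeps its internal spacing and newlines, while text in a sibling `p` element is folded onto one indented line.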

>    I like the idea of being able to mark elements with mixed content as
>-xml-space="-xml-preserve". We should have a DTD automatically mark any
>elements which have element content as -xml-space="-xml-ignore". Default
>handling of mixed content (-xml-space="-xml-collapse") would fold
>whitespace as under the current proposal.

>   All of these whitespace signals would be application notifications of
>acceptable behaviour: the parse tree would always be the same, and include
>the original whitespace, but the application would have clear notification
>of authorial intent.
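The three-value scheme can be sketched as two small application-side helpers that read the original text from the parse tree and never modify it. The names `mode_for` and `effective_text` are hypothetical. Two rules are assumptions, since the message does not spell them out: an explicit -xml-space value overrides the DTD-derived default, and "collapse" folds each run of whitespace into a single space.

```python
import re
from typing import Optional

# Hypothetical constants mirroring the three proposed -xml-space values.
PRESERVE = "-xml-preserve"
COLLAPSE = "-xml-collapse"
IGNORE = "-xml-ignore"

_WS_RUN = re.compile(r"[ \t\r\n]+")


def mode_for(declared: Optional[str], element_content: bool) -> str:
    """Pick the whitespace mode for an element.

    Assumption: an explicit value wins; otherwise element content
    defaults to IGNORE and mixed content to COLLAPSE.
    """
    if declared is not None:
        return declared
    return IGNORE if element_content else COLLAPSE


def effective_text(text: str, mode: str) -> str:
    """What the application treats as content; the parse tree keeps text."""
    if mode == PRESERVE:
        return text  # every whitespace character is significant
    if mode == IGNORE:
        # Whitespace-only runs between child elements carry no meaning.
        return "" if _WS_RUN.fullmatch(text) else text
    if mode == COLLAPSE:
        # Assumed folding rule: each whitespace run becomes one space.
        return _WS_RUN.sub(" ", text)
    raise ValueError(f"unknown -xml-space value: {mode!r}")
```

Because the helpers are pure functions of the stored text, the parse tree stays identical under all three values, which matches the point that the signals are notifications to the application rather than changes to parser output.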
>   Jean says this is less-convenient than the other proposals, but I
>disagree -- because it's definitely convenient to have a single construct
>for each behavior that you need, and the discussion has shown that there are 3
>behaviors that we need.
>  -- David
>I am not a number. I am an undefined character.
>David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
>Boston University Computer Science        \  Sr. Analyst
>http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
>--------------------------------------------\  http://dynamicDiagrams.com/
>MAPA: mapping for the WWW                    \__________________________
Received on Friday, 13 December 1996 17:02:10 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 20:25:05 UTC