
RE: RS/RE, again (sorry)

From: David G. Durand <dgd@cs.bu.edu>
Date: Fri, 13 Dec 1996 15:44:18 -0800
Message-Id: <v02130500aed72328d3a4@[165.90.139.104]>
To: <w3c-sgml-wg@w3.org>
Summary: A few quibbles with Jean, followed by endorsement of Tim's 3-value
proposal -- which I was busily independently deriving as his mail arrived.
One last quibble -- we can't forget the -xml- prefix on the values unless
we're going to revisit the attribute value decision too.

   -- David


At 10:30 AM 12/13/96, Jean Paoli wrote:
>the problem I have with RE Delenda is the one pointed by Prescod:
>there is no mechanism provided for having totally meaningless
>whitespaces.

So far I'm with you.

>It is a fact that editors or batch applications which read XML,
>manipulate it and then save it need a way to freely insert CR/LF inside
>the XML stream in order to cut it into lines, because a line has a
>limited number of characters.

This is a fact only if we make it so. We are defining the standard. XML
already requires _not_ breaking lines in mixed content. Any editor
shenanigans in such places will be application-visible, so the flexibility
being gained is in fact rather small for most documents, which are composed
mostly of mixed content.
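To make the "application-visible" point concrete, here is a small sketch using Python's standard `xml.etree.ElementTree` (a modern parser, of course, not 1996 tooling; `text_of` is a hypothetical helper name). A line break that an editor inserts into mixed content ends up in the character data the application receives:

```python
from xml.etree import ElementTree as ET

def text_of(xml: str) -> str:
    """Concatenate all character data in a document, as an application sees it."""
    return "".join(ET.fromstring(xml).itertext())

original = "<p>Some <em>mixed</em> content</p>"
rewrapped = "<p>Some <em>mixed</em>\ncontent</p>"  # an editor broke the line here

print(repr(text_of(original)))   # 'Some mixed content'
print(repr(text_of(rewrapped)))  # 'Some mixed\ncontent'
```

The two strings differ, so whether that newline "means" a space is a decision someone downstream has to make.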

>It also needs a place to freely insert whitespace for indentation
>purposes. It also needs to know at read time that those characters could
>be removed: if not, the document will continue to grow indefinitely.

In mixed content and in element content under "RE delenda est", whitespace
can be added as long as it's added inside markup. So editors that want to
shorten lines always have an option. The stickier question is a human one:
whether, and to what extent, humans need to add insignificant whitespace to
documents.

>This is the tribute we have to pay to enable XML documents to be text,
>which means human-readable without tools.

This is the issue. Given the changes in viewing software (browsers),
editing environments (line-free editors like Emacs are relatively
commonplace), and file management (fixed-record-length filesystems are
becoming a memory except for limited legacy contexts), the issue of
ignorable whitespace seems less critical than it did a decade ago.

>In principle, the only safe place to insert or delete such characters is
>in element content, but element content cannot be detected in a DTD-less
>environment, so we have a problem.

Or in markup. In XML it's always safe to insert any combination of
whitespace before any occurrence of >.
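A minimal sketch of the "whitespace inside markup" technique, again in Python with `xml.etree.ElementTree`; the helper names `break_inside_tags` and `same_tree` are hypothetical. One caveat, stated against the XML 1.0 grammar as eventually published rather than the 1996 draft: the safe spots are before the `>` that closes a start or end tag and before the `/>` of an empty-element tag, not inside `-->`, `?>`, or `]]>`. The regex below also assumes the input has no comments, processing instructions, CDATA sections, or `>` characters inside attribute values.

```python
import re
from xml.etree import ElementTree as ET

# A start, end, or empty-element tag, split just before its closing ">" or "/>".
# Assumes no comments, PIs, CDATA sections, or ">" inside attribute values.
TAG_CLOSE = re.compile(r"(<[^<>!?]+?)(/?>)")

def break_inside_tags(xml: str) -> str:
    """Insert a newline before the closing '>' or '/>' of every tag."""
    return TAG_CLOSE.sub(lambda m: m.group(1) + "\n" + m.group(2), xml)

def same_tree(a: ET.Element, b: ET.Element) -> bool:
    """True if two parsed elements have identical names, attributes, and text."""
    return (a.tag == b.tag and a.attrib == b.attrib
            and a.text == b.text and a.tail == b.tail
            and len(a) == len(b)
            and all(same_tree(x, y) for x, y in zip(a, b)))

src = '<p class="note">Hello <b>world</b>, and<br/> more</p>'
wrapped = break_inside_tags(src)
print(wrapped)
print(same_tree(ET.fromstring(src), ET.fromstring(wrapped)))  # True
```

Every inserted line break lands inside a tag, so the parsed tree is unchanged; the same newline placed in character data would show up in the element's text.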

>I propose:
>
>1/ XML parser output: use RE Delenda Est in order to respect data
>integrity (and, for example, to let full-text indexers know where
>everything is by byte offset)
>2/ Change the current -XML-SPACE meaning: instead of having -XML-SPACE
>change the output of the parser, let us define -XML-SPACE as information
>for the application (basically XML is an application profile of SGML, so
>hey, we go!): we define a single value -XML-SPACE=PRESERVE in order to
>indicate to the application that it should not mess *arbitrarily* with
>the content of the element for prettyprinting or line-cutting purposes.

Does this mean that the application is not allowed to change the space
characters, or is allowed to change them? I don't understand the
qualification "arbitrary".

    I like the idea of being able to mark elements with mixed content as
-xml-space="-xml-preserve". We should have a DTD automatically mark any
elements which have element content as -xml-space="-xml-ignore". Default
handling of mixed content (-xml-space="-xml-collapse") would fold
whitespace as under the current proposal.

   All of these whitespace signals would be application notifications of
acceptable behaviour: the parse tree would always be the same, and include
the original whitespace, but the application would have clear notification
of authorial intent.

   Jean says this is less convenient than the other proposals, but I
disagree -- because it's definitely convenient to have a single construct
for each behavior that you need, and the discussion has shown that there
are 3 behaviors that we need.
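Here is a minimal sketch of the three-value scheme in Python, under stated assumptions: the constant and function names are hypothetical, `has_element_content` is assumed to come from the DTD, and "collapse" is assumed to fold each run of whitespace into one space (the exact folding rules of the current proposal are not spelled out here). It illustrates the point made above: the parsed text keeps the original whitespace, and the -xml-space value only tells the application how to treat it.

```python
import re
from typing import Optional

# Hypothetical constants for the three -xml-space values discussed above.
PRESERVE = "-xml-preserve"
COLLAPSE = "-xml-collapse"
IGNORE = "-xml-ignore"

XML_WS = " \t\r\n"  # the XML whitespace characters

def effective_policy(declared: Optional[str], has_element_content: bool) -> str:
    """An explicit -xml-space value wins; otherwise elements the DTD declares
    with element content default to IGNORE, and everything else to COLLAPSE."""
    if declared is not None:
        return declared
    return IGNORE if has_element_content else COLLAPSE

def view_text(text: str, policy: str) -> str:
    """Return the application's view of a run of parsed text under a policy.
    The parser's text is never modified; each policy returns a new string."""
    if policy == PRESERVE:
        return text
    if policy == COLLAPSE:
        # Assumption: each run of whitespace is folded to a single space.
        return re.sub(r"[ \t\r\n]+", " ", text)
    if policy == IGNORE:
        # Whitespace-only text in element content carries no meaning.
        return "" if text.strip(XML_WS) == "" else text
    raise ValueError(f"unknown -xml-space value: {policy!r}")

raw = "  Hello,\n   world  "
print(repr(view_text(raw, effective_policy(None, False))))      # ' Hello, world '
print(repr(view_text("\n  ", effective_policy(None, True))))    # ''
print(repr(view_text(raw, effective_policy(PRESERVE, False))))  # '  Hello,\n   world  '
```

Keeping the policy out of the parser means every application sees the same tree, and a preserve-aware one (say, a code-listing renderer) can still recover the exact original spacing.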

  -- David

I am not a number. I am an undefined character.
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Friday, 13 December 1996 15:45:45 EST
