- From: David G. Durand <dgd@cs.bu.edu>
- Date: Fri, 13 Dec 1996 15:44:18 -0800
- To: <w3c-sgml-wg@w3.org>
Summary: A few quibbles with Jean, followed by endorsement of Tim's 3-value proposal -- which I was busily independently deriving as his mail arrived. One last quibble -- we can't forget the -xml- prefix on the values unless we're going to revisit the attribute value decision too.

-- David

At 10:30 AM 12/13/96, Jean Paoli wrote:
>the problem I have with RE Delenda is the one pointed by Prescod:
>there is no mechanism provided for having totally meaningless
>whitespaces.

so far I'm with you.

>It is a fact that editors or batch applications which read XML,
>manipulate it
>and then save it need a way to freely insert CR/LF inside the XML stream
>in order to cut it in lines because a line has a limited number of
>characters.

This is a fact only if we make it so. We are defining the standard. XML already requires _not_ breaking lines in mixed content. Any editor shenanigans in such places will be application-visible, so the flexibility being gained is in fact rather small for most documents, which are composed mostly of mixed content.

>It need also a place to freely insert whitespaces for indentation
>purpose.
>It need also to know at read-time that those characters could be removed
>:
>if not, the document will continue to grow indefinitely.

In mixed content and in element content under "RE delenda est", whitespace can be added as long as it's added inside markup. So editors that want to try to shorten lines always have an option. At issue is the stickier human issue of whether, and to what extent, humans need to add insignificant whitespace in documents.

>This is the tribute we have to pay to enable XML document be text, which
>means
>human readable without tools.

This is the issue.
Given the changes in viewing software (browsers), editing environments (line-free editors like emacs are relatively commonplace), and file management (fixed-record-length filesystems are becoming a memory except for limited legacy contexts), the issue of ignorable whitespace seems less critical than it did a decade ago.

>In principle, the only safe place to insert or delete such characters is
>in element-content
>but element-content cannot be detected in a DTD-less environment so we
>have a problem.

Or in markup. In XML it's always safe to insert any combination of whitespace before any occurrence of >.

>I propose :
>
>1/ XML Parser output: use RE Delenda Est in order to respect the data
>integrity (and for example to permit to full-text indexers to know where
>everything is by byte offset)
>2/ Change the current -XML-SPACE meaning: instead of having -XML-SPACE
>changing
>the output of the parser, let us define -XML-SPACE as information for
>the application
>(basically XML is an application profile of SGML, so hey, we go!): we
>define a single value
>-XML-SPACE=PRESERVE in order to indicate to the application that it
>should not mess
>*arbitrarely* with the content of the element for prettyprinting or line
>cut purpose.

Does this mean that the application is not allowed to change the space characters, or is allowed to change them? I don't understand the qualification "arbitrary".

I like the idea of being able to mark elements with mixed content as -xml-space="-xml-preserve". We should have a DTD automatically mark any elements which have element content as -xml-space="-xml-ignore". Default handling of mixed content (-xml-space="-xml-collapse") would fold whitespace as under the current proposal. All of these whitespace signals would be application notifications of acceptable behaviour: the parse tree would always be the same, and include the original whitespace, but the application would have clear notification of authorial intent.

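
As a rough illustration of the three notifications described above, here is a minimal Python sketch of how an application might act on them after the parser has handed over the text untouched. The function name apply_space_policy and the constants are hypothetical, and the exact folding rule (each run of whitespace becomes one space, with no trimming) is an assumption, since the proposal's details are not spelled out here.

```python
import re

# Hypothetical constants named after the attribute values discussed above.
PRESERVE = "-xml-preserve"
IGNORE = "-xml-ignore"
COLLAPSE = "-xml-collapse"

_WS_RUN = re.compile(r"[ \t\r\n]+")


def apply_space_policy(text, policy=COLLAPSE):
    """Return the character data an application might use under a policy.

    The parse tree is assumed to keep the original whitespace; only the
    application's view of it changes.
    """
    if policy == PRESERVE:
        # Author asked for the whitespace to be left alone.
        return text
    if policy == IGNORE:
        # Assumption: whitespace-only text in element content is dropped.
        return "" if text.strip(" \t\r\n") == "" else text
    if policy == COLLAPSE:
        # Assumption: "folding" turns each whitespace run into one space.
        return _WS_RUN.sub(" ", text)
    raise ValueError("unknown -xml-space value: %r" % (policy,))
```

The point of the sketch is only that one input string yields three different application views, while the parser output stays identical in every case.
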
Jean says this is less convenient than the other proposals, but I disagree -- because it's definitely convenient to have a single construct for each behavior that you need, and the discussion has shown that there are 3 behaviors that we need.

-- David

I am not a number. I am an undefined character.
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/  \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Friday, 13 December 1996 15:45:45 UTC