- From: David G. Durand <dgd@cs.bu.edu>
- Date: Sun, 15 Dec 1996 14:11:39 -0500
- To: w3c-sgml-wg@w3.org
Summary: I think we are arguing at cross purposes, because we are not
being explicit enough about requirements and strategies. This message
attempts to summarize key points, and does not advocate (or intend to
slant towards) any conclusion.

One of the reasons that Jean's posting has created confusion is that he
is introducing a new requirement: ease of whitespace insertion for
editors to break long "lines".

I'm just going to try listing the requirements as I think they are: if
we can get a list that we can agree on, maybe we can grease ourselves up
and get detached from this tar-baby.

1. Almost everyone desires that the parse tree be identical (or as
   similar as possible) when parsing with and without a DTD.

2. Some people believe very strongly that ignoring whitespace in
   element content is important because:
   a) current SGML editors indent in element content mode.
   b) human readers of documents find whitespace around tags useful for
      readability.
   c) stylesheets and applications should not be made responsible for
      removing whitespace, as they will then not handle things
      consistently.

3. Jean Paoli has identified a need for future editing software to
   insert linebreaks as desired (this might, in some cases, even occur
   in mixed content).

4. Most people seem to agree that mixed content should preserve almost
   all whitespace. There is now even an SGML-compatible way to preserve
   initial RE in mixed content.

Some factors constrain solutions:

A. SGML, as defined, ignores whitespace in element content.

B. XML, when parsing without a DTD, cannot detect element content
   reliably.

C. SGML does not normally preserve initial RE in mixed content.

Some possible techniques may address some of these factors in reaching
some of the goals.

1. Charles' shortref RE hack can disable C. without affecting anything
   else. This would allow SGML-compatible application processing of all
   whitespace in _mixed_ content.

2. Explicit flags of some sort on elements could signal to an
   application that it should apply a particular whitespace strategy.
   Thus, we could pass whitespace to the application, and make the
   author responsible for marking whether or not it is to be
   significant.

3. The application could set these flags automatically, in some cases,
   when parsing with a DTD (element content automatically
   -XML-SPACE=IGNORE, unless the _author_ requests otherwise, for
   instance).

4. We could take the Perl approach and have the parser guess and set
   any missing WS flags. For instance, it would set the
   -XML-SPACE=IGNORE flag for elements containing only WS and other
   subelements. It could set the -XML-SPACE=COLLAPSE flag for elements
   containing non-space characters, and (optionally) other markup. The
   drawback would be complicated rules for the default processing,
   though these can at least be defeated by explicitly tagging
   elements. This also leads to more-complex implementations.

5. We could take a strict RE delenda est approach: needing no flags,
   but also only satisfying requirements 1 and 4.

6. We can put whitespace before TAGC in any of these proposals, and
   this may satisfy requirement 3. Many feel that this technique
   utterly fails to satisfy requirements 2a and 2b, because editors
   don't work that way now and the syntax is ugly, respectively.

   [Ed: Personally I think this technique completely solves requirement
   3 without any work by us at all. It is trivial to implement, and
   also works in mixed content, providing a better solution to the
   algorithmic problem.]

There are probably missing points and misrepresentations, but can we try
to get a list like this finalized before we continue arguing specific
proposals? If you like this idea, and want to make a correction, edit
the list and re-post it, so that we can converge on a single list
without too much pain.

  -- David

I am not a number. I am an undefined character.
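A rough sketch of the guessing rule in technique 4, in case it helps pin
down what the defaults would be. Everything about the representation is
an assumption made for illustration: the function name guess_space_flag,
the Child placeholder class, and the modelling of an element's content as
a list of strings and subelement markers are all hypothetical. Only the
flag values IGNORE and COLLAPSE and the rule itself come from the list
above, and "WS" is taken here to mean space, tab, RE and RS.

```python
# Hypothetical sketch of technique 4: guess a missing whitespace flag.
# Content is modelled as a list whose items are either str (character
# data) or Child (a stand-in for a subelement). This is an illustration,
# not a parser.

WS_CHARS = " \t\r\n"  # assumption: space, tab, RE, RS


class Child:
    """Placeholder for a subelement; its own content is not inspected."""


def guess_space_flag(content, explicit=None):
    """Return the -XML-SPACE value to use for one element.

    explicit is a flag the author set by hand; it always defeats the guess.
    """
    if explicit is not None:
        return explicit
    # Gather only the character data, skipping subelements.
    text = "".join(item for item in content if isinstance(item, str))
    if text.strip(WS_CHARS) == "":
        # Only WS and subelements: treat as element content.
        return "IGNORE"
    # Some non-space characters are present: treat as mixed content.
    return "COLLAPSE"
```

For example, guess_space_flag(["\n  ", Child(), "\n"]) gives "IGNORE",
while guess_space_flag(["Hello ", Child()]) gives "COLLAPSE". Even this
toy version shows the drawback noted in technique 4: the result depends
on the instance, so the same element type can get different flags in
different places unless the author tags it explicitly.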
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Sunday, 15 December 1996 14:05:07 UTC