- From: David G. Durand <dgd@cs.bu.edu>
- Date: Wed, 18 Dec 1996 15:25:38 -0800
- To: w3c-sgml-wg@w3.org
Summary: DTDs required for hyperlinking is an absolute anti-requirement. Another whitespace proposal, but this one keeps same parse trees and no unnatural markup. It may not preserve equivalence of application behavior across DTD-containing, and DTD-omitted instances, but can allows hyperlinking and indexing and database-like applications to work properly in all cases (and these are the examples we have to work with). Formatting (oddly) is the one that gets left in the cold because signficance of WS cannot be determined without a DTD. >I think this issue needs to be put on hold until hyperlinking >mechanisms have been defined. If the specified mechanisms don't >change based on whitespace, this is irrelevant. However, I have a >strong suspicion that any application with hyperlinking capabilities >will need to read the DTD. This is _not_ acceptable for HTTP delivery. We can answer that question now, because web delivery and browsing is one of the reasons DTD optionality was put in. I don't want to read any TEI/XML documents on the web if it isn't there. And linking cannot be optional if XML is to have a prayer of success. We must be able to have a fully-function Internetscape-explorer equivalent based on XML, without required DTDs. Otherwise people will just laugh at our "solution" to their problems. At 3:05 PM 12/18/96, Christopher R. Maden wrote: >Not any more, I'm afraid. If whitespace is data, there can be no >element content, and if it's SEPCHAR, then it'll be eaten by a >conforming parser. And we will just have to learn to live with that. I already have. Since an XML parser without a DTD can't tell if something is element content, we _can't_ be SGML compatible on this issue. An idea (yes, yet another permutation): All WS is passed through. All applications will be notified of all whiotespace, but SGML-based applications will be notified that they are "insignificant to the author". The parser (or grove transformer, or whatever) aill look at the DTD, and determine that these #PCDATA nodes contain only whitespace and are in element content, so they are "insignificant". The only case that anyone has shown where this ignoration is a significant problem is in formatting, and we can define that style-sheet input never contains insignificant whitespace unless it is explicitly requested. Validating XML parsers (which have a DTD) are _required_ to check all pseudo-elements in element content, and determine that they contain only-WS characters. They are free not to format differently based on those characters, but are required to count them (same as SGML is above). This is _not_ compatible, but seems to work properly for any case I can think of. It seems that we really only need to require of XML validation that it ensure that only WS characters occur in element content (when it can tell). And of SGML-based validators, that they be aware of the presence of whitespace in element content (but they need never process it, and they can always recognize it because of the DTD). So we could, without unnatural markup, produce the same parse tree, and process it correctly. If DTD-free XML formatters start reacting badly to whitespace, people will start leaving it out (many bets are off in such cases anyway). In all cases, however, the parse tree will be the same (and include all whitespace), and in all cases, DTD-posessing applications will know what characters are significant (but are required to make insignificant characters available to applications that care (like indexers and link-engines)). -- David I am not a number. I am an undefined character. _________________________________________ David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com Boston University Computer Science \ Sr. Analyst http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams --------------------------------------------\ http://dynamicDiagrams.com/ MAPA: mapping for the WWW \__________________________
Received on Wednesday, 18 December 1996 15:25:56 UTC