Another proposal, and a requirement (was Re: Trying to sum up a bit)

Summary: DTDs required for hyperlinking is an absolute anti-requirement.
         Another whitespace proposal, but this one keeps same parse trees
and no unnatural markup. It may not preserve equivalence of application
behavior across DTD-containing, and DTD-omitted instances, but can allows
hyperlinking and indexing and database-like applications to work properly
in all cases (and these are the examples we have to work with). Formatting
(oddly) is the one that gets left in the cold because signficance of WS
cannot be determined without a DTD.


>I think this issue needs to be put on hold until hyperlinking
>mechanisms have been defined.  If the specified mechanisms don't
>change based on whitespace, this is irrelevant.  However, I have a
>strong suspicion that any application with hyperlinking capabilities
>will need to read the DTD.

This is _not_ acceptable for HTTP delivery. We can answer that question
now, because web delivery and browsing is one of the reasons DTD
optionality was put in. I don't want to read any TEI/XML documents on the
web if it isn't there. And linking cannot be optional if XML is to have a
prayer of success. We must be able to have a fully-function
Internetscape-explorer equivalent based on XML, without required DTDs.
Otherwise people will just laugh at our "solution" to their problems.

At 3:05 PM 12/18/96, Christopher R. Maden wrote:
>Not any more, I'm afraid.  If whitespace is data, there can be no
>element content, and if it's SEPCHAR, then it'll be eaten by a
>conforming parser.

And we will just have to learn to live with that. I already have. Since an
XML parser without a DTD can't tell if something is element content, we
_can't_ be SGML compatible on this issue.

   An idea (yes, yet another permutation):

   All WS is passed through.

   All applications will be notified of all whiotespace, but SGML-based
applications will be notified that they are "insignificant to the author".
The parser (or grove transformer, or whatever) aill look at the DTD, and
determine that these #PCDATA nodes contain only whitespace and are in
element content, so they are "insignificant". The only case that anyone has
shown where this ignoration is a significant problem is in formatting, and
we can define that style-sheet input never contains insignificant
whitespace unless it is explicitly requested.

   Validating XML parsers (which have a DTD) are _required_ to check all
pseudo-elements in element content, and determine that they contain only-WS
characters. They are free not to format differently based on those
characters, but are required to count them (same as SGML is above).

This is _not_ compatible, but seems to work properly for any case I can
think of.

   It seems that we really only need to require of XML validation that it
ensure that only WS characters occur in element content (when it can tell).
And of SGML-based validators, that they be aware of the presence of
whitespace in element content (but they need never process it, and they can
always recognize it because of the DTD).

So we could, without unnatural markup, produce the same parse tree, and
process it correctly. If DTD-free XML formatters start reacting badly to
whitespace, people will start leaving it out (many bets are off in such
cases anyway). In all cases, however, the parse tree will be the same (and
include all whitespace), and in all cases, DTD-posessing applications will
know what characters are significant (but are required to make
insignificant characters available to applications that care (like indexers
and link-engines)).
   -- David


I am not a number. I am an undefined character.
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________

Received on Wednesday, 18 December 1996 15:25:56 UTC