Trying to sum up a bit

On this whitespace bit, we're going to have to sum up, make some sort of 
compromise, and move forward.  If I've been bad, when I die they're going to 
put me me in a room with the SGML WG for million-year arguments about RS/RE, 
with breaks every other millenium to discuss URNs and FPIs.

1. It is highly desirable that the many useful things you can do without
   DTD's (display, indexing) be done without DTD's.
2. Not having a DTD makes it impossible in principle to detect certain
   classes of cannot-possibly-be-data white space, i.e. in element content
3. Not being able to distinguish element from mixed content means that
   certain addressing constructs (e.g. those that depend on HyTime
   pseudo-elements) won't work in DTD-free mode if they take the DTD's view of 
   what is mixed & what is element content.
4. This element-content white space, along with certain other classes of white 
   space that are not data will inevitably be introduced as a result of normal 
   usage (e.g. editors' line-breaks) and users will be irritated if these are 
   considered data.

Thus, it seems like there are a very small number of alternatives:
1. Require the DTD at all times.

Pro: the problems go away.  
Con: the DTD is required at all times.

2. Work significant WS into the definition of well-formedness (Grosso)

Adopt a tight set of rules for well-formed documents such that all white 
space, except RE's after tags, may safely assumed by a DTD-less processor
to be significant.
Pro: the problem goes away
Con: 1. it's a lot harder to author XML without an XML-savvy tool
     2. you can't test well-formedness without looking at a DTD

3. All non-markup bytes are signicant, whitespace or not (Durand)

Pro: Everyone can understand the rules, it's easy to implement
Con: You lose certain Hytime addressing facilities, and the application
     gets no help from the XML processor in ignoring WS that to the user
     is "obviously" irrelevant.

4. Use an mechanism *in the instance* to signal a DTD-less application
   what's going on.

 4.1 The PI-based DTD summary (Sperberg-McQueen)
 4.2 Explicit quoting of significant character data (Goldfarb)
 4.3 -XML-SPACE
  4.3.1 -XML-SPACE with one value, PRESERVE (Paoli)
  4.3.2 -XML-SPACE with two values, PRESERVE/COLLAPSE (Current ERB)
  4.3.3 -XML-SPACE with three values, PRESERVE/COLLAPSE/SUPPRESS (Bray)
        [which we might want to rename, if what we're really doing is 
         signalling element content, mixed content, and verbatim content]
 4.4 Escaping of non-significant WS (Prescod)

Pro: Solves the problems
Con: Requires extra work from authors, possibly duplicates DTD info with
     potential for loss of sync, and tends to look ugly & unnatural

Under (4.3), there are some behavioral options:

 4.3a1. the XML processor does what -XML-SPACE says, eats the whitespace
        or not, and consumes the attribute, not passing it to the application
 4.3a2. the XML processor does what -XML-SPACE says *and* passes it on to
        the application
 4.3a3. the XML processor merely passes -XML-SPACE along to the application,
        along with all the bytes of data; it is the application's job to
        do the right thing

 4.3b1. A validating XML processor reading a DTD generates synthetic 
        -XML-SPACE attributes for the application's consumption, based on
        its knowledge of mixed/element content
 4.3b2. -XML-SPACE is only ever inserted by humans.

Face facts, folk.  There is just not a solution that is going to solve
this problem and be free of some cost.  And please remember the cost
of explanation and education is very real.  For myself, my preference 
would be (in descending order) #3, #4.3.3 (a3/b2), 4.3.2, 4.3.1.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

Received on Tuesday, 17 December 1996 15:15:55 UTC