- From: Paul Grosso <paul@arbortext.com>
- Date: Tue, 17 Dec 96 15:07:50 CST
- To: w3c-sgml-wg@w3.org
> From: Tim Bray <tbray@textuality.com> > > Thus, it seems like there are a very small number of alternatives: > 1. Require the DTD at all times. > > Pro: the problems go away. > Con: the DTD is required at all times. > > 2. Work significant WS into the definition of well-formedness (Grosso) > > Adopt a tight set of rules for well-formed documents such that all white > space, except RE's after tags, may safely assumed by a DTD-less processor > to be significant. > Pro: the problem goes away > Con: 1. it's a lot harder to author XML without an XML-savvy tool I'll grant you the point, though I would qualify "a lot." It's easy to explain: don't put in whitespace if you don't mean it. If you are authoring in vi, for example, you don't hit space or tab unless you want it in data, and you only hit return after a start tag or before an end tag. True, pretty-indented SGML probably won't be well-formed XML for most document/DTD combinations. If one of the key reasons for well-formed, DTD-less XML is to make things easier for the perl hacker or whatever, I would think that putting restrictions on where whitespace could go would make things even easier for such script writers. > 2. you can't test well-formedness without looking at a DTD True, but as an author (person or tool), you can *guarantee* (though not test for) well-formedness without a DTD. You don't insert any whitespace you don't want to be significant. To satisfy line length requirements, you introduce REs (as you deem necessary) only after start tags or before end tags. > > 3. All non-markup bytes are signicant, whitespace or not (Durand) > > Pro: Everyone can understand the rules, it's easy to implement > Con: You lose certain Hytime addressing facilities, and the application > gets no help from the XML processor in ignoring WS that to the user > is "obviously" irrelevant. Does this mean that SGML tools will necessarily lose XML-significant whitespace when reading XML, or did we come up with an SGML trick to avoid this? > > 4. Use an mechanism *in the instance* to signal a DTD-less application > what's going on. > > 4.1 The PI-based DTD summary (Sperberg-McQueen) > 4.2 Explicit quoting of significant character data (Goldfarb) > 4.3 -XML-SPACE > 4.3.1 -XML-SPACE with one value, PRESERVE (Paoli) > 4.3.2 -XML-SPACE with two values, PRESERVE/COLLAPSE (Current ERB) > 4.3.3 -XML-SPACE with three values, PRESERVE/COLLAPSE/SUPPRESS (Bray) > [which we might want to rename, if what we're really doing is > signalling element content, mixed content, and verbatim content] > 4.4 Escaping of non-significant WS (Prescod) > > Pro: Solves the problems > Con: Requires extra work from authors, possibly duplicates DTD info with > potential for loss of sync, and tends to look ugly & unnatural When I compare the cons for 4 versus the cons for 2 (sure, I'm biased), I think--even without an XML savvy tool--4 requires more work for the author and more explanation for us, the spec writers, than 2. In addition, 4 has "possibly duplicates DTD info with potential for loss of sync, and tends to look ugly & unnatural." I guess you have to decide for yourself which is uglier and more unnatural: not being able to indent/pretty-print your source sgml or having -XML-SPACE attributes throughout your source sgml.
Received on Tuesday, 17 December 1996 16:15:26 UTC