Re: Trying to sum up a bit from Paul Grosso on 1996-12-17 (w3c-sgml-wg@w3.org from December 1996)

From: Paul Grosso <paul@arbortext.com>
Date: Tue, 17 Dec 96 15:07:50 CST
To: w3c-sgml-wg@w3.org
Message-Id: <9612172107.AA15460@atiaus.arbortext.com>
> From: Tim Bray <tbray@textuality.com>
> 
> Thus, it seems like there are a very small number of alternatives:
> 1. Require the DTD at all times.
> 
> Pro: the problems go away.  
> Con: the DTD is required at all times.
> 
> 2. Work significant WS into the definition of well-formedness (Grosso)
> 
> Adopt a tight set of rules for well-formed documents such that all white 
> space, except RE's after tags, may safely be assumed by a DTD-less
> processor to be significant.
> Pro: the problem goes away
> Con: 1. it's a lot harder to author XML without an XML-savvy tool

I'll grant you the point, though I would qualify "a lot."  It's
easy to explain:  don't put in whitespace if you don't mean it.
If you are authoring in vi, for example, you don't hit space or tab
unless you want it in data, and you only hit return after a start
tag or before an end tag.

True, pretty-indented SGML probably won't be well-formed XML for
most document/DTD combinations.

If one of the key reasons for well-formed, DTD-less XML is to make
things easier for the perl hacker or whatever, I would think that
putting restrictions on where whitespace could go would make things
even easier for such script writers.
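
To make that concrete, here is a rough sketch of what such a script might
look like under alternative 2. It is only an illustration, not proposed
spec text. The function name and the regular expression are my own
inventions, "\n" stands in for RE, and it ignores comments, PIs, CDATA
sections, and attribute values containing ">".

```python
import re

# Hypothetical, simplified tag matcher: start tags and end tags only.
_TAG = re.compile(r"<(/?)[A-Za-z_][^<>]*>")


def significant_text(doc: str) -> list[str]:
    """Return the character-data chunks a DTD-less reader would keep.

    Assumed rule: every character outside markup is data, except a
    single line end directly after a start tag or directly before an
    end tag.
    """
    chunks = []
    pos = 0
    prev_was_start_tag = False
    for m in _TAG.finditer(doc):
        text = doc[pos:m.start()]
        is_end_tag = m.group(1) == "/"
        # Drop one line end that directly follows a start tag.
        if prev_was_start_tag and text.startswith("\n"):
            text = text[1:]
        # Drop one line end that directly precedes an end tag.
        if is_end_tag and text.endswith("\n"):
            text = text[:-1]
        if text:
            chunks.append(text)
        prev_was_start_tag = not is_end_tag
        pos = m.end()
    tail = doc[pos:]
    if tail:
        chunks.append(tail)
    return chunks


print(significant_text("<p>\nHello <b>big</b> world\n</p>"))
# ['Hello ', 'big', ' world']
print(significant_text("<doc>\n  <p>x</p>\n</doc>"))
# ['  ', 'x']
```

The second call shows the cost conceded above: the two spaces of pretty
indentation come through as data.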

>      2. you can't test well-formedness without looking at a DTD

True, but as an author (person or tool), you can *guarantee* (though
not test for) well-formedness without a DTD.  You don't insert any
whitespace you don't want to be significant.  To satisfy line length
requirements, you introduce REs (as you deem necessary) only after
start tags or before end tags.
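
As a sketch of the authoring side, a tool could restrict its line breaks
to exactly those two places. The helper names below are hypothetical,
"\n" stands in for RE, and the patterns are deliberately naive (no
comments, PIs, or attribute values containing "/" or ">"). This version
breaks at every tag rather than measuring line length, just to show
which break points are safe.

```python
import re

# Hypothetical, simplified patterns for start tags and end tags.
_START = r"(<[A-Za-z_][^<>/]*>)"
_END = r"(</[^<>]*>)"


def break_at_tags(doc: str) -> str:
    """Insert a line end after each start tag and before each end tag."""
    doc = re.sub(_START, r"\1\n", doc)
    return re.sub(_END, r"\n\1", doc)


def drop_tag_adjacent_res(doc: str) -> str:
    """Remove one line end after each start tag and before each end tag,
    as a reader following the assumed rule would."""
    doc = re.sub(_START + r"\n", r"\1", doc)
    return re.sub(r"\n" + _END, r"\1", doc)


src = "<p>Hello <b>big</b> world</p>"
wrapped = break_at_tags(src)
print(repr(wrapped))
# '<p>\nHello <b>\nbig\n</b> world\n</p>'
print(drop_tag_adjacent_res(wrapped) == src)
# True
```

The round trip is the point: breaks added only at these positions leave
the character data unchanged for the reader.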

> 
> 3. All non-markup bytes are significant, whitespace or not (Durand)
> 
> Pro: Everyone can understand the rules, it's easy to implement
> Con: You lose certain HyTime addressing facilities, and the application
>      gets no help from the XML processor in ignoring WS that to the user
>      is "obviously" irrelevant.

Does this mean that SGML tools will necessarily lose XML-significant
whitespace when reading XML, or did we come up with an SGML trick
to avoid this?

> 
> 4. Use a mechanism *in the instance* to signal a DTD-less application
>    what's going on.
> 
>  4.1 The PI-based DTD summary (Sperberg-McQueen)
>  4.2 Explicit quoting of significant character data (Goldfarb)
>  4.3 -XML-SPACE
>   4.3.1 -XML-SPACE with one value, PRESERVE (Paoli)
>   4.3.2 -XML-SPACE with two values, PRESERVE/COLLAPSE (Current ERB)
>   4.3.3 -XML-SPACE with three values, PRESERVE/COLLAPSE/SUPPRESS (Bray)
>         [which we might want to rename, if what we're really doing is 
>          signalling element content, mixed content, and verbatim content]
>  4.4 Escaping of non-significant WS (Prescod)
> 
> Pro: Solves the problems
> Con: Requires extra work from authors, possibly duplicates DTD info with
>      potential for loss of sync, and tends to look ugly & unnatural

When I compare the cons for 4 versus the cons for 2 (sure, I'm biased),
I think--even without an XML-savvy tool--4 requires more work for the
author and more explanation for us, the spec writers, than 2.  In 
addition, 4 has "possibly duplicates DTD info with potential for loss 
of sync, and tends to look ugly & unnatural."  I guess you have to
decide for yourself which is uglier and more unnatural:  not being
able to indent/pretty-print your source SGML or having -XML-SPACE attributes
throughout your source SGML.
Received on Tuesday, 17 December 1996 16:15:26 UTC