Re: Grosso whitespace proposal from Paul Grosso on 1996-12-17 (w3c-sgml-wg@w3.org from December 1996)

From: Paul Grosso <paul@arbortext.com>
Date: Tue, 17 Dec 96 10:17:46 CST
To: w3c-sgml-wg@w3.org
Message-Id: <9612171617.AA14599@atiaus.arbortext.com>
> From: Paul Prescod <papresco@csclub.uwaterloo.ca>
> 
> At 12:26 PM 12/16/96 CST, you wrote:
> >What the previous paragraph does
> >seem to mean is that whatever REs XML deems to be insignificant must be
> >determined to be insignificant regardless of the content model since we
> >may not have the DTD.  For example, we can say (as I believe we have in
> >the current XML draft) that an RE immediately following a start tag is
> >insignificant thereby providing places editors and others can introduce
> >record ends while still allowing all XML processors to interpret the
> >results identically without requiring knowledge of the DTD.
> 
> Every proposal we've seen has glitches, and I just wanted to make this one's
> explicit.
> 
> a) The first may or may not be a big deal depending on your point of view,
> but that means that "well formed" XML documents cannot have a list or table
> formatted as they are typically formatted, where whitespace is introduced
> after the item/row end-tag. That might be a compromise we could live with. 

This may be true (depending on how lists or tables are "typically
formatted").  The way I would format a list, a table, or anything
else is to put blanks only where I want them in my data and to break
lines only immediately after start tags or immediately before end tags.

> 
> Is there a hack we could use to "escape" all of the whitespace up to the
> next tag?
> 
> <LIST>\
>      <ITEM>...</ITEM>\
>      <ITEM>...</ITEM>\
>      <ITEM>...</ITEM>\
> </LIST>

I would not want to see us employ such a hack.  The list could be formatted,
for example:

<LIST>
<ITEM>
...
</ITEM><ITEM>
...
</ITEM><ITEM>
...
</ITEM>
</LIST>

> 
> b) I'm a little uncomfortable giving users something that *looks* like what
> they are used to, but doesn't behave like it. It may well be better from a
> usability standpoint (though not a marketing one) to give them something
> that looks "funny".

(Not sure what you mean here.)

> 
> c) Who is going to check this well-formedness constraint? SGML parsers will
> happily eat the whitespace. Non-validating XML parsers will not read the DTD
> and so cannot notice whitespace in element content. We would need a new kind
> of parser: a validating XML parser *not* built on top of an SGML parser.
> (this is technically possible, but it is just more work)

I thought we had all agreed that validation required a DTD (and many of us
believe that authoring is best done in the presence of a DTD too).  With
the DTD, the validator/authoring tool can tell which whitespace is
insignificant and remove it.  Then, if necessary/desired, that tool
can insert REs after start tags and before end tags to break up long lines.
The resulting file would then be well-formed wrt whitespace.  Browsers
and other tools that handle XML without reference to a DTD would merely
assume the file to be well-formed and would therefore consider all whitespace
(except REs after start tags and before end tags) to be significant.
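
To make the rule concrete, here is a minimal sketch in Python of how a
DTD-less processor might apply it.  This is illustrative only and not
taken from any XML draft: the function name strip_insignificant_res is
hypothetical, an RE is approximated as a line break (LF or CR LF), and
the tag patterns are simplified (no ">" inside attribute values, no
comments, processing instructions, or marked sections).

```python
import re

# Simplified tag patterns (assumptions for this sketch, not draft text).
# A start tag begins with "<" plus a letter; the lookbehind skips tags
# that end in "/>" so they are not treated as start tags.
_RE_AFTER_START_TAG = re.compile(r"(<[A-Za-z][^<>]*(?<!/)>)\r?\n")
_RE_BEFORE_END_TAG = re.compile(r"\r?\n(</[A-Za-z][^<>]*>)")


def strip_insignificant_res(text: str) -> str:
    """Drop one line break right after a start tag or right before an end tag.

    No DTD is consulted; all other whitespace is left alone as significant.
    """
    text = _RE_AFTER_START_TAG.sub(r"\1", text)
    return _RE_BEFORE_END_TAG.sub(r"\1", text)


# Lines broken only after start tags and before end tags.
broken_at_tags = (
    "<LIST>\n<ITEM>\none\n</ITEM><ITEM>\ntwo\n</ITEM>\n</LIST>"
)

# Line break placed after an item's end tag instead.
broken_after_end_tag = "<LIST>\n<ITEM>one</ITEM>\n<ITEM>two</ITEM>\n</LIST>"

normalized = strip_insignificant_res(broken_at_tags)
kept = strip_insignificant_res(broken_after_end_tag)
```

Under these assumptions the first input reduces to the same string as
the unbroken form, while the second keeps the line break between
</ITEM> and <ITEM>, since that break neither follows a start tag nor
precedes an end tag.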
Received on Tuesday, 17 December 1996 11:30:34 UTC