Re: Grosso whitespace proposal from Paul Prescod on 1996-12-17 (w3c-sgml-wg@w3.org from December 1996)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Tue, 17 Dec 1996 13:40:25 -0500
To: paul@arbortext.com (Paul Grosso), w3c-sgml-wg@w3.org
Message-Id: <1.5.4.32.19961217184025.0098fad8@csclub.uwaterloo.ca>

At 10:17 AM 12/17/96 CST, Paul Grosso wrote:
>> From: Paul Prescod <papresco@csclub.uwaterloo.ca>
>> 
>> a) The first may or may not be a big deal depending on your point of view,
>> but that means that "well formed" XML documents cannot have a list or table
>> formatted as they are typically formatted, where whitespace is introduced
>> after the item/row end-tag. That might be a compromise we could live with. 
>
>This may be true (depending on how lists or tables are "typically
>formatted").  The way I would format a list or table or anything
>else is:  only put blanks where I want them in my data and only
>break lines immediately after start tags or immediately before end tags.

Do you mean that is how you would format a list in the new XML proposal, or
that is how you would "normally" format lists (which I presume you don't
normally do by hand anyway)? All I meant in the point above is that most
documents on the Web seem to use this convention:

<TABLE>
<TR>...</TR>
<TR>...</TR>
<TR>...</TR>
</TABLE>

which is perfectly valid in SGML, and seems quite readable and reasonable to
me. As I said, we could live without it, but should at least be aware that
we are losing something.
 
>> Is there a hack we could use to "escape" all of the whitespace up to the
>> next tag?
>> 
>> <LIST>\
>>      <ITEM>...</ITEM>\
>>      <ITEM>...</ITEM>\
>>      <ITEM>...</ITEM>\
>> </LIST>
>
>I would not want to see us employ such a hack.  The list could be formatted,
>for example:

I tend to think the "hack" gives more freedom. Note that I mean "hack" in
the sense of "SGML hack" (like the other SGML declaration tricks we are
using) and not a hack from the end-user's point of view. I think that such an
escape mechanism, if it is possible, would give typists freedom to format as
they like and implementors a simple mechanism for distinguishing
data-whitespace from formatting-whitespace. It seems like a good compromise
to me: "you can continue to mark up as you have been, just put this
character in before whitespace that you don't want displayed."
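
To make the implementor's side of that concrete, here is a rough Python sketch of what honoring such an escape might look like as a plain text pre-pass. The backslash convention is the one from my example above; the function name and the regular expression are only illustrative assumptions, and this says nothing about whether SGML itself could express the rule.

```python
import re

# Assumed convention (from the <LIST> example): a backslash followed by
# whitespace "escapes" all of that whitespace up to the next tag.
_ESCAPED_WS = re.compile(r"\\\s+(?=<)")


def strip_escaped_whitespace(text: str) -> str:
    """Remove each backslash plus the whitespace run that precedes a tag."""
    return _ESCAPED_WS.sub("", text)


sample = "<LIST>\\\n     <ITEM>a</ITEM>\\\n     <ITEM>b</ITEM>\\\n</LIST>"
print(strip_escaped_whitespace(sample))
# prints: <LIST><ITEM>a</ITEM><ITEM>b</ITEM></LIST>
```

Whitespace that is not preceded by the escape character is left alone, so it would remain data.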

Anyhow, there is probably no SGML way to do it, so it isn't worth worrying
about.

>> b) I'm a little uncomfortable giving users something that *looks* like what
>> they are used to, but doesn't behave like it. It may well be better from a
>> usability standpoint (though not a marketing one) to give them something
>> that looks "funny".
>
>(Not sure what you mean here.)

I mean that documents created according to your proposal would visually look
as if they were following the SGML/HTML rules, but when you put whitespace
in the "wrong" place, you would get an error message (or worse, different
parse trees). Sometimes it is better to be obvious when you are doing
something different than to change things subtly in this way.
 
>> c) Who is going to check this well-formedness constraint? SGML parsers will
>> happily eat the whitespace. Non-validating XML parsers will not read the DTD
>> and so cannot notice whitespace in element content. We would need a new kind
>> of parser: a validating XML parser *not* built on top of an SGML parser.
>> (this is technically possible, but it is just more work)
>
>I thought we had all agreed that validation required a DTD (and many of us
>believe that authoring is best done in the presence of a DTD too).  With
>the DTD, the validator/authoring tool can tell which whitespaces are 
>insignificant and remove them.  Then, if necessary/desired, that tool
>can insert REs after start tags and before end tags to break up long lines.
>The resulting file would then be well-formed wrt whitespace.  Browsers
>and other tools that handle XML without reference to a DTD would merely
>assume the file to be well-formed and would therefore consider all whitespace
>(except REs after start tags and before end tags) to be significant.

We're talking about different things. You specified that well-formed XML
documents should not have whitespace between elements in element content. If
this is a constraint on the documents, somebody is going to have to write a
checker for it, or nobody will know if their documents are right or not.
This checker cannot "check" the output of an SGML parser, because the SGML
parser will already have removed the whitespace. The checker cannot use a
non-validating XML parser, because without the DTD it won't know which
whitespace is in the wrong place. 
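
A minimal sketch of the kind of checker I have in mind, in Python. It assumes a parser that still reports whitespace between elements (here `xml.etree.ElementTree`, used only as a convenient tokenizer, not as an SGML parser), and it assumes the set of elements declared with element content is handed in from somewhere. That set, `ELEMENT_CONTENT`, is a hypothetical stand-in for what the DTD would tell us. It is also simplified: it only looks at whitespace between sibling elements, and ignores the rules about REs after start tags and before end tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for DTD knowledge: elements declared with
# element content (children only, no character data).
ELEMENT_CONTENT = {"TABLE", "LIST"}


def find_inter_element_whitespace(xml_text, element_content=ELEMENT_CONTENT):
    """Return (parent, child, whitespace) for whitespace between siblings."""
    root = ET.fromstring(xml_text)
    problems = []
    for parent in root.iter():
        if parent.tag not in element_content:
            continue  # without the declaration we cannot judge this element
        children = list(parent)
        # The last child's tail sits before the parent's end tag; skip it.
        for child in children[:-1]:
            tail = child.tail
            if tail and tail.strip() == "":
                problems.append((parent.tag, child.tag, tail))
    return problems


doc = "<TABLE>\n<TR>a</TR>\n<TR>b</TR>\n<TR>c</TR>\n</TABLE>"
for parent, child, ws in find_inter_element_whitespace(doc):
    print(f"whitespace {ws!r} after </{child}> inside <{parent}>")
```

Calling the same function with an empty `element_content` set reports nothing for the same document, which is the non-validating case: with no DTD information there is no way to tell that the whitespace is in the wrong place.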

So we need a new kind of parser for any apps that are going to load and
validate XML documents. Anyone who wants to write a complete "validating XML
parser" will have to modify an SGML parser, not just build on top of it.

On the other hand, the "empty tag hack" may already require this. The
penalty for messing up the validation of empty tag closes is going to be an
obvious browser choke ("where's the end of this tag??"). The penalty for
messing up the whitespace handling will be documents that are displayed
subtly differently than you intended. (Which of these you consider worse
depends on your point of view.)

As I said, these problems are fairly minor compared to the problems or
inconveniences with the other proposals.

 Paul Prescod
Received on Tuesday, 17 December 1996 13:37:44 UTC