- From: Adam M. Costello <amc@cs.berkeley.edu>
- Date: Sun, 10 Aug 1997 14:52:16 -0700 (PDT)
- To: www-html-editor@w3.org
On 29 Jul I sent a bunch of comments, including this one:
http://www.w3.org/TR/WD-html40/struct/text.html
Thank you for the white space section! I've been wondering about
how white space is treated in HTML for a long time.
A line break occurring immediately following a start tag should
be discarded, as should a line break occurring immediately
before an end tag. This applies to all HTML elements without
exceptions. In addition, for all elements except PRE, a sequence
of contiguous white space characters such as spaces, horizontal
tabs, form feeds and line breaks, should be replaced by a single
word space.
This is somewhat ambiguous. If a start tag is immediately followed
by a line break and then some white space, should all the white
space be discarded with the line break? Or should only the line
break be discarded, and the remaining white space collapsed to a
single word space? My first guess based on the above paragraph was
that only the line break gets discarded, but the examples suggest
otherwise (which would be preferable, I think).
Since then I have thought of additional concerns. What if a start tag
is immediately followed by space-newline? What if an element ends with
a newline, but its end tag is omitted?
I think the rules should refer to the start and end of an element, not
to tags. I'm not sure it's a good idea to distinguish newlines from
other white space characters, since they've been indistinguishable until
now (I think). Also note that it's easy for spaces preceding a newline
to go unnoticed by humans.
Here is a possible set of rules that does not distinguish newlines from
spaces:
First, every maximal sequence of white space characters is replaced
by a single space.
Second, any space at the beginning or end of an element is deleted.
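The two rules above can be sketched in Python roughly as follows. This is only an illustration, not anything from the draft: the Element class, collapse_and_trim, and to_html are hypothetical names, and I am assuming that "white space" means space, tab, form feed, CR and LF, and that the document has already been parsed into a tree, so omitted end tags no longer matter.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union

# Assumption: white space = space, tab, form feed, CR, LF.
WS_RUN = re.compile(r"[ \t\f\r\n]+")


@dataclass
class Element:
    """Hypothetical minimal tree node: a tag name plus text/element children."""
    tag: str
    children: List[Union[str, "Element"]] = field(default_factory=list)


def collapse_and_trim(elem: Element) -> Element:
    """Apply both rules to an element and, recursively, to its descendants."""
    # Rule 1: every maximal run of white space becomes a single space.
    kids = [
        WS_RUN.sub(" ", child) if isinstance(child, str) else collapse_and_trim(child)
        for child in elem.children
    ]
    # Rule 2: delete a space at the very beginning or end of the element.
    if kids and isinstance(kids[0], str) and kids[0].startswith(" "):
        kids[0] = kids[0][1:]
    if kids and isinstance(kids[-1], str) and kids[-1].endswith(" "):
        kids[-1] = kids[-1][:-1]
    # Drop text nodes that became empty.
    return Element(elem.tag, [k for k in kids if k != ""])


def to_html(elem: Element) -> str:
    """Serialize the tree back to markup so the result is easy to inspect."""
    inner = "".join(c if isinstance(c, str) else to_html(c) for c in elem.children)
    return f"<{elem.tag}>{inner}</{elem.tag}>"


doc = Element("P", ["\n  Hello ", Element("EM", [" big\n"]), "\tworld \n"])
print(to_html(collapse_and_trim(doc)))
```

Because the rules are stated in terms of elements rather than tags, a start tag followed by space-newline is handled the same way as one followed by a bare newline: the whole run collapses to one space, which is then deleted.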
Here is a possible set of rules that does distinguish newlines from
spaces:
First, every maximal sequence of white space characters is replaced
by either a single space (if it originally contained no newlines) or
by a single newline (if it originally contained any newlines).
Second, any newline at the beginning or end of an element is
deleted.
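The newline-aware variant can be sketched the same way. To keep it short, this version works on the text content of a single element with no child elements; normalize_newline_aware is a hypothetical name, and I am assuming that either CR or LF counts as a newline.

```python
import re

# Assumption: white space = space, tab, form feed, CR, LF.
WS_RUN = re.compile(r"[ \t\f\r\n]+")


def collapse_run(match: "re.Match[str]") -> str:
    """Rule 1: a run containing any newline becomes a newline, else a space."""
    run = match.group(0)
    return "\n" if ("\n" in run or "\r" in run) else " "


def normalize_newline_aware(content: str) -> str:
    """Apply both rules to the text content of one element."""
    collapsed = WS_RUN.sub(collapse_run, content)
    # Rule 2: delete a newline at the beginning or end of the element.
    if collapsed.startswith("\n"):
        collapsed = collapsed[1:]
    if collapsed.endswith("\n"):
        collapsed = collapsed[:-1]
    return collapsed


# Space-newline after the start tag: the whole run counts as one newline
# and is deleted, so the unnoticed trailing space does no harm.
print(repr(normalize_newline_aware(" \nHello   world \n")))
# Leading and trailing plain spaces survive; only newlines are deleted.
print(repr(normalize_newline_aware(" Hello\n  there ")))
```

Under these rules a space hiding before a newline is absorbed into the newline, so it cannot change the result.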
AMC
Received on Sunday, 10 August 1997 17:52:22 UTC