- From: Adam M. Costello <amc@cs.berkeley.edu>
- Date: Sun, 10 Aug 1997 14:52:16 -0700 (PDT)
- To: www-html-editor@w3.org
On 29 Jul I sent a bunch of comments, including this one:

http://www.w3.org/TR/WD-html40/struct/text.html

Thank you for the white space section! I've been wondering about how white space is treated in HTML for a long time.

    A line break occurring immediately following a start tag should be discarded, as should a line break occurring immediately before an end tag. This applies to all HTML elements without exceptions. In addition, for all elements except PRE, a sequence of contiguous white space characters such as spaces, horizontal tabs, form feeds and line breaks, should be replaced by a single word space.

This is somewhat ambiguous. If a start tag is immediately followed by a line break and then some white space, should all the white space be discarded with the line break? Or should only the line break be discarded, and the remaining white space collapsed to a single word space? My first guess based on the above paragraph was that only the line break gets discarded, but the examples suggest otherwise (which would be preferable, I think).

Since then I have thought of additional concerns. What if a start tag is immediately followed by space-newline? What if an element ends with a newline, but its end tag is omitted? I think the rules should refer to the start and end of an element, not to tags.

I'm not sure it's a good idea to distinguish newlines from other white space characters, since they've been indistinguishable until now (I think). Also note that it's easy for spaces preceding a newline to go unnoticed by humans.

Here is a possible set of rules that does not distinguish newlines from spaces: First, every maximal sequence of white space characters is replaced by a single space. Second, any space at the beginning or end of an element is deleted.

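To make the rule set that does not distinguish newlines concrete, here is a minimal Python sketch. It assumes the element's content is already available as a plain string with no nested elements, and it takes "white space" to mean the characters listed in the draft (space, horizontal tab, form feed, and line breaks). The function name is hypothetical, chosen only for illustration.

```python
import re

# Assumption: "white space" means space, tab, form feed, CR and LF,
# following the list in the quoted draft text.
WS_RUN = re.compile(r"[ \t\f\r\n]+")


def normalize_uniform(content: str) -> str:
    """Hypothetical helper: apply the rules that treat newlines like spaces."""
    # First rule: every maximal run of white space becomes a single space.
    collapsed = WS_RUN.sub(" ", content)
    # Second rule: a space at the beginning or end of the element is deleted.
    return collapsed.strip(" ")
```

Under these rules the space-newline case needs no special handling: `normalize_uniform(" \n  Hello \t\n world \n")` gives `"Hello world"`, and the result is the same whether or not a space precedes the newline.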
Here is a possible set of rules that does distinguish newlines from spaces: First, every maximal sequence of white space characters is replaced by either a single space (if it originally contained no newlines) or by a single newline (if it originally contained any newlines). Second, any newline at the beginning or end of an element is deleted.

AMC
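The rule set that does distinguish newlines from spaces can be sketched in Python along the following lines. This is only an illustration under stated assumptions: the element's content is a plain string with no nested elements, "white space" means space, horizontal tab, form feed, CR and LF, and either CR or LF counts as a newline. The function names are hypothetical.

```python
import re

# Assumption: "white space" means space, tab, form feed, CR and LF.
WS_RUN = re.compile(r"[ \t\f\r\n]+")


def _collapse_run(match: re.Match) -> str:
    # A run that contained any line break collapses to one newline;
    # a run with no line break collapses to one space.
    run = match.group(0)
    return "\n" if ("\n" in run or "\r" in run) else " "


def normalize_newline_aware(content: str) -> str:
    """Hypothetical helper: apply the rules that keep newlines distinct."""
    # First rule: collapse each maximal run of white space.
    collapsed = WS_RUN.sub(_collapse_run, content)
    # Second rule: delete a newline at the beginning or end of the element.
    # After collapsing, at most one newline can sit at either edge.
    return collapsed.strip("\n")
```

With this sketch, `normalize_newline_aware(" \nfoo")` gives `"foo"`, so a space hiding before the newline makes no difference, while `normalize_newline_aware("  foo  ")` gives `" foo "` because only newlines are removed at the edges.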
Received on Sunday, 10 August 1997 17:52:22 UTC