Re: comments on HTML 4.0 draft

On 29 Jul I sent a bunch of comments, including this one:

http://www.w3.org/TR/WD-html40/struct/text.html

    Thank you for the white space section!  I've been wondering about
    how white space is treated in HTML for a long time.

        A line break occurring immediately following a start tag should
        be discarded, as should a line break occurring immediately
        before an end tag. This applies to all HTML elements without
        exceptions. In addition, for all elements except PRE, a sequence
        of contiguous white space characters such as spaces, horizontal
        tabs, form feeds and line breaks, should be replaced by a single
        word space.

    This is somewhat ambiguous.  If a start tag is immediately followed
    by a line break and then some white space, should all the white
    space be discarded with the line break?  Or should only the line
    break be discarded, and the remaining white space collapsed to a
    single word space?  My first guess based on the above paragraph was
    that only the line break gets discarded, but the examples suggest
    otherwise (which would be preferable, I think).
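To make the ambiguity concrete, here is a rough Python sketch of the two readings, applied to the text content of a single element. It assumes the element has no nested markup and treats only space, tab, form feed, CR and LF as white space (the characters the draft names). The function names reading_a and reading_b are made up for illustration.

```python
import re

# White space as listed in the draft: space, tab, form feed, line breaks.
WS = r"[ \t\f\r\n]"
LINE_BREAK = r"(?:\r\n|\r|\n)"

def reading_a(content):
    """Discard only the line break itself, then collapse what remains."""
    content = re.sub(r"\A" + LINE_BREAK, "", content, count=1)
    content = re.sub(LINE_BREAK + r"\Z", "", content, count=1)
    return re.sub(WS + "+", " ", content)

def reading_b(content):
    """Discard the line break together with any adjacent white space."""
    content = re.sub(r"\A" + LINE_BREAK + WS + "*", "", content, count=1)
    content = re.sub(WS + "*" + LINE_BREAK + r"\Z", "", content, count=1)
    return re.sub(WS + "+", " ", content)
```

For the content "\n   Hello\n", reading_a leaves a leading word space (" Hello") while reading_b discards it ("Hello").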

Since then I have thought of additional concerns.  What if a start tag
is immediately followed by space-newline?  What if an element ends with
a newline, but its end tag is omitted?

I think the rules should refer to the start and end of an element, not
to tags.  I'm not sure it's a good idea to distinguish newlines from
other white space characters, since they've been indistinguishable until
now (I think).  Also note that it's easy for spaces preceding a newline
to go unnoticed by humans.

Here is a possible set of rules that does not distinguish newlines from
spaces:

    First, every maximal sequence of white space characters is replaced
    by a single space.

    Second, any space at the beginning or end of an element is deleted.
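The first rule set can be sketched in a few lines of Python. This assumes the element's content is a plain string with no nested elements, and that "beginning or end of an element" means the first or last character of that string; normalize_no_newlines is a hypothetical name.

```python
import re

# Maximal runs of space, tab, form feed, CR or LF.
WHITE_SPACE = re.compile(r"[ \t\f\r\n]+")

def normalize_no_newlines(content):
    # First: every maximal run of white space becomes a single space.
    content = WHITE_SPACE.sub(" ", content)
    # Second: delete a space at the beginning or end of the element.
    if content.startswith(" "):
        content = content[1:]
    if content.endswith(" "):
        content = content[:-1]
    return content
```

With these rules, space-newline after a start tag needs no special case: " \nHello" becomes "Hello", and "\n  Hello,\t\tworld \n" becomes "Hello, world".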

Here is a possible set of rules that does distinguish newlines from
spaces:

    First, every maximal sequence of white space characters is replaced
    by either a single space (if it originally contained no newlines) or
    by a single newline (if it originally contained any newlines).

    Second, any newline at the beginning or end of an element is
    deleted.
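A similar sketch of the second rule set, under the same assumptions (plain string content, no nested elements, CR and LF both count as newlines); normalize_keep_newlines is again a hypothetical name.

```python
import re

# Maximal runs of space, tab, form feed, CR or LF.
WHITE_SPACE = re.compile(r"[ \t\f\r\n]+")

def _collapse(match):
    run = match.group(0)
    # A run containing any line break becomes one newline; otherwise one space.
    return "\n" if ("\n" in run or "\r" in run) else " "

def normalize_keep_newlines(content):
    # First: collapse each maximal run to a single newline or space.
    content = WHITE_SPACE.sub(_collapse, content)
    # Second: delete a newline at the beginning or end of the element.
    if content.startswith("\n"):
        content = content[1:]
    if content.endswith("\n"):
        content = content[:-1]
    return content
```

Here space-newline after a start tag is treated like a bare newline (" \nHello" becomes "Hello"), but a trailing space that is not part of a newline run survives: "Hello,\n\tworld  " becomes "Hello,\nworld ".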

AMC

Received on Sunday, 10 August 1997 17:52:22 UTC