Re: Collapsing breaks & non-breaking spaces.

Galactus sent this to the www-html list:

>Strangely enough, I can't find anything in RFC 1866 that explicitly
>states that multiple spaces *are* collapsed, only that a newline is
>a word space.

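Multiple spaces do not collapse as such.  Whitespace only marks where
one word ends and the next begins: it is discarded when the text is
split into words, and a word space is put back between words when the
document is displayed.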

>RFC1866:
>6. Characters, Words, and Paragraphs
>
>   An HTML user agent should present the body of an HTML document as a
>   collection of typeset paragraphs and preformatted text.  Except for
>   preformatted elements (<PRE>, <XMP>, <LISTING>, <TEXTAREA>), each
>   block structuring element is regarded as a paragraph by taking the
>   data characters in its content and the content of its descendant
>   elements, concatenating them, and splitting the result into words,
>   separated by space, tab, or record end characters (and perhaps hyphen
>   characters). The sequence of words is typeset as a paragraph by
>   breaking it into lines.

So UAs are supposed to take a whole block at once, including the content
of any inline elements, and sieve it into distinct words.  Those words
are then typeset for the current viewing conditions, with whatever
spacing the UA finds appropriate between them.  In the process of
determining what makes up a word, any characters defined as whitespace
act only as separators and are otherwise ignored by the browser.
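
As a rough sketch of that reading (mine, not anything RFC 1866 spells
out as an algorithm), here is the split-then-typeset step in Python.
The function names, the greedy line breaking, and the choice of CR and
LF for "record end" are all assumptions made for illustration.

```python
import re

# Separators named in RFC 1866 section 6: space, tab, record end.
# Treating record end as CR/LF is an assumption of this sketch.
WORD_SEPARATORS = re.compile(r"[ \t\r\n]+")

def split_into_words(block_text):
    """Split a block's concatenated character data into words."""
    return [w for w in WORD_SEPARATORS.split(block_text) if w]

def typeset(block_text, width=40):
    """Greedy line breaking: rejoin words with one space, up to width."""
    lines, current = [], ""
    for word in split_into_words(block_text):
        candidate = word if not current else current + " " + word
        if current and len(candidate) > width:
            lines.append(current)   # line is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

print(split_into_words("a   b\n\tc"))
print(typeset("one   two\nthree", width=7))
```

Note that nothing here ever "collapses" a run of spaces.  The run is
simply thrown away as a separator, and a single space comes back only
because the typesetting step chooses to put one between words.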

Since &nbsp; isn't on the (pre-RFC1866? ISO-8859? SGML?) list of
whitespace characters, it is always considered part of a word.  That's
how it works: it displays a "space" glyph that hides from
tokenization, which is a display hack in itself, even when used
traditionally.

Multiple &nbsp; can't collapse like other whitespaces without losing
their non-breaking nature.
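
To illustrate, here is a small Python sketch under the same assumed
separator set (space, tab, CR, LF).  The helper name `words` is
hypothetical.  An explicit pattern is used because Python's own
`str.split()` with no arguments does treat U+00A0 as whitespace, which
would hide the effect.

```python
import re

NBSP = "\u00a0"  # the character &nbsp; stands for (code 160 in ISO 8859-1)

# Assumed separator set; NBSP is deliberately not in it.
SEPARATORS = re.compile(r"[ \t\r\n]+")

def words(text):
    """Split text into words on the assumed separators only."""
    return [w for w in SEPARATORS.split(text) if w]

plain = "Mr.   Smith"               # three ordinary spaces
glued = "Mr." + NBSP * 3 + "Smith"  # three non-breaking spaces

print(words(plain))                 # two words; the run of spaces is gone
print(len(words(glued)))            # one word; no break point was found
print(words(glued)[0].count(NBSP))  # all three NBSPs are still in the word
```

Since the three &nbsp; characters are ordinary data inside one word,
the tokenizer has nothing to discard and no place to break the line.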

Moreover, I don't think they can be entirely replaced by stylesheet
functions for display purposes.  The most often cited "abuse", indenting
the first line of a paragraph, can easily be specified in CSS, but other
uses, such as showing a double space at the end of a sentence or visually
setting off a single word within a paragraph, cannot be achieved as
easily as they are now with &nbsp;.

-- 
dave salovesh
darsal@tezcat.com

Received on Monday, 14 July 1997 20:40:59 UTC