Re: Multiple spaces converted into one

Greg Noel wrote:

> In general, subsequent spaces should convert the _prior_ space
> into a non-breaking space.  If you convert the subsequent spaces,
> it's possible to have the line broken at the first space and then
> be faced with non-breakable spaces at the beginning of the
> displayed line.

Yes.  With my original suggestion of a space followed by one or more
non-breaking spaces, Mozilla and MSIE can display a leading space on a
new line.

Ideally, (X)HTML would have a character entity for a "not-to-be-ignored"
space, rather than a "non-breaking" space.  The (X)HTML viewer could
then build one or more "user-level" spaces from both the single space
that results from one or more consecutive whitespace characters in the
file and each "not-to-be-ignored" space entity.  Sensibly written
viewers would start a new line at the first non-space character.
Instead, (X)HTML has only a "non-breaking space", which has the
"not-to-be-ignored" property but also specifies that a new line cannot
be started there.

For a single space typed or pasted at the start of a line, Mozilla
Composer inserts a single non-breaking space.  For two or more spaces,
it follows the same algorithm as for spaces between words and at the end
of lines, as described below.  The result is that during composition
and display (on Mozilla and MSIE), the desired spaces appear at the
start of the line, except when the first non-space character would be
beyond the right-hand limit of the display area.  In that case, the
renderer wraps to the following line, where it puts the first
non-space character.  All the spaces appear on the initial line - the
spaces themselves are not wrapped to the display width.

Between words or at the end of a line, Mozilla Composer's algorithm is:

n typed or pasted spaces are converted to n-1 non-breaking spaces
followed by 1 space.

This renders fine, without any spaces at the start of lines etc. on
Mozilla and MSIE.
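As a rough illustration, the rule above and the start-of-line case can be
sketched in Python.  This is only an approximation reconstructed from the
markup Composer produces, not Composer's actual code.  The function names
are hypothetical, and the sketch assumes the input is one line of plain
text that needs no other HTML escaping.

```python
import re

NBSP = "&nbsp;"

def encode_run(n: int, at_line_start: bool = False) -> str:
    """Encode a run of n typed or pasted spaces (hypothetical helper)."""
    if n <= 0:
        return ""
    if at_line_start and n == 1:
        # A lone space at the start of a line becomes one non-breaking space.
        return NBSP
    # Otherwise: n-1 non-breaking spaces followed by 1 ordinary space,
    # so a line break can only fall after the whole run.
    return NBSP * (n - 1) + " "

def encode_spaces(line: str) -> str:
    """Replace every run of ASCII spaces in a single line of text."""
    def repl(m: re.Match) -> str:
        return encode_run(len(m.group(0)), at_line_start=(m.start() == 0))
    return re.sub(r" +", repl, line)
```

For example, encode_spaces("a   b") gives "a&nbsp;&nbsp; b", and
encode_spaces("test      ") + "<br>" reproduces the markup Composer
produces for 6 spaces typed at the end of a line.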


> And it's even more complex than that: spaces at the beginning of a
> segment should all be non-breakable (it's a way of forcing
> indentation) . . .

I agree, but as noted above Mozilla Composer makes the final space an
ordinary breaking space.  I can't imagine when this would be a problem,
but it is impossible to imagine every way people want to use (X)HTML.
There are probably good reasons why Composer does this.  For
instance, long lines of spaces might be hard to see when composing, and
could easily go off to the right of the window.  Without the final space
being a breaking space, the first word would not appear in the visible
display.

> and spaces at the end of a segment should all be breakable (so that
> they collapse to nothing when displayed).  Getting all the cases
> right is tricky.

If there were two or more ordinary breakable spaces (0x20) preceding a
<br>, they wouldn't survive being passed through most HTML editors -
only one would remain.  I don't think it matters much whether these
spaces before a <br> or a </p> are all non-breakable, or non-breakable
followed by one ordinary breakable space - but I guess the latter
option, as implemented in Mozilla Composer, is a good approach because
it ensures the first word of the line is visible on screen when
composing or viewing.  With Mozilla Composer, typing "test" followed by
6 spaces at the end of a line results in:

test&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>

Unless the author wants the first word to be off-screen or off-page when
printing or viewing, I think this approach properly expresses the
author's intention.

Both Mozilla and MSIE display this as 6 spaces on the screen, which are
invisible except when they scroll off the right side of the window
(resulting in a horizontal scroll-bar appearing) or when a select
operation is performed - the highlighting shows where the invisible
spaces occur.  The text copied to the clipboard has the same number of
spaces as were typed or pasted by the author.

> I agree that it should be the default behavior, but I'm not so sure
> it shouldn't be an option.  For whatever reason, there are some
> people who don't like it.  I've never understood that myself; having
> the em-space between sentences makes it easier to read.

I suggested there be no option because I couldn't imagine why anyone
would not want the spaces they type or paste to be represented
faithfully in the file they create.  However, some people do.

Christopher Evans wanted the current behavior to remain an option.

Christian Raebild sees no need for the change I am proposing:

> I would say that Amaya should act according to the (X)HTML standard,
> that is, outside of [pre] and [code] sections, multiple spaces should
> be considered one space if typed or pasted into the formatted view in
> Amaya.
>
> . . .
>
> In my opinion, Amaya should not convert any spaces in typed or pasted
> text to &nbsp; entities, it should preserve them as space characters
> (ASCII 32), and then treat them as space characters (ASCII 32) are
> treated according to the relevant (X)HTML DTD.

He also says that, with the change I propose, he could live with
manually deleting the extra spaces, although they would be a problem for
him.  I think an option to keep the current behaviour would be a better
alternative to manual deletion.

I think the author-to-reader communication path should not be subject
to distortions such as collapsing spaces.  Since (X)HTML is capable of
preserving them, and since Mozilla Composer has a relatively simple
algorithm which seems to work fine, I suggest that Amaya do this by
default, with a user option to collapse spaces as it currently does.
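A minimal sketch of what such a setting might look like, assuming a
hypothetical boolean option named collapse_spaces (nothing like it exists
in Amaya today).  It ignores the start-of-line special case and only shows
the switch between preserving and collapsing runs of spaces.

```python
import re

NBSP = "&nbsp;"

def encode_typed_spaces(text: str, collapse_spaces: bool = False) -> str:
    """Hypothetical editor setting for typed or pasted spaces."""
    if collapse_spaces:
        # Current Amaya-style behaviour: any run of spaces becomes one space.
        return re.sub(r" {2,}", " ", text)
    # Proposed default: n spaces become n-1 &nbsp; plus one ordinary space,
    # so the author's spacing survives and the line can still break.
    return re.sub(r" {2,}", lambda m: NBSP * (len(m.group(0)) - 1) + " ", text)
```

With the default, encode_typed_spaces("a   b") gives "a&nbsp;&nbsp; b";
with collapse_spaces=True it gives "a b".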

  - Robin

Received on Monday, 4 September 2006 04:54:05 UTC