Re: [css3-text] scoping line break controls, characters that disappear at the end of lines

From: Asmus Freytag <asmusf@ix.netcom.com>
Date: Sun, 01 Apr 2012 19:21:06 -0700
Message-ID: <4F790D12.3060808@ix.netcom.com>
To: Mark Davis ☕ <mark@macchiato.com>
CC: Koji Ishii <kojiishi@gluesoft.co.jp>, "www-style@w3.org" <www-style@w3.org>, WWW International <www-international@w3.org>

Leaving aside SPACE and NBSP (which have well established usage) the 
fixed width spaces in Unicode are needed whenever one intends a specific 
increment of space that floats with the text (yes they can be abused for 
paragraph indentation, but so can many things). For example, 
mathematical texts have specific requirements for such usage.

In normal layout, that space can "disappear" (=become not rendered) at a 
line break, because the purpose, separation, is achieved by the break, 
and the precise amount is then less important.
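
To make the "disappearing" behavior concrete, here is a minimal sketch of a greedy line breaker in which spaces that land at a break are neither rendered nor counted toward the line width. Everything in it is an assumption for illustration, not the behavior of any implementation discussed in this thread: the function name break_lines, the disappearing parameter, the one-unit-per-character width, and the simplification that every character outside disappearing is part of a word.

```python
def break_lines(text, max_width, disappearing=" "):
    """Greedily break text into lines of at most max_width characters.

    Hypothetical model: each character is one unit wide, the characters
    listed in `disappearing` are the only break opportunities, and a word
    wider than max_width is allowed to overflow its line.
    """
    lines = []
    current = ""   # content already committed to the line being built
    pending = ""   # spaces seen since the last word; kept only if the next word fits
    i = 0
    while i < len(text):
        if text[i] in disappearing:
            pending += text[i]
            i += 1
            continue
        # Collect the next run of non-space characters (a "word").
        j = i
        while j < len(text) and text[j] not in disappearing:
            j += 1
        word = text[i:j]
        if current and len(current) + len(pending) + len(word) > max_width:
            # Break before the word: the pending spaces are dropped, so they
            # neither widen this line nor wrap to the start of the next one.
            lines.append(current)
            current = word
        else:
            current += pending + word
        pending = ""
        i = j
    if current:
        lines.append(current)
    return lines


print(break_lines("abcd   efg", 4))   # spaces at the break are dropped
print(break_lines("abcd   efg", 10))  # everything fits, spaces are kept
```

Which characters belong in disappearing is exactly the open question: passing " \u2003" would give EM SPACE the same treatment as U+0020, while the default leaves it inside a word, which is a simplification of this sketch rather than a claim about the spec.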

However, if Ambrose is correct, there is some use of full-width spaces 
that amounts to quotation mark-like "bracketing". (He writes that this 
is used around names as an honorific.) In that case, what would you expect?

  * Would you expect background color to be shown for such a space when
    it's at the end of a line? (last character)
  * Would you expect it to be wrapped around (esp. in grid-layout styles)?

I suspect I would expect the former, but for the latter, the only way to 
get a reasonable answer is to look at documents and/or prevalence in 
implementations targeted explicitly at users of that convention.


PS: selection behavior is a bit different. For editing, you want to 
allow people to see the characters, independent of whether they are 
temporarily not rendered.

On 4/1/2012 6:18 PM, Mark Davis ☕ wrote:
> I tend to disagree, if I understand correctly what you mean by "B. If 
> it occurs at the end of a line, does it take up space?"
> Consider the following case.
>  1. The text is "abcd<space1><space2><space3>efg"
>  2. "abcd" fit within the margins, but
>  3. "abcd<space1><space2><space3>e" does not.
> I would expect to see the following displayed:
> abcd
> efg
> and not
> abcd
>   efg
> That's the case:
>  1. even if some of the spaces were fixed width (basically, no matter
>     which of the Unicode whitespaces they were, except for the Glue
>     characters NBSP and ZWNBSP).
>  2. even if the margins were just barely wider than "abcd", so that
>     the width up through just before the "e" was too wide to fit
>     between the margins.
> I would characterize this as "not taking up space at the end of the 
> line"; that is, no matter how many spaces at the end of line, and no 
> matter which they are (excepting glue), you wouldn't expect to see 
> them wrap to the start of the next line. So if that is what Word/IE9 
> are doing, it looks to me that they are doing the right thing.
> (That doesn't stand in the way of being able to select/copy whitespace 
> characters that are after "abcd", of course.)
> ------------------------------------------------------------------------
> Mark <https://plus.google.com/114199149796022210033>
> /— Il meglio è l’inimico del bene —/
> On Sat, Mar 31, 2012 at 13:10, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
>     I asked this question about ideographic spaces at
>     public-html-ig-jp@w3.org in January, without reaching a good
>     conclusion at that point. I then had some discussion with fantasai,
>     investigated a little more, and came to a different conclusion
>     than before.
>     In short, I support the current spec--keep around all those
>     fixed-width spaces.
>     Long version: fantasai helped me to make the question simpler:
>     A. If it occurs at the beginning of a line, does it take up space?
>     B. If it occurs at the end of a line, does it take up space?
>     C. If there is more than one together, are they kept together, or
>     can we break between them?
>     By eliminating logically incorrect combinations and incorporating
>     opinions from Japan, we have 3 options:
>     1. YES on the beginning, NO on the end, and keep consecutive
>     spaces together.
>     2. YES on the beginning, YES on the end of line, and allow break
>     between them.
>     3. Variation of 1; allow only one ideographic space at the end,
>     and ignore the rest.
>     MS Word behaves as #1. Most traditional Japanese word processors in
>     the 1980s/90s behaved as #2. #3 is from JLTF, where the preference
>     is for Word's behavior, except that an ideographic space after an
>     exclamation or question mark should be honored.
>     I quickly looked at current behaviors[1]:
>     MS Word: #1
>     Adobe InDesign: #2
>     IE9: #1
>     FF11: Neither. Breaks look like IE, but the last two are
>     different. Justification behavior is also different.
>     Chrome18/Safari5: #2
>     MS Word took #1 because in the 1990s many Japanese authors mixed
>     ideographic spaces and ASCII spaces without realizing it. Oftentimes
>     they did so intentionally, assuming two ASCII spaces are equivalent
>     to one ideographic space, because that was the case in most
>     traditional CUI-based software. To handle two ASCII spaces and one
>     ideographic space in the same way, and also to support Latin
>     typography, #1 was the best choice.
>     Today, in the HTML world, I don't think Japanese authors have such
>     requirements, so there's no big motivation to take #1 for CSS.
>     The point JLTF made--an ideographic space after
>     exclamation/question marks--makes sense, but it's too special a case
>     once #1 is taken, so Word gave up implementing it. But it comes for
>     free if we go with #2.
>     Given this, given InDesign taking option 2, and given all browsers
>     behaving differently today, I think option 2 makes the most sense.
>     Note that this is filed as CSS-ISSUE-220[2].
>     [1] http://lists.w3.org/Archives/Public/www-archive/2012Mar/0058.html
>     [2] http://www.w3.org/Style/CSS/Tracker/issues/220
>     Regards,
>     Koji
>     -----Original Message-----
>     From: fantasai [mailto:fantasai.lists@inkedblade.net]
>     Sent: Tuesday, January 10, 2012 10:37 AM
>     To: www-style@w3.org; 'WWW International'
>     Subject: [css3-text] scoping line break controls, characters that
>     disappear at the end of lines
>     In 2008 roc outlined some principles for how line breaking
>     controls (i.e. 'white-space', at the time) are scoped to
>     line-breaking opportunities:
>     In
>     <http://lists.w3.org/Archives/Public/www-style/2008Dec/0043.html>
>     Robert O'Callahan wrote:
>     >
>     > 1) Break opportunities induced by white space are entirely governed by the
>     >    value of the 'white-space' property on the enclosing element. So, spaces
>     >    that are white-space:nowrap never create break opportunities.
>     > 2) When a break opportunity exists between two non-white-space
>     >    characters, e.g. between two Kanji characters, we consult the value of
>     >    'white-space' for the nearest common ancestor element of the two characters
>     >    to decide if the break is allowed.
>     I'm trying to encode this into the spec. My question is, are
>     spaces (U+0020) the only characters that fall into category #1?
>     What about the other characters in General Category Zs?
>     http://www.fileformat.info/info/unicode/category/Zs/list.htm
>     In particular, U+1680 is, like U+0020, expected to disappear at
>     the end of a line.
>     Which brings up another issue: which characters should disappear
>     at the end of a line? Right now we keep around all those
>     fixed-width spaces.
>     ~fantasai
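
For readers who want to see which characters fantasai's General Category Zs question covers, the following sketch enumerates them with Python's standard unicodedata module. The helper name space_separators is made up for illustration, and the exact list depends on the Unicode version bundled with the Python interpreter, so treat the output as indicative rather than normative.

```python
import unicodedata


def space_separators():
    """Return every code point whose General Category is Zs (Space Separator)."""
    return [cp for cp in range(0x110000)
            if unicodedata.category(chr(cp)) == "Zs"]


for cp in space_separators():
    # The fallback name is only a safeguard; Zs characters normally have names.
    print(f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}")
```

The list includes U+0020 SPACE, U+1680 OGHAM SPACE MARK and U+3000 IDEOGRAPHIC SPACE, but also U+00A0 NO-BREAK SPACE, which is glue rather than a break opportunity, so General Category alone does not settle which characters should disappear at the end of a line.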
Received on Monday, 2 April 2012 02:21:40 UTC
