RE: [css-text-3] word-break: break-all

From: Phillips, Addison <addison@lab126.com>
Date: Wed, 21 Oct 2015 14:36:48 +0000
To: 馬場孝夫 <baba@bpsinc.jp>, Koji Ishii <kojiishi@gmail.com>
CC: Florian Rivoal <florian@rivoal.net>, "www-style@w3.org" <www-style@w3.org>, CJK discussion <public-i18n-cjk@w3.org>
Message-ID: <8e8140d9726749689e6e1be23ea2f5d5@EX13D08UWC002.ant.amazon.com>
I think it is important not to play fast and loose with the concepts here. For example:

> - In most languages, such as those written in Latin script, punctuation marks
>   are explicit word separators. Word boundaries are also soft wrap opportunities.
> - In CJK, there are soft wrap opportunities between any two characters
> (except some combinations).

A word boundary is not necessarily a line break opportunity. UAX14 (Unicode Line Breaking Algorithm) should probably be more prominent in this discussion, since it is where the line break property comes from, and it discusses boundary conditions such as parentheticals, mixed alphanumerics, number phrases, and so on. It doesn't address the ***** issue directly other than, if I'm reading it correctly, allowing line breaks anywhere in that string.

> - A 'letter' is a 'character unit' whose general category [UAX44] is 'L:Letter' or
> 'N:Number'.

See above. A 'character unit' is better described as an extended grapheme cluster or a tailored extended grapheme cluster. It's the base character of the cluster that you care about here (this is rule LB9 in UAX14).
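
To make the "base character of the cluster" point concrete, here is a rough sketch in Python. It is only an approximation under stated assumptions: clusters are formed by attaching characters of general category M* to the preceding character, which is not the full extended grapheme cluster definition, and the helper names clusters and is_letter_unit are made up for illustration. It tests the quoted 'letter' definition (general category L* or N*) against the base character only.

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it.

    Approximation only: anything with general category M* is attached to
    the previous character. Real extended grapheme clusters have more rules.
    """
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def is_letter_unit(cluster):
    """True if the cluster's base character has general category L* or N*."""
    return unicodedata.category(cluster[0])[0] in ("L", "N")


# "e" + COMBINING ACUTE ACCENT, then an asterisk, then a digit
units = clusters("e\u0301*1")
print([(u, is_letter_unit(u)) for u in units])
```

With this reading, the accented "e" counts as a letter because its base is a letter, the digit counts because of N:Number, and the asterisk does not. Note that this uses the general category only; the line break classes of UAX14 are a separate property and are not available in the Python standard library.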

Addison

> -----Original Message-----
> From: 馬場孝夫 [mailto:baba@bpsinc.jp]
> Sent: Wednesday, October 21, 2015 6:40 AM
> To: Koji Ishii
> Cc: Florian Rivoal; www-style@w3.org; CJK discussion
> Subject: Re: [css-text-3] word-break: break-all
> 
> > A. Does the current spec[1] allow UA to break between e.g., "*" when
> > UA does not in normal breaking?
> 
> My understanding is 'yes'.
> 
> As you wrote,
> - 'word-break: break-all' adds soft wrap opportunities between two
> typographic *letters*.
> - A 'letter' is a 'character unit' whose general category [UAX44] is 'L:Letter' or
> 'N:Number'.
> - U+002A ASTERISK belongs to 'P:Punctuation'.
> - Therefore, 'word-break: break-all' doesn't add a soft wrap opportunity
> between two asterisks.
> 
> However, regardless of the 'word-break' property, there are already soft wrap
> opportunities around punctuation.
> 
> - In most languages, such as those written in Latin script, punctuation marks
>   are explicit word separators. Word boundaries are also soft wrap opportunities.
> - In CJK, there are soft wrap opportunities between any two characters
> (except some combinations).
> - (I don't really understand the case of Thai, Lao, and Khmer.)
> 
> For example, in the case of 'word-break: normal', the existing soft wrap
> opportunities are marked with '-'.
> 
>     ABC,DE**F&F
>          |
>          V
>     ABC-,-DE-*-*-F-&-F
> 
> Of course, breaking between 'C' and ',' is prohibited in most cases, but this
> is due to the 'line-break' property.
> (Since 'line-break: anywhere' doesn't exist, most UAs prohibit a break before
> ',' even if 'line-break' is 'loose'.)
> 
> So I think a break between two asterisks should be allowed if 'line-break' is
> 'loose', and otherwise should not be. (UA dependent)
> 
> ---
> By the way, this understanding does not match current browser behavior.
> In addition, I don't think the behavior under my understanding is very useful
> for CJK.
> 
> > C. Loosen it, but call out a few obvious cases of what CJK authors would
> > expect informally.
> > D. Loosen it, with explicit cases where UA should not break.
> 
> So I think C or D, with some added notes (*) related to 'line-break',
> would be better.
> 
> * For example, adding "UA can refer to the word-break property to determine
> breaking rules" to section 5.3. I haven't considered this enough yet.
> 
> ----------------------------------------------------
> Beyond Perspective Solutions Co., Ltd.
> Concieria Nishi-Shinjuku TOWER'S WEST 2F, 6-20-7 Nishi-Shinjuku,
> Shinjuku-ku, Tokyo 160-0023, Japan
> Tel: 03-6279-4320 Fax: 03-6279-4450
> http://www.bpsinc.jp

> 馬場 孝夫(Baba Takao)
> 
> 
> On Mon, Oct 19, 2015 at 6:48 PM, Koji Ishii <kojiishi@gmail.com> wrote:
> > So, to summarize:
> >
> > A. Does the current spec[1] allow UA to break between e.g., "*" when
> > UA does not in normal breaking?
> >
> > If yes, my issue is solved.
> >
> > If not, can we loosen "letters" to "characters"?
> >
> > B. Just loosen it, and let UA devs make the proper judgment for CJK use cases.
> > C. Loosen it, but call out a few obvious cases of what CJK authors
> > would expect informally.
> > D. Loosen it, with explicit cases where UA should not break.
> >
> > My understanding is "no" to A, and I'm fine with B, C, D, or open to
> > other proposals if any.
> >
> > [1] https://drafts.csswg.org/css-text-3/#valdef-word-break-break-all

> >
> > /koji
> >

Received on Wednesday, 21 October 2015 14:37:46 UTC
