- From: Phillips, Addison <addison@lab126.com>
- Date: Wed, 21 Oct 2015 14:36:48 +0000
- To: 馬場孝夫 <baba@bpsinc.jp>, Koji Ishii <kojiishi@gmail.com>
- CC: Florian Rivoal <florian@rivoal.net>, "www-style@w3.org" <www-style@w3.org>, CJK discussion <public-i18n-cjk@w3.org>
I think it important not to play loosely with the concepts here. For example:

> - In most languages such as Latin, punctuations are explicit word separator.
> Word boundaries are also soft wrap opportunities.
> - In CJK, there are soft wrap opportunities between any two characters
> (except some combinations).

A word boundary is not necessarily a line break opportunity. UAX14 (Unicode Line Breaking Algorithm) should probably be more prominent in this discussion (it's where the line break property comes from)--and it discusses the boundary conditions such as parentheticals, mixed alphanum, number phrases, etc. It doesn't address the ***** issue directly, other than, if I'm reading it correctly, allowing line breaks anywhere in that string.

> - A 'letter' is a 'character unit' whose general category [UAX44] is 'L:Letter' or
> 'N:Number'.

See above. A 'character unit' is better described as an extended grapheme cluster or a tailored extended grapheme cluster. It's the base character of the cluster that you care about here (this is rule LB9 in UAX14).

Addison

> -----Original Message-----
> From: 馬場孝夫 [mailto:baba@bpsinc.jp]
> Sent: Wednesday, October 21, 2015 6:40 AM
> To: Koji Ishii
> Cc: Florian Rivoal; www-style@w3.org; CJK discussion
> Subject: Re: [css-text-3] word-break: break-all
>
> > A. Does the current spec[1] allows UA to break between e.g., "*" when UA
> does not in normal breaking?
>
> My understanding is 'yes'.
>
> As you wrote,
> - 'word-break: break-all' adds soft wrap opportunities between two
> typographic *letters*.
> - A 'letter' is a 'character unit' whose general category [UAX44] is 'L:Letter' or
> 'N:Number'.
> - U+002A ASTERISK is belongs to 'P:Punctuation'.
> - Therefore, 'word-break: break-all' doesn't add a soft wrap opportunity
> between two asterisks.
>
> However, regardless of 'word-break' property, there originally are soft wrap
> opportunity around punctuations.
>
> - In most languages such as Latin, punctuations are explicit word separator.
> Word boundaries are also soft wrap opportunities.
> - In CJK, there are soft wrap opportunities between any two characters
> (except some combinations).
> - (I don't really understand for case Thai, Lao, and Khmer)
>
> For example, in the case 'word-break: normal', there are original soft wrap
> opportunities marked as '-'.
>
>     ABC,DE**F&F
>         |
>         V
>     ABC-,-DE-*-*-F-&-F
>
> Of course breaking between 'C' and ',' is prohibited in the most case, but this
> is due to 'line-break' property.
> (Since 'line-break: anywhere' doesn't exist, most UAs prohibit break before
> ',' even if 'line-break' is 'loose'.)
>
> So I think that two asterisks should break if 'line-break: loose', otherwise
> should not break. (UA dependent)
>
> ---
> By the way, this understanding is not match for current browser's behaviors.
> In addition, I don't think the behavior of my understanding is very useful for
> CJK.
>
> > C. Loosen it, but call out a few obvious cases of what CJK authors would
> expect informally.
> > D. Loosen it, with explicit cases where UA should not break.
>
> So I think C or D with adding some notes(*) related with 'line-break'
> are better.
>
> * Like "UA can refer word-break property to determine breaking rules"
> to section 5.3. I haven't consider enough about this yet.
>
> ----------------------------------------------------
> ビヨンド・パースペクティブ・ソリューションズ株式会社
> 〒160-0023
> 東京都新宿区西新宿6-20-7 コンシェリア西新宿TOWER'S WEST 2F
> Tel: 03-6279-4320 Fax: 03-6279-4450
> http://www.bpsinc.jp
> 馬場 孝夫(Baba Takao)
>
>
> On Mon, Oct 19, 2015 at 6:48 PM, Koji Ishii <kojiishi@gmail.com> wrote:
> > So, to summary:
> >
> > A. Does the current spec[1] allows UA to break between e.g., "*" when
> > UA does not in normal breaking?
> >
> > If yes, I'm solved.
> >
> > If not, can we loosen "letters" to "characters"?
> >
> > B. Just loosen it, and let UA devs to make proper judge for CJK use cases.
> > C. Loosen it, but call out a few obvious cases of what CJK authors
> > would expect informally.
> > D. Loosen it, with explicit cases where UA should not break.
> >
> > My understanding is "no" to A, and I'm fine with B, C, D, or open to
> > other proposals if any.
> >
> > [1] https://drafts.csswg.org/css-text-3/#valdef-word-break-break-all
> >
> > /koji
> >
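
For illustration only, the literal "letters" reading discussed in this thread can be sketched in a few lines of Python. This is not UAX14 and not what any UA implements; the function names are hypothetical, and attaching combining marks (category M*) to the preceding character is only a rough stand-in for extended grapheme clusters. It models just the opportunities that 'word-break: break-all' would add under that reading, classifying each unit by its base character, and ignores the breaks that normal line breaking already allows.

```python
import unicodedata


def is_letter_unit(ch: str) -> bool:
    """True if the general category is L* (Letter) or N* (Number)."""
    return unicodedata.category(ch)[0] in ("L", "N")


def rough_clusters(text: str) -> list[str]:
    """Assumption: approximate grapheme clusters by gluing combining
    marks (category M*) onto the preceding character. Real extended
    grapheme clusters follow UAX29 and are more involved."""
    units: list[str] = []
    for ch in text:
        if units and unicodedata.category(ch).startswith("M"):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def added_break_all_opportunities(text: str) -> list[int]:
    """Offsets where the literal 'letters' reading of break-all would
    add a soft wrap opportunity: between two adjacent units whose base
    characters are both L* or N*."""
    units = rough_clusters(text)
    offsets: list[int] = []
    pos = 0
    for left, right in zip(units, units[1:]):
        pos += len(left)
        # Only the base (first) character of each unit is classified.
        if is_letter_unit(left[0]) and is_letter_unit(right[0]):
            offsets.append(pos)
    return offsets


if __name__ == "__main__":
    print(unicodedata.category("*"))                     # Po
    print(added_break_all_opportunities("ABC,DE**F&F"))  # [1, 2, 5]
```

On the sample string from the quoted message, this reading adds opportunities only inside "ABC" and "DE", and none between the two asterisks, since U+002A is punctuation (Po).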
Received on Wednesday, 21 October 2015 14:37:48 UTC