RE: [css3-text] definition of 'word-break: break-all'

> "Yeah, I" breaks as "Y·e·a·h·,·I"
> "I didn't say" breaks as "I·d·i·d·n·'·t·s·a·y"
> 
> while IE9 breaks
> 
> "Yeah, I" as "Y·e·a·h,·I"
> "I didn't say" as "I·d·i·d·n't·s·a·y"
> 
> I have to say I like IE9 better for these rather common cases, provided that these are
> embedded in CJK context.

I like IE9 too.

My idea was to allow breaks everywhere in the first place, but to leave commas and similar characters to the line-break property, so that the result matches IE9's behavior (assuming the UA has a correct set of line-break rules). That carries a risk of interoperability issues, though, because the line-break rules are UA-dependent.
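
To make the difference concrete, here is a rough Python sketch of the two behaviors. It is only a toy model of my own: the function name `break_all` and the two character sets are assumptions for illustration, not something taken from the spec or from any UA. Break opportunities are marked with "·", and spaces are shown as break opportunities, as in the examples quoted above.

```python
def break_all(text, no_break_before="", no_break_after=""):
    """Mark every allowed break in `text` with '·' (toy model, not spec text).

    By default a break is allowed between any two characters.
    `no_break_before` and `no_break_after` are hypothetical stand-ins for
    what the line-break rules would still forbid (e.g. before a comma).
    A space is always a break opportunity and is not shown itself.
    """
    out = []
    prev = None
    for ch in text:
        if ch == " ":
            prev = " "
            continue
        if out:
            # A break is blocked only when the two characters are adjacent
            # (no space in between) and a rule forbids breaking here.
            blocked = prev != " " and (
                ch in no_break_before or prev in no_break_after
            )
            if not blocked:
                out.append("·")
        out.append(ch)
        prev = ch
    return "".join(out)


# Breaks allowed everywhere:
print(break_all("Yeah, I"))       # Y·e·a·h·,·I
print(break_all("I didn't say"))  # I·d·i·d·n·'·t·s·a·y

# Punctuation left to the (assumed) line-break rules, as IE9 seems to do:
print(break_all("Yeah, I", ",'", "'"))       # Y·e·a·h,·I
print(break_all("I didn't say", ",'", "'"))  # I·d·i·d·n't·s·a·y
```

The second pair of calls only matches IE9 because I picked the character sets to fit the two examples. What a real UA puts in those sets is exactly the UA-dependent part.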

> (Speaking of examples, the 'word-break: break-all;' part of Example 4:
> 
>   # 这·是·一·些·汉·字·...
> 
> lacks a comma:

Fixed, thank you.

> As another test case which is pretty far from a normal CJK use case, the Thai example in
> the spec has no difference in IE9 when 'word-break: break-all' is on. This further proves
> that IE is pretty close to that statement in Example 4, as a Thai character is Class SA,
> not AL.

SA is defined as:

| Therefore complex context analysis, often involving dictionary lookup of some form,
| is required to determine non-emergency line breaks. If such analysis is not available,
| it is recommended to treat them as AL.

So SA = AL if no dictionary is installed. But UAX#14 has many such reassignment rules and is complex enough that I'm a little nervous about following Example 4, which requires more complex analysis than Example 3. I guess I need a little more time to analyze the impact.
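
As an illustration of the kind of reassignment I mean, here is a minimal Python sketch of the SA fallback. The function name `resolve_sa` and its parameters are hypothetical. The caller has to supply the line-break class, because Python's `unicodedata` module does not expose it. The Mn/Mc case follows rule LB1 of UAX#14 as I understand it and goes beyond the sentence quoted above, so please treat it as an assumption.

```python
import unicodedata


def resolve_sa(ch, line_break_class, has_dictionary):
    """Resolve class SA when no dictionary-based analysis is available.

    `line_break_class` is the UAX#14 class of `ch`, supplied by the caller.
    Sketch only: other reassignments (AI, CJ, SG, XX, ...) are not handled.
    """
    if line_break_class != "SA" or has_dictionary:
        # Not SA, or context analysis will deal with it: leave unchanged.
        return line_break_class
    if unicodedata.category(ch) in ("Mn", "Mc"):
        # Combining marks of class SA are treated as CM (assumed from LB1).
        return "CM"
    # Everything else of class SA falls back to AL.
    return "AL"


print(resolve_sa("\u0e01", "SA", has_dictionary=False))  # AL (THAI CHARACTER KO KAI)
print(resolve_sa("\u0e31", "SA", has_dictionary=False))  # CM (THAI CHARACTER MAI HAN-AKAT)
print(resolve_sa("\u0e01", "SA", has_dictionary=True))   # SA
```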

> So the current definition seems to be in favor of the WebKit and Gecko's direction. I don't
> necessarily disagree with that but I hope we have more convergence here...

It depends on what goes into the line-break rules. How much we can rely on those rules and how much we should define in break-all needs more thought and discussion.


Regards,
Koji

Received on Thursday, 10 May 2012 14:56:44 UTC