Re: [css-ruby] Rule of line breaking between ruby bases

Thank you for asking and giving me a chance to discuss, as always.

TL;DR: could you treat each ruby container as a single ideographic
character for line breaking?

While you're right that non-CJK characters as bases are rare, closing
characters such as a comma or a close parenthesis next to ruby are
quite common, and I do not want to break there. Does this sound
reasonable to implement?

Actually, when I was working on an e-reader platform a few years ago,
this was considered a blocker and we had to work around it. I thought
it had been fixed since then, so it's unfortunate to learn that it has
not.

I then thought of handling it as U+FFFC Object Replacement Character,
just as we do for text-combine-horizontal, but then I remembered that
a recent change to CSS Text Level 3, 5.1 Line Breaking Details[1]
defines a soft wrap opportunity before and after U+FFFC. This differs
from UAX#14, but it had to be done for web compatibility.

That said, text-combine-horizontal is currently broken as well, and
there is no way to get correct line breaking for image-based
characters. I'll start a separate thread on this, and ruby should
probably be handled the same way as whatever we conclude for
text-combine-horizontal.

If you can wait for its conclusion, that's great. If you're in a
hurry, I suppose we should settle on any one ideographic character,
such as U+4E00 or U+6C34 (the one we chose for the height of
text-combine); they're all equivalent for line breaking purposes.
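
To make the idea concrete, here is a minimal sketch in Python. It is
only an illustration under loud assumptions: the Ruby class, the
classify() helper, and the tiny CLOSING/OPENING sets are hypothetical,
and the three-rule breaker is a toy, not the real UAX#14 algorithm or
any UA's implementation. It only shows that substituting an ideograph
for each ruby container keeps closing punctuation attached to it.

```python
from dataclasses import dataclass

# Assumption: any ideographic character works as the stand-in.
RUBY_STAND_IN = "\u4e00"

# Toy subsets of closing/opening punctuation, for illustration only.
CLOSING = set("、。，．）」』,.)")
OPENING = set("（「『(")


@dataclass
class Ruby:
    """Hypothetical model of one ruby container (base plus annotation)."""
    base: str
    text: str


def classify(item):
    """Return a toy line-breaking class: CL, OP, ID, or AL."""
    # A ruby container is classified as if it were one ideograph.
    ch = RUBY_STAND_IN if isinstance(item, Ruby) else item
    if ch in CLOSING:
        return "CL"
    if ch in OPENING:
        return "OP"
    # Rough check: CJK unified ideographs and kana count as ideographic.
    if "\u4e00" <= ch <= "\u9fff" or "\u3040" <= ch <= "\u30ff":
        return "ID"
    return "AL"


def break_opportunities(items):
    """Return each index i where a break is allowed before items[i]."""
    result = []
    for i in range(1, len(items)):
        before, after = classify(items[i - 1]), classify(items[i])
        # Never break before closing or after opening punctuation.
        if after == "CL" or before == "OP":
            continue
        # Otherwise allow a break when either side is ideographic.
        if before == "ID" or after == "ID":
            result.append(i)
    return result


line = [Ruby("漢", "かん"), Ruby("字", "じ"), "、", "水"]
print(break_opportunities(line))
```

With this toy model a break is allowed between the two ruby
containers and after the comma, but not between the second ruby and
the comma that follows it.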

[1] http://dev.w3.org/csswg/css-text-3/#line-break-details

/koji

On Wed, Jan 14, 2015 at 7:25 AM, Xidorn Quan <quanxunzhen@gmail.com> wrote:
> The current spec says:
>
>> Whether ruby can break between two adjacent ruby bases is controlled by
>> normal line-breaking rules for the base text, exactly as if the ruby bases
>> were adjacent inline boxes.
>
> I propose that we should add that, there is always a soft wrap opportunity
> between ruby bases.
>
> There are two reasons:
> 1. For real world use cases, we nearly always use ruby with CJK characters
> as bases. Even if we use latins in bases, we generally won't spread one word
> into several bases. Hence announcing this doesn't affect real world use
> cases.
> 2. All impls including WebKit, Trident, and current Gecko behave in this
> way.
>
> I think there is no reason for the spec to cover an imaginary use case
> against UA impls.
>
> - Xidorn

Received on Wednesday, 14 January 2015 12:47:45 UTC