[css-text] Feedback to Segment Break Transformation Rules from Koki Takahashi on 2015-12-10 (www-style@w3.org from December 2015)

From: Koki Takahashi <hakatasiloving@gmail.com>
Date: Thu, 10 Dec 2015 19:03:44 +0900
To: www-style@w3.org
Message-ID: <CAATvZcfYrvLr+Ljdq+o-zSuhyFoCsfxGsYGjF9Q-FyELU4g0cw@mail.gmail.com>

Hi everyone, I'm new to this mailing list, so I apologize in advance
if I break any of its conventions.

This is feedback on "4.1.2 Segment Break Transformation Rules" in the
latest editor's draft of CSS Text Module Level 3 [1].

In general, I welcome the introduction of this transformation rule. It
seems to work well in most East Asian contexts, especially for
Japanese.

I have recently been developing a module [2] that transforms HTML and
removes line breaks according to these transformation rules, and I
found some issues with the rules.


1. Segment breaks adjacent to an element that starts or ends with
whitespace are not removed.

According to the rule, the following HTML:

    <p>
        日本語
        <span>中文</span>
    </p>

will be rendered as "日本語中文", but the following:

    <p>
        日本語
        <span>
            中文
        </span>
    </p>

will result in "日本語 中文". The spec looks at "the character before and
after the line feed", and for the line break after "日本語" those are "語"
and LF, so the line break is transformed into a space.
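
The reading above can be sketched in Python. This is a simplified
illustration, not the spec's wording: it assumes the break is removed
only when both neighbouring characters have an East Asian Width of F,
W, or H, and it leaves out any other clauses of the draft. The name
transform_break is made up for this example.

```python
import unicodedata

# Assumption: these East Asian Width classes count as "wide" for the rule.
WIDE = {"F", "W", "H"}

def transform_break(before: str, after: str) -> str:
    """Return what replaces one line feed, given its two direct neighbours."""
    both_wide = (
        unicodedata.east_asian_width(before) in WIDE
        and unicodedata.east_asian_width(after) in WIDE
    )
    return "" if both_wide else " "

# First example: the neighbours are "語" and "中", so the break is removed.
print(repr(transform_break("語", "中")))   # ''
# Second example: the neighbours are "語" and LF, so the break becomes a space.
print(repr(transform_break("語", "\n")))   # ' '
```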

This situation will be very common when markup such as <strong>, <a>,
or <ruby> is used inside an inline formatting context. In a Japanese
or Chinese context, such a line break should be removed rather than
transformed into a space. (In short, the latter should also be
rendered as "日本語中文".)
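
One possible way to get the desired result is to decide each break by
the nearest non-whitespace characters instead of the direct neighbours
of the line feed. The Python sketch below works on the flattened text
content of the paragraph, which is an assumption made for brevity; it
ignores element boundaries, and collapse_breaks is a hypothetical
helper, not part of any spec or library.

```python
import re
import unicodedata

# Assumption: these East Asian Width classes count as "wide" for the rule.
WIDE = {"F", "W", "H"}

def is_wide(ch: str) -> bool:
    return unicodedata.east_asian_width(ch) in WIDE

def collapse_breaks(text: str) -> str:
    """Collapse every whitespace run that contains a line feed, deciding
    by the nearest non-whitespace character on each side of the run."""
    def repl(m: re.Match) -> str:
        start, end = m.span()
        if start == 0 or end == len(m.string):
            return ""  # leading or trailing run: simply dropped in this sketch
        left, right = m.string[start - 1], m.string[end]
        return "" if is_wide(left) and is_wide(right) else " "
    # A run of spaces, tabs and line feeds with at least one line feed.
    return re.sub(r"[ \t]*\n[ \t\n]*", repl, text)

# Text content of the second example, with the <span> tags stripped.
nested = "\n        日本語\n        \n            中文\n        \n    "
print(collapse_breaks(nested))  # 日本語中文
```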


2. In some cases, segment breaks between a Japanese character and a
Latin character should be removed.

As stated in JLREQ 3.2.6 [3], Japanese text layout puts a space
between a Japanese character and a Latin character. This corresponds
to the case where the East Asian Width properties of the characters
before and after the line feed are "W" and "Na". Under the rule, the
line feed is not removed but converted to a space.
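
For reference, the property values can be checked with Python's
standard unicodedata module. This only illustrates the lookup; the
variable names are chosen for this example.

```python
import unicodedata

# East Asian Width of a Japanese ideograph and of a Latin letter.
width_kanji = unicodedata.east_asian_width("語")
width_latin = unicodedata.east_asian_width("E")

print(width_kanji, width_latin)  # W Na
```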

This is basically fine, but JLREQ also states some exceptions. For
example, the space between an ideographic comma and a Latin character
should be removed. The following HTML:

    <p>
        日本語、
        English
    </p>

should be rendered as "日本語、English", not "日本語、 English". It would be
better for the rules to take these exceptions into account.
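
Such an exception could be layered on top of the width check, as in the
Python sketch below. The set NO_SPACE_AFTER is hypothetical and holds
only the ideographic comma from the example; a real list would have to
be derived from JLREQ. The width rule itself is again a simplified
assumption (remove the break only when both sides are F, W, or H).

```python
import unicodedata

# Assumption: these East Asian Width classes count as "wide" for the rule.
WIDE = {"F", "W", "H"}
# Hypothetical exception list: characters after which no space is inserted.
NO_SPACE_AFTER = {"\u3001"}  # IDEOGRAPHIC COMMA "、"

def transform_break(before: str, after: str) -> str:
    """Return what replaces one line feed, given its two neighbours."""
    if before in NO_SPACE_AFTER:
        return ""  # exception: no space after the ideographic comma
    both_wide = (
        unicodedata.east_asian_width(before) in WIDE
        and unicodedata.east_asian_width(after) in WIDE
    )
    return "" if both_wide else " "

print("日本語、" + transform_break("、", "E") + "English")  # 日本語、English
print("日本語" + transform_break("語", "E") + "English")    # 日本語 English
```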


Thanks.

[1]: https://drafts.csswg.org/css-text-3/#line-break-transform
[2]: https://www.npmjs.com/package/asianbreak
[3]: http://www.w3.org/TR/jlreq/#handling_of_western_text_in_japanese_text_using_proportional_western_fonts


Koki Takahashi
https://hakatashi.com
hakatasiloving@gmail.com

Received on Thursday, 10 December 2015 13:31:41 UTC