- From: Kang-Hao (Kenny) Lu <kanghaol@oupeng.com>
- Date: Fri, 30 Nov 2012 07:30:03 +0800
- To: CJK discussion <public-i18n-cjk@w3.org>
Forwarded. You might want to follow up on this issue at

https://www.w3.org/Bugs/Public/show_bug.cgi?id=20114
https://www.w3.org/Bugs/Public/show_bug.cgi?id=20115

instead of on this list.

Cheers,
Kenny

-------- Original Message --------
Subject: [whatwg] <ruby> markup problems
Date: Tue, 27 Nov 2012 18:42:16 -0800
From: Tab Atkins Jr. <jackalmage@gmail.com>
To: WHATWG List <whatwg@whatwg.org>

The current HTML <ruby> markup has a few issues where it does not properly solve the relevant use-cases. In this email I'll outline these problems, and suggest some simple fixes that maintain the overall simplicity of the ruby model.

1. Inlining ruby
=============

The current ruby model explicitly uses a "column-based" model of ruby, where runs of base text and ruby text must alternate in the markup, so that ruby text is associated with the immediately preceding ruby base. This does *not* work well for common ruby inlining cases.

For example, the word Tokyo is written as 東京 in kanji and とうきょう in kana. The base-text pairs are 東-とう and 京-きょう, and the ruby markup must create those associations accordingly. However, when rendered inline, the correct rendering is 東京(とうきょう), with the word kept together as one unit, not 東(とう)京(きょう). The current ruby model in HTML, though, requires that you either mark up the ruby correctly and get the latter display, or incorrectly group the entire thing as one ruby text over one ruby base to get the former display.

This is important, because inlining is not just a fallback measure for down-level clients. Inlining is often done as a legitimate stylistic choice, such as when there's only a small amount of ruby in the text (to avoid the increased line-height on the few lines that contain ruby) or when the base text is already small (to avoid making the ruby text unreadably small).

This can be solved easily by also allowing a "row-based" model, where runs of <rb> elements can be followed by runs of <rt> elements, and they're matched up index-wise.
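
As a rough illustration of the index-wise matching described above (written in Python rather than markup), the sketch below pairs a run of <rb> contents with an equally long run of <rt> contents and shows the two inline renderings from the Tokyo example. The function names are hypothetical and are not taken from any spec or implementation.

```python
def pair_index_wise(bases, texts):
    """Pair the i-th base with the i-th ruby text (hypothetical helper)."""
    return list(zip(bases, texts))


def render_per_pair(pairs):
    """Inline each annotation directly after its own base."""
    return "".join(f"{base}({text})" for base, text in pairs)


def render_whole_word(pairs):
    """Inline all bases first, then all annotations in one parenthetical."""
    bases = "".join(base for base, _ in pairs)
    texts = "".join(text for _, text in pairs)
    return f"{bases}({texts})"


# Contents of the <rb> run and the <rt> run for the word Tokyo.
pairs = pair_index_wise(["東", "京"], ["とう", "きょう"])
print(render_per_pair(pairs))    # 東(とう)京(きょう)
print(render_whole_word(pairs))  # 東京(とうきょう)
```

Because the pairs are kept separate from the rendering, the same correctly paired data can produce either inline form.
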
If you can then switch back to <rb> after a run of <rt>s, you still retain the convenience of the "column-based" model when that's sufficient.

2. Double-sided ruby
=================

When you want to create double-sided ruby, with ruby text on both sides of the base text, the current HTML model posits two separate and fairly different markup models. In the first, when the group boundaries for both ruby text runs are the same, it allows you to have two <rt>s following an <rb>, with the obvious meaning. In the second, when the group boundaries do *not* line up (in particular, for the common case where one line of ruby is per-character and the other is for the whole group, such as pinyin plus an English translation), it requires you to nest two <ruby> elements, with the inner one supplying the per-character annotations and the outer supplying the whole-group ones.

Having to learn and use two different markup patterns for two nearly identical use-cases is sub-optimal for authors. It would be best if they could just learn one model that works for both.

On the implementation side, this also requires two different layout models for essentially the exact same thing. This is unnecessarily complicated; again, one simple way to get both would be preferred.

This is easy to address. Add an <rtc> element (name taken from the XHTML Ruby module), which is used for the second line of text. You can fill an <rtc> with <rt> elements, in which case they match up index-wise with the preceding run of <rb> elements. The last <rt> (or, if no <rt>s were given at all, the naked text that was implicitly wrapped in an <rt>) automatically spans the remaining bases in the preceding run. This makes both cases trivial.
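
The matching rule for <rtc> can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not an implementation from any browser or spec. The helper name is made up, and it assumes there is at least one annotation and no more annotations than bases.

```python
def match_annotations(bases, texts):
    """Match texts to bases index-wise; the last text spans the leftover bases.

    Returns a list of (text, covered_bases) tuples. Hypothetical helper that
    assumes 1 <= len(texts) <= len(bases).
    """
    if not 1 <= len(texts) <= len(bases):
        raise ValueError("need between one text and one text per base")
    matched = []
    last = len(texts) - 1
    for i, text in enumerate(texts):
        # Every text covers exactly one base, except the last one,
        # which also takes any bases that remain in the run.
        covered = bases[i:] if i == last else bases[i:i + 1]
        matched.append((text, covered))
    return matched


bases = ["FOO", "BAR"]
per_character = match_annotations(bases, ["qux1", "qux2"])
whole_group = match_annotations(bases, ["qux1 qux2"])
print(per_character)  # [('qux1', ['FOO']), ('qux2', ['BAR'])]
print(whole_group)    # [('qux1 qux2', ['FOO', 'BAR'])]
```

The per-character and whole-group cases go through the same code path. They differ only in how many annotations the <rtc> holds.
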
If both runs of ruby are per-character, you can just write:

  <ruby><rb>FOO<rb>BAR<rt>baz1<rt>baz2<rtc><rt>qux1<rt>qux2</ruby>

Or, in the pure column-based model:

  <ruby>FOO<rt>baz1<rtc>qux1<rb>BAR<rt>baz2<rtc>qux2</ruby>

Alternately, if the second line of ruby text spans the entire group, that's also trivial, and very similar:

  <ruby><rb>FOO<rb>BAR<rt>baz1<rt>baz2<rtc>qux1 qux2</ruby>

As you can see, the only difference is that the <rtc> contains a single (implicit) <rt>, rather than two <rt>s.

It seems plainly obvious that this is simpler for authors; it's also simpler for implementors, because we don't have to infer that we should be formatting something as double-ruby from the presence of nested <ruby> elements.

Based on fantasai's research at <http://fantasai.inkedblade.net/weblog/2011/ruby/#double> and her subsequent conversations with i18n folks and CJK publishing experts, these are the only two real failures of the current markup model. These two simple changes would make the HTML side of ruby work great; anything left (like jukugo ruby) can be handled fine by the CSS side (when we rewrite the Ruby spec to not be sucky).

~TJ
Received on Friday, 30 November 2012 00:15:59 UTC