Re: HTML5 and ruby

From: Eric Muller <emuller@adobe.com>
Date: Wed, 18 Jan 2012 10:54:00 -0800
Message-ID: <4F171548.1040108@adobe.com>
To: <public-i18n-cjk@w3.org>

Sorry for the late answer.


On 1/12/2012 5:03 PM, Leif Halvard Silli wrote:
>> Both
>>
>> <ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby> (may
>> be with a different interleaving of rbs and rts)
>>
>> and
>>
>> <ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby>
>>
>> capture the list of pairs {東, とう}, {京, きょう} equally well.
>
>   Why is *any* of the two examples above any better than
> this:
>
> <ruby><rb>東</rb><rt>とう</rt></ruby><ruby><rb>京</rb><rt>きょう</rt></ruby>

In jukugo ruby, the ruby text of one pair is allowed to overhang an
adjacent base. (The constraint that must be respected, and that makes
it different from one big group ruby, is that for each pair, some of
the ruby text must sit above its own base, typically one ruby
character's worth.)

Jukugo is used when the base texts form a compound, in which case the 
partial confusion (about which ruby is for which base) is deemed 
acceptable. In exchange, it allows for text that stays more on the grid. 
For example, take a 3 kanji compound with 1 kana of ruby on the first
kanji, 3 kana on the second, and 2 kana on the third. The leftmost of
the 3 kana overhangs the first kanji, and you end up with a nice three
em fragment, on the grid; the ruby does not cause the line to be set
differently than without the ruby.

When you have two adjacent base texts each with ruby but those two base 
texts do not form a compound, then the ruby of one cannot overhang the 
base text of the other. Take the same example of 1, 3, 2 kana ruby, but
treat it as three separate ruby in succession: the 3 kana of the
middle one cannot overhang on either side, so that middle part has to
be 1.5 em wide, and some of the line is therefore no longer on the grid.
The layout is not as nice, but now it is completely unambiguous as to
which ruby goes with each base.
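
The arithmetic in the two paragraphs above can be sketched as follows.
This is only a simplified model, not a layout algorithm: it assumes
each base kanji is 1 em wide and each ruby kana is 0.5 em wide (which
is what the 1.5 em figure for 3 kana implies), and it ignores the
constraint that some of each ruby text must stay above its own base.
The function names are made up for illustration.

```python
RUBY_SCALE = 0.5  # assumption: ruby kana are set at half the base size


def separate_ruby_width(kana_counts):
    """Width in em when every base kanji is its own ruby.

    No ruby may overhang a neighbouring base, so each pair is as wide
    as the wider of its base (1 em) and its ruby text.
    """
    return sum(max(1.0, n * RUBY_SCALE) for n in kana_counts)


def jukugo_ruby_width(kana_counts):
    """Width in em when the bases form one compound (simplified).

    Ruby may overhang adjacent bases inside the compound, so only the
    total base width and the total ruby width matter here.
    """
    base_width = float(len(kana_counts))
    ruby_width = sum(kana_counts) * RUBY_SCALE
    return max(base_width, ruby_width)


counts = [1, 3, 2]  # kana per kanji, as in the example above
print(jukugo_ruby_width(counts))    # 3.0 em: stays on the grid
print(separate_ruby_width(counts))  # 3.5 em: middle pair needs 1.5 em
```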

Given the same characters, and in fact the same pairs, the decision to 
treat those pairs as jukugo or not is based on the semantics of the 
text. It seemed obvious to me that using a single <ruby> vs multiple 
<ruby> was the only way to go, but you are right that I did not make
that clear.
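
To make the encoding choice concrete, here is a small sketch that
emits either form from the same list of pairs. The helper name and its
`compound` flag are hypothetical; the point is only that the pairs are
identical and the author's semantic decision selects the markup.
Whether a renderer actually applies jukugo layout to the single <ruby>
form is exactly what is being discussed, so nothing here assumes it.

```python
from html import escape


def ruby_markup(pairs, compound):
    """Return HTML for a list of (base, reading) pairs.

    compound=True  -> one <ruby> holding all pairs (the bases form a
                      compound, so overhang between pairs is acceptable)
    compound=False -> one <ruby> per pair (no overhang between pairs)
    """
    if compound:
        inner = "".join(f"{escape(b)}<rt>{escape(r)}</rt>" for b, r in pairs)
        return f"<ruby>{inner}</ruby>"
    return "".join(
        f"<ruby>{escape(b)}<rt>{escape(r)}</rt></ruby>" for b, r in pairs
    )


pairs = [("東", "とう"), ("京", "きょう")]
print(ruby_markup(pairs, compound=True))
print(ruby_markup(pairs, compound=False))
```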


Note that this is not entirely different from the underline problem
which was discussed on www-style not too long ago: <u>A</u><u>BC</u> is
considered distinct from <u>ABC</u> (and a fortiori from three
successive <u>), especially in the CJK world. The underline is used on
names, and reflecting the parts of the name, as in (A)(BC), is deemed
important.

Eric.
Received on Wednesday, 18 January 2012 18:54:34 GMT