Re: Bopomofo Ruby layout hack from Yijun Chen on 2013-12-21 (public-i18n-cjk@w3.org from October to December 2013)

From: Yijun Chen <ethantw@me.com>
Date: Sun, 22 Dec 2013 02:32:54 +0800
To: "川幡太一 ( Taichi KAWABATA )" <kawabata.taichi@gmail.com>, CJK discussion <public-i18n-cjk@w3.org>
Cc: 董福興 Bobby Tung <bobbytung@wanderer.tw>, Richard Ishida <ishida@w3.org>, robin@w3.org, Kawamori Masahito <kawamori@w3.org>
Message-id: <00E64DD4-FFA0-4D85-8C8E-8FD0AC2585BC@me.com>
Dear All,

> If that would cause some problem, then you should clarify how such
> problem may be caused by the use-case.

I am the author of the hack. While I was implementing this ruby layout, I consulted the old Ruby Extension Spec [1] and didn't know about the in-draft one; that's the main reason why the hack does not follow the new Spec. But I'd like to point out some issues with the new Spec.

To my knowledge, the Spec does not limit ruby spans to characters, words, phrases or sentences, which means I can annotate a whole paragraph in *one* ruby element:

<p><ruby>
 <rb>
 <rt>
</ruby></p>

For a large span such as a paragraph, it is easier and more efficient to write ruby with a reference-like syntax, and the code is more readable as well. For instance [2]:

<p><ruby>
 <rbc>
  <rb>明朝<rb>是<rb>中國<rb>歷史<rb>上<rb>最後<rb>一個<rb>由<rb>漢族<rb>建立<rb>的<rb>中原王朝</rb>，<rb>歷經</rb>12<rb>世</rb>共<rb>16位<rb>皇帝[…]
 </rbc>

 <rtc lang="zh-cmn-Latn">
  <rp lang="zh-cmn">（<strong>上方段落的普通話漢語拼音：</strong></rp><rt>mingchao
  <rt>shi
  <rt>zhongguo
  <rt>lishi
  <rt>shang
  <rt>zuihou
  <rt>yige
  <rt>you
  <rt>hanzu
  <rt>jianli
  <rt>de
  <rt>zhongyuanwangchao<rp>, </rp>
  <rt>lijing
  <rp>12</rp>
  <rt>shi
  ……<rp>）</rp>
 </rtc>

 <rtc lang="nan-Latn">
  <rp lang="zh-cmn">（<strong>上方段落的閩南語羅馬拼音：</strong></rp><rt>bin-tiau
  <rt>si
  ……
  <rp>）</rp>
 </rtc>
</ruby></p>

Unlike in Japanese, Chinese textbooks and dictionaries sometimes require us to annotate every single Hanzi with its pronunciation. Reference-like ruby could be a simpler way to do so. However, this kind of usage is not possible when some annotations have to cover a different span of characters or words, simply because the new Spec lacks the `rbspan` attribute.
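
To illustrate what `rbspan` allows, here is a minimal sketch in the old syntax [1]. The two base characters and their romanizations are only illustrative choices: the first annotation container has one annotation per character, while the second has a single annotation spanning both.

```html
<ruby>
 <rbc>
  <rb>敢</rb><rb>有</rb>
 </rbc>
 <rtc>
  <!-- one annotation per base character -->
  <rt>kám</rt><rt>ū</rt>
 </rtc>
 <rtc>
  <!-- one annotation covering both bases -->
  <rt rbspan="2">kám-ū</rt>
 </rtc>
</ruby>
```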

Also, in browsers where ruby isn't supported, we can display the plain annotation text as a paragraph of its own (with punctuation), instead of having it follow each character or phrase.

Furthermore, if we wrap each word in its own ruby element, it becomes impossible to wrap cross-phrase elements around them. For example, if I plan to add a hyperlink to the text "有聽著" in the sentence "你敢有聽著咱的歌", the characters sit in different ruby elements, so I am forced to add a separate link in each ruby.

<ruby>你</ruby>
<ruby>敢<a href="#yes-i-do">有</a></ruby>
<a href="#yes-i-do"><ruby>聽著</ruby></a>
<ruby>咱</ruby>
<ruby>的</ruby>
<ruby>歌</ruby>，

In a single reference-like ruby with the `rbspan` attribute, I can simply write:

<ruby>
 <rbc>
  <rb>你
  <rb>敢
  <a href="#yes-i-do">
    <rb>有
    <rb>聽<rb>著
  </a>
  <rb>咱
  <rb>的
  <rb>歌</rb>，
 </rbc>

 <rtc>……</rtc>
 <rtc>……</rtc>
</ruby>

In this case, a single hyperlink gives a better semantic structure, while two links break the simplicity of the syntax. The links will also behave oddly, because hover, active and focus states apply to each of them separately.


* * *
> Ethan may have some thinking about the mark-up, he may reply the thread to describe technical details.

Thanks to Bobby for starting the thread! The issue above aside, I also found some UA behaviour worth discussing. 

When a user copies text from within a complex ruby span such as the example below, how should browsers return the ruby text?

<ruby>
 漢
 <rb>字</rb>
 
 <rp> (</rp>
 <rt>かん</rt>
 <rt>じ</rt>
 <rp>) </rp>
 
 <rp> (</rp>
 <rtc><rt>Kanji</rt></rtc>
 <rp>) </rp>
</ruby>
 
There are two possible results, and both seem fairly appropriate and rational to me:

漢かん字じKanji
Or,
漢字かんじKanji

Here's another question: with or without spaces in between?

These are, so far, the problematic situations I can think of or have encountered. Corrections and suggestions are welcome.


1. http://www.w3.org/TR/ruby/
2. http://zh.wikipedia.org/wiki/明朝

Sincerely,
Chen Yijun (@ethantw)


On 2013/12/21 15:50, 川幡 太一 ( Taichi KAWABATA ) <kawabata.taichi@gmail.com> wrote:

> 
> Dear Bobby, 
> 
>>> In <1AC0EE7B-7E2D-4B2C-A7B4-174498DA783B@wanderer.tw>, 
>>> Bobby Tung wrote:
> 
>> Hi All,
>> I'd like to introduce some Bopomofo Ruby layout hack here. 
> 
>> http://css.hanzi.co/demo/ruby.html#zhuyin_fuhao-zhipai
> 
>> Ethan made this layout hack and applied on a open dictionary project
>> "Moedict" "Moedict" is an open data project in Taiwan. Get mandarin
>> dictionary data from Ministry of Education Taiwan, and add several
>> features.
> 
>> https://www.moedict.tw/
> 
>> Data itself could be useful for semantic web, and it used double side ruby to show both Bopomofo and pinyin.
>> Ethan may have some thinking about the mark-up, he may reply the thread to describe technical details.
> 
> It is quite interesting.
> 
> In the newest HTML5 Ruby Extension Spec
> (http://darobin.github.io/html-ruby/), there is no longer "rbc" element
> and "rbspan" attribute.
> 
> As of it, to cope with current HTML5 ruby, your usage of
> 
> <ruby>
> <rbc>
>  <rb>你</rb>
>  <rb>敢</rb><rb>有</rb>
>  <rb>聽</rb><rb>著</rb>
>  ....
> </rbc>
> <rtc class="zhuyin">
>  <rt>ㄌㄧˋ</rt>
>  <rt>ㄍㆰˋ</rt>
>  <rt>ㄨ˫</rt>
>  <rt>ㄊㄧㆩ</rt>
>  <rt>ㄉㄧㄜㆷ˙</rt>
>  ....
> </rtc>
> <rtc class="romanization">
>  <rt>Lí</rt> 
>  <rt rbspan="2">kám-ū</rt>
>  <rt rbspan="2">thiann-tio̍h</rt>
>  ....
> </rtc>
> </ruby>
> 
> should be split to multiple ruby such as ::
> 
> <ruby>
>  <rb>你</rb>
>  <rtc><rt>ㄌㄧˋ</rt></rtc>
>  <rtc><rt>Lí</rt></rtc>
> </ruby>
> <ruby>
>  <rb>敢</rb><rb>有</rb>
>  <rtc><rt>ㄍㆰˋ</rt><rt>ㄨ˫</rt></rtc>
>  <rtc><rt>kám-ū</rt></rtc>
> </ruby>
> <ruby>
>  <rb>聽</rb><rb>著</rb>
>  <rtc><rt>ㄊㄧㆩ</rt><rt>ㄉㄧㄜㆷ˙</rt></rtc>
>  <rtc><rt>thiann-tio̍h</rt></rtc>
> </ruby>
> 
> If that would cause some problem, then you should clarify how such
> problem may be caused by the use-case.
> 
> Regards,
> 
> -- 
> ---------------------------------------------------------------------
>  川幡 太一  (kawabata.taichi@gmail.com)       KAWABATA, Taichi
Received on Saturday, 21 December 2013 18:33:45 UTC