Re: Bopomofo Ruby layout hack

From: Yijun Chen <ethantw@me.com>
Date: Sun, 22 Dec 2013 02:32:54 +0800
Cc: 董福興 Bobby Tung <bobbytung@wanderer.tw>, Richard Ishida <ishida@w3.org>, robin@w3.org, Kawamori Masahito <kawamori@w3.org>
Message-id: <00E64DD4-FFA0-4D85-8C8E-8FD0AC2585BC@me.com>
To: "川幡 太一 ( Taichi KAWABATA )" <kawabata.taichi@gmail.com>, CJK discussion <public-i18n-cjk@w3.org>
Dear All,

> If that would cause some problem, then you should clarify how such
> problem may be caused by the use-case.

I am the author of the hack. While implementing this ruby layout, I consulted the old Ruby Extension Spec[1] and was not aware of the one still in draft; that is the main reason the hack does not follow the new Spec. Still, I’d like to point out some issues with the new Spec.

To my knowledge, the Spec does not limit ruby spans to characters, words, phrases or sentences, which means I can annotate a whole paragraph in *one* ruby element:

<p><ruby>
	<rb>
	<rt>
</ruby></p>

For a large span such as a paragraph, it is easier and more effective to write ruby in a reference-like syntax, and the code is more readable too. For instance [2]:

<p><ruby>
	<rbc>
		<rb>明朝<rb>是<rb>中國<rb>歷史<rb>上<rb>最後<rb>一個<rb>由<rb>漢族<rb>建立<rb>的<rb>中原王朝</rb>,<rb>歷經</rb>12<rb>世</rb>、<rb>16位<rb>皇帝。明朝初期定都於應天府,1421年明成祖遷都至順天府。1368年,朱元璋在統一農民起義軍後,在應天府登基,國號大明。明朝初年,國力迅速恢復,經過明太祖朱元璋的洪武之治,勵精圖治並逐步恢復國力。
	</rbc>

	<rtc lang="zh-cmn-Latn">
		<rp lang="zh-cmn">(<strong>上方段落的普通話漢語拼音:</strong></rp><rt>mingchao
		<rt>shi
		<rt>zhongguo
		<rt>lishi
		<rt>shang
		<rt>zuihou
		<rt>yige
		<rt>you
		<rt>hanzu
		<rt>jianli
		<rt>de
		<rt>zhongyuanwangchao<rp>, </rp>
		<rt>lijing
		<rp>12</rp>
		<rt>shi
		……<rp>)</rp>
	</rtc>

	<rtc lang="nan-Latn">
		<rp lang="zh-cmn">(<strong>上方段落的閩南語羅馬拼音:</strong></rp><rt>bin-tiau
		<rt>si
		……
		<rp>)</rp>
	</rtc>
</ruby></p>

Unlike in Japanese, Chinese textbooks and dictionaries sometimes require a pronunciation annotation on every single Hanzi. Reference-like ruby could be a simpler way to do that. However, this kind of usage is not possible when the annotations cover spans of different lengths (characters in one layer, words in another), simply because the new Spec lacks the `rbspan` attribute.
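
To make the role of `rbspan` concrete, here is a rough Python sketch of the pairing it expresses. It is only an illustration, not anything taken from either Spec: the function name `pair_annotations` and the (text, span) tuple layout are assumptions made for this example, and the data is the Taiwanese sample from Kawabata-san's message quoted below.

```python
def pair_annotations(bases, annotations):
    """Pair each (text, span) annotation with the run of bases it covers.

    `span` plays the role of the rbspan attribute: the number of
    consecutive <rb> bases that one <rt> annotation applies to.
    """
    pairs = []
    index = 0
    for text, span in annotations:
        if index + span > len(bases):
            raise ValueError("annotation spans past the last base")
        pairs.append(("".join(bases[index:index + span]), text))
        index += span
    if index != len(bases):
        raise ValueError("some bases are left without an annotation")
    return pairs


# One base per Hanzi, as in the <rbc> container.
bases = ["你", "敢", "有", "聽", "著"]
# Romanization layer: the second and third annotations each span two bases.
romanization = [("Lí", 1), ("kám-ū", 2), ("thiann-tio̍h", 2)]

pairs = pair_annotations(bases, romanization)
for base, text in pairs:
    print(base, text)
```

Without a span count, every layer has to use the same segmentation, which is what forces the markup to be split into separate ruby elements.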

Also, in browsers that do not support ruby, the plain annotation text can be displayed as a paragraph of its own (with punctuation), instead of trailing each character or phrase.

Furthermore, if we wrap each word in its own ruby element, it becomes impossible to put an element around text that crosses phrase boundaries. For example, if I want to add a hyperlink to the text ‘有聽著’ in the sentence ‘`你~`敢有~`聽著~`咱~`的~`歌~’, the characters sit in different ruby elements, so I am forced to split the link into one piece per ruby element.

<ruby>你</ruby>
<ruby>敢<a href="#yes-i-do">有</a></ruby>
<a href="#yes-i-do"><ruby>聽著</ruby></a>
<ruby>咱</ruby>
<ruby>的</ruby>
<ruby>歌</ruby>?

In a single reference-like ruby with the `rbspan` attribute, I can simply write:

<ruby>
	<rbc>
		<rb>你
		<rb>敢
		<a href="#yes-i-do">
				<rb>有
				<rb>聽<rb>著
		</a>
		<rb>咱
		<rb>的
		<rb>歌</rb>?
	</rbc>

	<rtc>……</rtc>
	<rtc>……</rtc>
</ruby>

In this case, one hyperlink gives a better semantic structure, while two break the simplicity of the syntax. The split links will also behave oddly, for example in their hover, active and focus states.


* * *
> Ethan may have some thinking about the mark-up, he may reply the thread to describe technical details.

Thanks to Bobby for starting the thread! The issues above aside, I also found some UA behaviour worth discussing.

When a user copies text from a complex ruby span such as the example below, how will browsers return the ruby text?

<ruby>
	漢
	<rb>字</rb>
	
	<rp> (</rp>
	<rt>かん</rt>
	<rt>じ</rt>
	<rp>) </rp>
	
	<rp> (</rp>
	<rtc><rt>Kanji</rt></rtc>
	<rp>) </rp>
</ruby>
 
There are two possible results, and both seem fairly appropriate and rational to me:

漢かん字じKanji
Or,
漢字かんじKanji

Here’s another question: with or without spaces in between?
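
The two results above, and the spacing question, can be modelled in a few lines of Python. This is only a sketch of the alternatives, not a description of what any browser does: the function names and the `sep` parameter are made up for the illustration, and the `<rp>` fallback parentheses are left out, as they are in the two results above.

```python
def copy_interleaved(bases, readings, extra, sep=""):
    """Each base is followed by its own reading; other layers come last."""
    parts = []
    for base, reading in zip(bases, readings):
        parts.extend([base, reading])
    parts.extend(extra)
    return sep.join(parts)


def copy_grouped(bases, readings, extra, sep=""):
    """All bases first, then each annotation layer as one unit."""
    return sep.join(["".join(bases), "".join(readings), *extra])


bases = ["漢", "字"]        # the bare text node and the <rb>
readings = ["かん", "じ"]   # the two <rt> outside any <rtc>
extra = ["Kanji"]           # the <rt> inside the <rtc>

print(copy_interleaved(bases, readings, extra))     # 漢かん字じKanji
print(copy_grouped(bases, readings, extra))         # 漢字かんじKanji
print(copy_grouped(bases, readings, extra, " "))    # 漢字 かんじ Kanji
```

Passing a space as `sep` shows the "with spaces" variant of either ordering.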

These are, so far, the problematic situations I can think of or have encountered. Corrections and suggestions are welcome.


1. http://www.w3.org/TR/ruby/
2. http://zh.wikipedia.org/wiki/明朝

Sincerely,
Chen Yijun (@ethantw)




川幡 太一 ( Taichi KAWABATA ) <kawabata.taichi@gmail.com> wrote on 2013/12/21 15:50:

> 
> Dear Bobby, 
> 
>>> In <1AC0EE7B-7E2D-4B2C-A7B4-174498DA783B@wanderer.tw>, 
>>> Bobby Tung wrote:
> 
>> Hi All,
>> I'd like to introduce some Bopomofo Ruby layout hack here. 
> 
>> http://css.hanzi.co/demo/ruby.html#zhuyin_fuhao-zhipai
> 
>> Ethan made this layout hack and applied it to an open dictionary project,
>> "Moedict". "Moedict" is an open data project in Taiwan. It takes Mandarin
>> dictionary data from the Ministry of Education, Taiwan, and adds several
>> features.
> 
>> https://www.moedict.tw/
> 
>> Data itself could be useful for semantic web, and it used double side ruby to show both Bopomofo and pinyin.
>> Ethan may have some thinking about the mark-up, he may reply the thread to describe technical details.
> 
> It is quite interesting.
> 
> In the newest HTML5 Ruby Extension Spec
> (http://darobin.github.io/html-ruby/), there is no longer "rbc" element
> and "rbspan" attribute.
> 
> Because of that, to comply with the current HTML5 ruby, your usage of
> 
> <ruby>
> <rbc>
>  <rb>你</rb>
>  <rb>敢</rb><rb>有</rb>
>  <rb>聽</rb><rb>著</rb>
>  ....
> </rbc>
> <rtc class="zhuyin">
>  <rt>ㄌㄧˋ</rt>
>  <rt>ㄍㆰˋ</rt>
>  <rt>ㄨ˫</rt>
>  <rt>ㄊㄧㆩ</rt>
>  <rt>ㄉㄧㄜㆷ̍</rt>
>  ....
> </rtc>
> <rtc class="romanization">
>  <rt>Lí</rt> 
>  <rt rbspan="2">kám-ū</rt>
>  <rt rbspan="2">thiann-tio̍h</rt>
>  ....
> </rtc>
> </ruby>
> 
> should be split into multiple ruby elements, such as:
> 
> <ruby>
>  <rb>你</rb>
>  <rtc><rt>ㄌㄧˋ</rt></rtc>
>  <rtc><rt>Lí</rt></rtc>
> </ruby>
> <ruby>
>  <rb>敢</rb><rb>有</rb>
>  <rtc><rt>ㄍㆰˋ</rt><rt>ㄨ˫</rt></rtc>
>  <rtc><rt>kám-ū</rt></rtc>
> </ruby>
> <ruby>
>  <rb>聽</rb><rb>著</rb>
>  <rtc><rt>ㄊㄧㆩ</rt><rt>ㄉㄧㄜㆷ̍</rt></rtc>
>  <rtc><rt>thiann-tio̍h</rt></rtc>
> </ruby>
> 
> If that would cause some problem, then you should clarify how such
> problem may be caused by the use-case.
> 
> Regards,
> 
> -- 
> ---------------------------------------------------------------------
>  川幡 太一  (kawabata.taichi@gmail.com)       KAWABATA, Taichi
Received on Saturday, 21 December 2013 18:33:45 UTC