Re: Bopomofo Ruby layout hack from Robin Berjon on 2013-12-23 (public-i18n-cjk@w3.org from October to December 2013)

From: Robin Berjon <robin@w3.org>
Date: Mon, 23 Dec 2013 12:35:33 +0100
To: Yijun Chen <ethantw@me.com>, "川幡太一 ( Taichi KAWABATA )" <kawabata.taichi@gmail.com>, CJK discussion <public-i18n-cjk@w3.org>
CC: 董福興 Bobby Tung <bobbytung@wanderer.tw>, Richard Ishida <ishida@w3.org>, Kawamori Masahito <kawamori@w3.org>
Message-ID: <52B82005.1050509@w3.org>
On 21/12/2013 19:32, Yijun Chen wrote:
> I am the author of the hack.

Cool, thanks for providing feedback!

> While I was implementing this ruby layout,
> I consulted the old Ruby Extension Spec[1] and didn’t know about the
> in-progress draft; that’s the main reason why the hack does not follow the
> new Spec. But I’d like to point out some issues with the new Spec.

Please note that the ruby extension spec has now been merged into the 
HTML5 draft, so it now lives at:

http://www.w3.org/html/wg/drafts/html/master/text-level-semantics.html#the-ruby-element

> To my knowledge, the Spec does not limit the ruby spans to characters,
> words, phrases or sentences, which means I can apply annotations to a
> whole paragraph in *one* ruby element,
>
>     <p><ruby>
>     <rb>
>     <rt>
>     </ruby></p>

That is correct (though of course if you want to use more <ruby> 
elements, you can).

> Unlike Japanese, we are sometimes required to annotate every single
> Hanzi for its pronunciation in Chinese textbooks or dictionaries.
> Reference-like ruby could be a simpler way to do so. However, this
> kind of usage is not possible if we need to annotate characters and
> words with different spans, simply because of the lack of an
> `rbspan` attribute.

So, is the lack of rbspan the only thing with HTML5 ruby that's 
preventing you from addressing your use case? We also don't have <rbc> 
but that does not seem essential to your usage. If it's just rbspan we 
can certainly look at adding it. If you don't mind, I would encourage 
you to file a bug:

https://www.w3.org/Bugs/Public/enter_bug.cgi?product=HTML%20WG&component=HTML5%20spec
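For concreteness, this is what `rbspan` provides in W3C's Ruby Annotation Recommendation (the XHTML ruby module): an `<rt>` inside complex ruby may carry `rbspan="n"` to cover n consecutive `<rb>` bases, defaulting to 1. The Python sketch below uses a hypothetical helper, `group_by_rbspan`, to pair one annotation row with its bases, borrowing the 漢字 example that appears later in this thread. It illustrates the idea only; it is not code from any spec or browser.

```python
from typing import List, Tuple


def group_by_rbspan(bases: List[str], row: List[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair one annotation row with ruby bases.

    Each (text, rbspan) entry annotates the next `rbspan` consecutive bases,
    mirroring the rbspan attribute of the older Ruby Annotation spec.
    """
    pairs = []
    i = 0  # index of the first base not yet covered by this row
    for text, rbspan in row:
        if rbspan < 1 or i + rbspan > len(bases):
            raise ValueError(f"rbspan={rbspan} overruns the {len(bases)} bases")
        pairs.append(("".join(bases[i:i + rbspan]), text))
        i += rbspan
    return pairs


bases = ["漢", "字"]
print(group_by_rbspan(bases, [("かん", 1), ("じ", 1)]))  # per-character readings
print(group_by_rbspan(bases, [("Kanji", 2)]))            # one gloss spanning both bases
```

With every span left at 1 the helper reduces to plain one-to-one pairing; the spans greater than 1 are what would let a single paragraph-level ruby mix per-character and per-word annotations.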

> Also, for those browsers where ruby isn’t supported, we can display
> plain annotation text paragraphically (with punctuation), instead of
> following each character or phrase.

Unless I'm missing something, that is also supported with HTML5 ruby, 
except of course if you need rbspan in order to usefully place your 
annotations at the end. Note that this fallback rendering is intended not 
just for browsers that don't support ruby (which would make it merely a 
transition technology) but also for cases in which one may wish to 
disable ruby (e.g. limited line height).

> Furthermore, if we wrap each word in its own ruby element, it becomes
> impossible to add cross-phrase elements around them. For example, if I
> want to add a hyperlink to the text ‘有聽著’ in the sentence
> ‘你敢有聽著咱的歌？’, I am forced to split the link across the ruby
> bases, since they are in different ruby elements.
>
>     <ruby>你</ruby>
>     <ruby>敢<a href="#yes-i-do">有</a></ruby>
>     <a href="#yes-i-do"><ruby>聽著</ruby></a>
>     <ruby>咱</ruby>
>     <ruby>的</ruby>
>     <ruby>歌</ruby>？
>
>
> With a single reference-like ruby and an `rbspan` attribute, I can simply
> write,
>
>     <ruby>
>     <rbc>
>     <rb>你
>     <rb>敢
>     <a href="#yes-i-do">
>     <rb>有
>     <rb>聽<rb>著
>     </a>
>     <rb>咱
>     <rb>的
>     <rb>歌</rb>？
>     </rbc>
>
>     <rtc>……</rtc>
>     <rtc>……</rtc>
>     </ruby>
>
>
> In this case, one hyperlink provides better semantic structure, while
> two break the simplicity of the syntax. The behaviour of the links (such
> as hover, active and focus events) would also be somewhat nonsensical.

Ah, that is definitely a case that is not supported by the current HTML5 
ruby model because bases are only taken into account when they are 
direct children of <ruby>.
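The rule described here can be illustrated with a toy filter. The Python sketch below (the class name `DirectBaseCollector` is hypothetical) uses the standard library's `html.parser` and counts an `<rb>` as a ruby base only when its parent is `<ruby>`. It runs on a simplified version of the linked example (explicitly closed `<rb>`, no `<rbc>`), ignores `<rt>`, `<rp>` and `<rtc>`, and does not model what the full HTML5 ruby processing algorithm then does with the content of the `<a>` element.

```python
from html.parser import HTMLParser


class DirectBaseCollector(HTMLParser):
    """Toy filter (hypothetical, not the HTML5 algorithm): an <rb> counts as
    a ruby base only if its parent element is <ruby>.

    Assumes well-formed input in which every <rb> has an explicit </rb>.
    """

    def __init__(self):
        super().__init__()
        self.stack = []       # names of currently open elements
        self.bases = []       # text of <rb> elements whose parent is <ruby>
        self.nested = []      # text of <rb> elements nested deeper, e.g. in <a>
        self._open_rb = None  # (destination list, text buffer) for the open <rb>

    def handle_starttag(self, tag, attrs):
        if tag == "rb":
            is_direct = bool(self.stack) and self.stack[-1] == "ruby"
            self._open_rb = (self.bases if is_direct else self.nested, [])
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        if tag == "rb" and self._open_rb is not None:
            destination, buffer = self._open_rb
            destination.append("".join(buffer))
            self._open_rb = None

    def handle_data(self, data):
        # Only text inside an open <rb> is recorded; everything else is ignored.
        if self._open_rb is not None:
            self._open_rb[1].append(data)


markup = (
    '<ruby><rb>你</rb><rb>敢</rb>'
    '<a href="#yes-i-do"><rb>有</rb><rb>聽</rb><rb>著</rb></a>'
    '<rb>咱</rb><rb>的</rb><rb>歌</rb></ruby>'
)
collector = DirectBaseCollector()
collector.feed(markup)
collector.close()
print(collector.bases)   # ['你', '敢', '咱', '的', '歌']
print(collector.nested)  # ['有', '聽', '著']
```

Under this rule the three bases wrapped in the link are not recognized as separate bases, which is why supporting the use case would mean extending the processing algorithm to look through elements such as `<a>`.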

Your use case with links is certainly valid, though I have to say I am 
hesitant to change that, as it would add a fair amount of complexity to 
the ruby processing algorithm. (My understanding is that Internet 
Explorer supports, or used to support, something like this.)

How common would you reckon that this is? Do you believe it is likely to 
apply also for elements other than <a>? I'm guessing you could want to 
semantically use e.g. <strong> in the same way, but hopefully not <p>.

Would you mind filing a bug about this? I would like to take the time to 
properly think it through.

> When users copy text within a complex ruby span such as the
> example below, what text will browsers return for the ruby?
>
>     <ruby>
>     漢
>     <rb>字</rb>
>
>     <rp> (</rp>
>     <rt>かん</rt>
>     <rt>じ</rt>
>     <rp>) </rp>
>
>     <rp> (</rp>
>
>     <rtc><rt>Kanji</rt></rtc>
>
>     <rp>) </rp>
>     </ruby>
>
> There are two possible results, and both seem appropriate and reasonable to me:
>
>     漢かん字じKanji
>
> Or,
>
>     漢字かんじKanji

My instinct on copy-pasting of ruby markup is that it has to depend on 
the data type of the clipboard. If the data is HTML (or in any way 
document-like in such a way that it supports ruby) then the ruby 
structure must be preserved. That can, of course, be somewhat 
complicated if the user has selected just part of a base, but overall I 
think it can be figured out at least for the more common cases.

But for *raw text* stored in the clipboard, what ought to be stored is 
the rendering with <rp> taken into account. So you'd get:

漢字 (かんじ) (Kanji)

Which I reckon is sensible for plain text. (I'm happy to be proven wrong 
though. :) Clipboard operations are in fact one of the reasons why I 
think that <rp> is not just for transitions but is more perennially useful.
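The plain-text behaviour proposed here can be sketched with the standard library's `html.parser`: keep every text node in document order, including the contents of `<rp>`, then collapse whitespace runs. The function and class names below are hypothetical, and the sketch assumes compact single-line markup with explicit `</rp>` tags, since it does not implement CSS rules for line breaks between CJK characters. The `keep_rp=False` switch reproduces the `<rp>`-free alternative quoted above, for comparison.

```python
import re
from html.parser import HTMLParser


class RubyPlainText(HTMLParser):
    """Hypothetical flattener: collects text nodes in document order, as a
    rendering without ruby support would show them."""

    def __init__(self, keep_rp=True):
        super().__init__()
        self.keep_rp = keep_rp
        self.rp_depth = 0  # > 0 while inside an <rp> element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "rp":
            self.rp_depth += 1

    def handle_endtag(self, tag):
        if tag == "rp" and self.rp_depth > 0:
            self.rp_depth -= 1

    def handle_data(self, data):
        # Drop <rp> text only when keep_rp is False.
        if self.keep_rp or self.rp_depth == 0:
            self.parts.append(data)


def ruby_to_plain_text(markup, keep_rp=True):
    parser = RubyPlainText(keep_rp)
    parser.feed(markup)
    parser.close()
    # Collapse whitespace runs to single spaces and trim the ends.
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


markup = (
    "<ruby>漢<rb>字</rb><rp> (</rp><rt>かん</rt><rt>じ</rt><rp>) </rp>"
    "<rp> (</rp><rtc><rt>Kanji</rt></rtc><rp>) </rp></ruby>"
)
print(ruby_to_plain_text(markup))                 # 漢字 (かんじ) (Kanji)
print(ruby_to_plain_text(markup, keep_rp=False))  # 漢字かんじKanji
```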

Thanks!

PS: I'm heading off tonight for the Christmas break so I probably won't 
reply to this thread before 2014 — sorry about that.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
Received on Monday, 23 December 2013 11:35:45 UTC