Re: HTML5 and ruby from Eric Muller on 2012-01-12 (public-i18n-cjk@w3.org from January to March 2012)

From: Eric Muller <emuller@adobe.com>
Date: Thu, 12 Jan 2012 12:46:50 -0800
To: <public-i18n-cjk@w3.org>
Message-ID: <4F0F46BA.9070601@adobe.com>

An opinion.

I tend to slice the varieties of ruby situations in Japanese a bit 
differently.

a. single ruby, one base text, one ruby text, without (mono) or with 
(group) multiple characters in the base text

b. succession of single ruby that cannot merge

c. succession of single ruby that can "merge" (jukugo ruby, with the 
typographic result either as described in JLREQ section 3.3.7 or the 
more complex one described in appendix F.)

Even in the case of c., the issue from the point of view of document 
content (i.e. ignoring for one second the application of styling), is to 
represent a list of pairs {base text, ruby text}. Both

<ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby> (maybe 
with a different interleaving of rbs and rts)

and

<ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby>

capture the list of pairs {東, とう}, {京, きょう} equally well.
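
As a rough illustration of that equivalence, here is a minimal Python 
sketch using only the standard library's html.parser. The names RubyPairs 
and ruby_pairs are hypothetical, and the sketch assumes a single <ruby> 
element whose base texts are plain text and whose nth base pairs with 
the nth <rt>.

```python
from html.parser import HTMLParser


class RubyPairs(HTMLParser):
    """Collect base texts and ruby texts from one <ruby> element (sketch)."""

    def __init__(self):
        super().__init__()
        self.bases = []        # base texts, in document order
        self.rubies = []       # ruby texts, in document order
        self._in_rt = False
        self._new_base = True  # next base text starts a new entry

    def handle_starttag(self, tag, attrs):
        if tag == "rt":
            self._in_rt = True
            self.rubies.append("")
        elif tag == "rb":
            self._new_base = True

    def handle_endtag(self, tag):
        if tag == "rt":
            self._in_rt = False
        if tag in ("rt", "rb"):
            # Whatever base text follows belongs to a new pair.
            self._new_base = True

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_rt:
            self.rubies[-1] += text
        elif self._new_base:
            self.bases.append(text)
            self._new_base = False
        else:
            self.bases[-1] += text


def ruby_pairs(markup):
    """Return the list of (base text, ruby text) pairs, paired by position."""
    parser = RubyPairs()
    parser.feed(markup)
    parser.close()
    return list(zip(parser.bases, parser.rubies))


with_rb = "<ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby>"
without_rb = "<ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby>"
print(ruby_pairs(with_rb) == ruby_pairs(without_rb))
```

Pairing by position is a simplification chosen so that other 
interleavings of rbs and rts also yield the same list.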

Both approaches work, but requiring <rb> makes it slightly easier to 
manipulate documents; to access a base text, one can simply grab the 
<rb> element, instead of grabbing all the elements other than <rt>. (In 
XSLT, group-adjacent="if (self::rt) then 'rt' else 'basetext'" does the 
trick, but works only in a for-each-group if I am not mistaken, not on 
direct access to the nth base text).
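
The difference in direct access can be sketched in Python with 
xml.etree.ElementTree. The helper names are hypothetical, and the 
version without <rb> assumes the base texts are bare text nodes strictly 
interleaved with <rt> elements, with no other inline markup.

```python
import xml.etree.ElementTree as ET


def nth_base_with_rb(ruby, n):
    """With <rb>, the nth base text is simply the nth <rb> child."""
    return ruby.findall("rb")[n].text


def nth_base_without_rb(ruby, n):
    """Without <rb>, the base text must be recovered from its position:
    it is the text before the first <rt>, or the tail after the previous <rt>."""
    if n == 0:
        return ruby.text
    return ruby.findall("rt")[n - 1].tail


with_rb = ET.fromstring(
    "<ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby>")
without_rb = ET.fromstring(
    "<ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby>")

print(nth_base_with_rb(with_rb, 1), nth_base_without_rb(without_rb, 1))
```

The second helper stops working as soon as a base text contains markup 
of its own, which is the kind of fragility that an explicit <rb> avoids.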

I would not characterize approach 3 (in section 2) as an alternative to 
1 and 2. It is available to authors under 1, but it does not help 
consumers (unless the <span> is required, at which point that <span> is 
just another name for <rb>). From the point of view of consumers, it's 
really the same as approach 1, used in a restricted way.

It seems to me that approach 4 introduces a new selector mechanism, and 
I don't think that's desirable.

One question which is more apparent from my a/b/c organization is 
whether b should have a different DOM than c. As far as I can tell, b is 
just a succession of single ruby, and there is therefore no strict need 
to represent that situation by a single <ruby> element.  Allowing b to 
be done by a single <ruby> element with multiple pairs, as a convenience 
to authors, means the same DOM as for a jukugo ruby (I believe this is 
what motivated your approach 2 in "4 jukugo ruby", as well as your 
discussion of fallback). If that convenience is offered, then one will 
have to have something in CSS to express b. vs. c, and rendering engines 
will have to consult that even when doing fallback, to determine whether 
to do 東(とう)京(きょう) or 東京(とうきょう). I don't know whether 
Japanese users view b. and c. as just different styling or as 
semantically different. The former permits b. to be represented by a 
single <ruby> and to make the distinction in CSS. The latter either 
requires b. to be done by multiple <ruby> or something additional in 
HTML if one wants to do b. with a single <ruby>.
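
The two fallback renderings mentioned above can be sketched as follows. 
The function ruby_fallback and its merge flag are hypothetical; the flag 
stands in for whatever CSS or HTML mechanism would end up carrying the 
b. vs. c. distinction.

```python
def ruby_fallback(pairs, merge):
    """Render (base, ruby) pairs as inline text with parentheses.

    merge=False treats the pairs as a succession of single ruby (case b.);
    merge=True treats them as one jukugo ruby (case c.).
    """
    if merge:
        base = "".join(b for b, _ in pairs)
        ruby = "".join(r for _, r in pairs)
        return f"{base}({ruby})"
    return "".join(f"{b}({r})" for b, r in pairs)


pairs = [("東", "とう"), ("京", "きょう")]
print(ruby_fallback(pairs, merge=False))  # 東(とう)京(きょう)
print(ruby_fallback(pairs, merge=True))   # 東京(とうきょう)
```

Both outputs come from the same list of pairs, so the choice between 
them has to be made from information outside that list.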


Seems to me that mandatory <rb> makes life easier, and IMO easier enough 
that it's justified, but is not strictly necessary.

A decidedly inferior scenario is to make <rb> optional. A <span> does 
just as well in this case.

Eric.
Received on Thursday, 12 January 2012 20:47:24 UTC