Re: HTML5 and ruby from Leif Halvard Silli on 2012-01-13 (public-i18n-cjk@w3.org from January to March 2012)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Fri, 13 Jan 2012 02:03:39 +0100
To: public-i18n-cjk@w3.org
Message-ID: <20120113020339972212.54122281@xn--mlform-iua.no>
Eric Muller wrote on Thu, 12 Jan 2012 12:46:50 -0800:

> Even in the case of c., the issue from the point of view of document 
> content (i.e. ignoring for one second the application of styling), is to 
> represent a list of pairs {base text, ruby text}. Both
> 
> <ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby> (maybe 
> with a different interleaving of rbs and rts)
> 
> and
> 
> <ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby>
> 
> capture the list of pairs {東, とう}, {京, きょう} equally well.

Eric: As far as I can tell, *neither* of your two examples above is 
part of the XHTML Ruby module; see <http://www.w3.org/TR/ruby/>. 
So I have a single question about both of them: why is *either* of 
them any better than this:

<ruby><rb>東</rb><rt>とう</rt></ruby><ruby><rb>京</rb><rt>きょう</rt></ruby>
or this:
<ruby>東<rt>とう</rt></ruby><ruby>京<rt>きょう</rt></ruby>

As far as I can tell, these two variants capture the list of pairs 
equally well, don't they? What is the purpose - the progress - of 
using a single <ruby> rather than two <ruby>s?
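
To make the "equally well" point concrete, here is a minimal sketch in Python (my choice of language; nothing in this thread prescribes it). The class RubyPairs and the function ruby_pairs are hypothetical names. It assumes that base text is whatever precedes an <rt> outside of <rt>/<rp>, which is a simplification and not a rule from either spec:

```python
from html.parser import HTMLParser


class RubyPairs(HTMLParser):
    """Collect (base text, ruby text) pairs, with or without <rb>."""

    def __init__(self):
        super().__init__()
        self.pairs = []     # completed (base, annotation) pairs
        self.base = ""      # base text seen since the last </rt>
        self.rt = ""        # text inside the current <rt>
        self.in_rt = False
        self.in_rp = False

    def handle_starttag(self, tag, attrs):
        if tag == "rt":
            self.in_rt = True
            self.rt = ""
        elif tag == "rp":
            self.in_rp = True

    def handle_endtag(self, tag):
        if tag == "rt":
            # An </rt> closes one {base text, ruby text} pair.
            self.pairs.append((self.base.strip(), self.rt.strip()))
            self.base = ""
            self.in_rt = False
        elif tag == "rp":
            self.in_rp = False

    def handle_data(self, data):
        if self.in_rp:
            return          # fallback parentheses are not content
        if self.in_rt:
            self.rt += data
        else:
            self.base += data


def ruby_pairs(markup):
    parser = RubyPairs()
    parser.feed(markup)
    parser.close()
    return parser.pairs


variants = [
    "<ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby>",
    "<ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby>",
    "<ruby><rb>東</rb><rt>とう</rt></ruby><ruby><rb>京</rb><rt>きょう</rt></ruby>",
    "<ruby>東<rt>とう</rt></ruby><ruby>京<rt>きょう</rt></ruby>",
]
for markup in variants:
    print(ruby_pairs(markup))
```

All four variants yield the same two pairs under this reading, so the pairing alone does not tell a single <ruby> apart from two.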

You see, this is - I believe - where HTML5 deviates from the XHTML Ruby 
module. The omission of <rb> is already catered for in the XHTML Ruby 
module: it is considered non-conforming, but the module describes how 
to handle it. For instance, the XHTML Ruby module has this 'simple 
ruby' example <http://www.w3.org/TR/ruby/#simple-ruby1>:

<ruby>
  <rb>WWW</rb>
  <rp>(</rp><rt>World Wide Web</rt><rp>)</rp>
</ruby>

Per the HTML5 model, one could do this:

<ruby>
  <rb>W</rb>
  <rp>(</rp><rt>World</rt><rp>)</rp>
  <rb>W</rb>
  <rp>(</rp><rt>Wide</rt><rp>)</rp>
  <rb>W</rb>
  <rp>(</rp><rt>Web</rt><rp>)</rp>
</ruby>

Which in a text browser would look like this:

  W (World) W (Wide) W (Web)

Is this of any use? Instead, per the XHTML Ruby module's complex 
markup, one could do this:

<ruby>
  <rbc>
    <rb>W</rb><rb>W</rb><rb>W</rb>
  </rbc>
  <rtc>
    <rt>World</rt><rt>Wide</rt><rt>Web</rt>
  </rtc>
</ruby>

Which in a text browser could render as:

  WWW World Wide Web

If I understood Richard's Wiki page correctly (I'm assuming there was a 
typo - see my previous reply), then it suggested this option:

<ruby>
    <rb>W</rb><rb>W</rb><rb>W</rb>
    <rp>(</rp><rt>World</rt><rt>Wide</rt><rt>Web</rt><rp>)</rp>
</ruby>

Which in a text browser could look like this:

  WWW (World Wide Web)

> Both approaches  work, but requiring <rb> makes it slightly easier to 
> manipulate documents; to access a base text, one can simply grab the 
> <rb> element, instead of grabbing all the elements other than <rt>. (In 
> XSLT, group-adjacent="if (self:rt) then 'rt' else 'basetext'" does the 
> trick, but works only in a for-each-group if I am not mistaken, not on 
> direct access to the nth base text).

Agreed!
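
The difference in effort can be sketched as follows. This is Python with xml.etree.ElementTree standing in for the XSLT you mention, not a translation of it, and the two function names are hypothetical. It assumes well-formed XHTML input:

```python
import xml.etree.ElementTree as ET


def nth_base_with_rb(ruby, n):
    """With mandatory <rb>: simply grab the nth <rb> child."""
    return ruby.findall("rb")[n].text


def nth_base_without_rb(ruby, n):
    """Without <rb>: gather everything that is not inside <rt> or <rp>."""
    bases = []
    current = ruby.text or ""
    for child in ruby:
        if child.tag == "rt":
            # An <rt> ends the base text collected so far.
            bases.append(current.strip())
            current = ""
        elif child.tag != "rp":
            # Any other element (<span>, <b>, ...) counts as base text.
            current += "".join(child.itertext())
        # Text following the child belongs to the next base.
        current += child.tail or ""
    return bases[n]


with_rb = ET.fromstring(
    "<ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby>")
without_rb = ET.fromstring(
    "<ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby>")

print(nth_base_with_rb(with_rb, 1))        # 京
print(nth_base_without_rb(without_rb, 1))  # 京
```

Both calls return the same base text, but only the first is direct access; the second has to walk and regroup the children first.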

> I would not characterize approach 3 (in section 2) as an alternative to 
> 1 and 2. It is available to authors under 1, but it does not help 
> consumers (unless the <span> is required, at which point that <span> is 
> just another name for <rb>). From the point of view of consumers, it's 
> really the same as approach 1, used in a restricted way.

In which document did you find these 'approaches'? The Wiki page? URLs 
would be handy, then, please ...

But I agree that <span> would be just another name for <rb>. I don't 
get why the HTML5 editor is so hung up on <rb>. I think it must be 
based on the fact that IE6/7/8 don't understand <rb>. However, that 
problem can easily be dealt with (via JavaScript), and IE9 does 
support <rb> - the same way it supports <span>.

> It seems to me that approach 4 introduces a new selector mechanism, and 
> I don't think that's desirable.
> 
> One question which is more apparent from my a/b/c organization is 
> whether b should have a different DOM than c. As far as I can tell, b is 
> just a succession of single ruby, and there is therefore no strict need 
> to represent that situation by a single <ruby> element.  Allowing b to 
> be done by a single <ruby> element with multiple pairs, as a convenience 
> to authors, means the same DOM as for a jukugo ruby (I believe this is 
> what motivated your approach 2 in "4 jukugo ruby", as well as your 
> discussion of fallback). If that convenience is offered, then one will 
> have to have something in CSS to express b. vs. c, and rendering engines 
> will have to consult that even when doing fallback, to determine whether 
> to do 東(とう)京(きょう) or 東京(とうきょう).
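
The two fallback renderings at stake can be sketched like this. It is Python, with a hypothetical function name and a hypothetical per_pair flag standing in for whatever CSS (or markup) would eventually carry the b. vs. c. distinction:

```python
def fallback(pairs, per_pair):
    """Render ruby pairs as plain text with parenthesised annotations.

    per_pair=True  -> annotate each base separately (case b.)
    per_pair=False -> annotate the whole run at once (case c., jukugo)
    """
    if per_pair:
        return "".join(f"{base}({rt})" for base, rt in pairs)
    bases = "".join(base for base, _ in pairs)
    rts = "".join(rt for _, rt in pairs)
    return f"{bases}({rts})"


pairs = [("東", "とう"), ("京", "きょう")]
print(fallback(pairs, per_pair=True))   # 東(とう)京(きょう)
print(fallback(pairs, per_pair=False))  # 東京(とうきょう)
```

The pairs are identical in both calls; only the extra flag decides the output, which is exactly the information a single-<ruby> DOM would not carry by itself.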

I think that a different DOM would be very difficult. In the bug report 
about inclusion of <rb>, there were many points about the similarity of 
<dl> and <ruby>. As we know, the DOM for <dl> is quite "normal". But of 
course I agree that the problem exists. And I think it should be 
solved by doing/allowing what I think Richard discussed:

<ruby>
    <rb>W</rb><rb>W</rb><rb>W</rb>
    <rp>(</rp><rt>World</rt><rt>Wide</rt><rt>Web</rt><rp>)</rp>
</ruby>

We could also do this, where the <rbc> would come in handy:

<ruby>
    <rbc><rb>W</rb><rb>W</rb><rb>W</rb></rbc>
    <rp>(</rp><rt>World</rt><rt>Wide</rt><rt>Web</rt><rp>)</rp>
</ruby>

But if we want to be compatible with Webkit and Firefox, then we cannot 
do this:

<ruby>
    <rbc><rb>W</rb><rb>W</rb><rb>W</rb></rbc>
    <rtc><rp>(</rp><rt>World</rt><rt>Wide</rt><rt>Web</rt><rp>)</rp></rtc>
</ruby>

Why? Because Firefox 4/5/6/7/8/9 and Webkit (since Safari 5) auto-close 
the current element when the parser sees <rp> or <rt>. At the least, 
those browsers would need to change if we were to include <rtc>. 
(Actually, even while they behave this way, it can still be useful to 
include rtc{} as a CSS hook. More on this later.)

> I don't know whether 
> Japanese users view b. and c. as just different styling or as 
> semantically different. The former permits b. to be represented by a 
> single <ruby> and to make the distinction in CSS. The latter either 
> requires b. to be done by multiple <ruby> or something additional in 
> HTML if one wants to do b. with a single <ruby>.
> 
> Seems to me that mandatory <rb> makes life easier, and IMO easier enough 
> that it's justified, but is not strictly necessary.
> 
> A decidedly inferior scenario, is to make <rb> optional. A <span> does 
> just as well in this case.

I somewhat agree with the view that <rb> should have been obligatory. 
But even if it is optional, an authoring tool could treat <rb> as 
obligatory - it could auto-insert it. As for <span>, why not <b>? 
That would have to be decided case by case. Thus there is definitely 
an advantage to permitting <rb> anyhow.
-- 
Leif H Silli
Received on Friday, 13 January 2012 01:06:48 UTC