RE: Ruby in HTML5

> From: Roland Steiner [mailto:rolandsteiner@google.com]
> Sent: 12 March 2010 10:28
> To: MURAKAMI Shinyu; Takuya Oikawa
> Cc: Suzumizaki-Kimikata; Jeroen Ruigrok van der Werven; Richard Ishida;
> www-international@w3.org; public-html-ig-jp@w3.org
> Subject: Re: Ruby in HTML5
>
> Let me add a few thoughts as the author of the current WebKit
> implementation of ruby, which is based on the HTML5 spec. Now, that
> implementation has much room for improvement, so any discussion on how to
> improve and - more importantly - converge the standards is very useful,
> and much needed IMHO. The following is all from an implementation point of
> view. I have also raised some of the points below in another discussion
> "Ruby proposal for XSL 2.0" in the www-style mailing list.
>
> o) Mono, Jukugo and Group ruby
>
>  HTML5 syntax would lend itself very easily to all of them, without any
> additions:
>
>     Mono ruby:   <ruby> 東 <rt> とう </rt></ruby><ruby> 京 <rt> きょう </rt></ruby>
>     Group ruby:   <ruby> 東京 <rt> とうきょう </rt></ruby>
>     Jukugo ruby:   <ruby> 東 <rt> とう </rt> 京 <rt> きょう </rt></ruby>


I don't think it is clear from the HTML5 spec that this is specifically for
jukugo ruby. I see it as a way to express ruby with less markup across a
number of adjacent base characters, whether it be mono, group or jukugo
ruby.


>
>
> o) Complex ruby, nested ruby:
>
> As I see it, there are several "levels" of complex ruby support:
>
> 1.) allow the (single) ruby text to be positioned below the base

This is simple ruby, as per the terminology used in the Ruby Annotation
specification, not complex.  I fear it muddies the waters to call it such.


> 2.) allow 2 ruby texts, both above and below the base at the same time
> 3.) allow multiple ruby texts above and below the base (not in any spec,
> but raised in this discussion)
> 4.) allow ruby texts both above and below the base, allow different
> spanning of those texts relative to ruby base elements


I believe you'll find that approach 4 is quite common when dealing with ruby
in Japanese that has text on both sides of the base.  Most of the examples I
have seen have used mono-ruby 'before' and group ruby 'after', typically
because the ruby before is a phonetic character-by-character description,
whereas the ruby after does not describe the sounds character by character,
but rather applies to the word or phrase as a whole.

The upshot of this is that you need a span capability for most instances of
ruby text on both sides of ruby base, since the ruby before has to be
aligned on a character by character basis, whereas that below spans all the
characters.  (The same applies, btw, to a mixture of jukugo ruby and group
ruby.)
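
To make the span requirement concrete, here is a minimal sketch in Python.
It assumes the complex ruby markup of the Ruby Annotation specification
(rbc, rtc and the rbspan attribute); the sample string and the function
name annotation_spans are made up for illustration only.  It shows mono-ruby
readings in the first rtc and a single group annotation in the second rtc
that spans both base characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in Ruby Annotation (complex ruby) markup:
# per-character readings in the first <rtc>, one spanning text in the second.
SAMPLE = """
<ruby>
  <rbc><rb>東</rb><rb>京</rb></rbc>
  <rtc><rt>とう</rt><rt>きょう</rt></rtc>
  <rtc><rt rbspan="2">Tokyo</rt></rtc>
</ruby>
"""

def annotation_spans(ruby_xml):
    """Return, for each <rtc>, a list of (ruby text, covered base text)."""
    ruby = ET.fromstring(ruby_xml.strip())
    bases = [rb.text for rb in ruby.find("rbc").findall("rb")]
    result = []
    for rtc in ruby.findall("rtc"):
        pairs, start = [], 0
        for rt in rtc.findall("rt"):
            span = int(rt.get("rbspan", "1"))  # a missing rbspan means 1
            pairs.append((rt.text, "".join(bases[start:start + span])))
            start += span
        result.append(pairs)
    return result

for pairs in annotation_spans(SAMPLE):
    print(pairs)
```

The first list pairs each reading with one base character; the second pairs
the single annotation with the whole word.  Without something like rbspan
there is no way to say that the second ruby text covers both bases.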

I wished for a long while that the Ruby Annotation spec would allow for 3
conformance levels, simple ruby, ruby on both sides, and full complex ruby,
to make it easier for implementors.  But now I realise that the spans that
complex ruby provides are actually needed most of the time when ruby text
appears on both sides.

This has an impact on the rest of your comments.


> The first should be rather easy and straight-forward to implement by
> adding 'ruby-position: after'.
>
> The 2nd could be handled by nested ruby. But while I think that nested
> ruby _should_ work in a browser from a technical point of view (as in "why
> not?"), I don't think it should be the preferred solution for this.

I agree.

I also think that it is important to support the XHTML model, since it is
included already in XHTML 1.1, and people using XHTML 1.1 should eventually
be able to migrate to the XHTML serialisation of HTML5.  (That doesn't mean
that the innovations in the current HTML5 spec should be removed.)

>
> On the other hand, complex ruby as described in the current CSS3/XHTML
> spec addresses the 2nd and 4th item, but is IMHO too technical a solution.
> It requires too much markup for something that conceptually is rather
> simple. It has much of the complexities of tables, while also being subtly
> different. Due to the way tags are nested, <rp> elements cannot be used in
> complex ruby, which is counter-intuitive and WILL lead to mistakes (see,
> e.g.,
> http://www.crosswire.org/pipermail/sword-devel/2008-July/028644.html).
>
> As mentioned in the Bugzilla thread
> https://bugzilla.mozilla.org/show_bug.cgi?id=256274, esp. comment #7, the
> current complex ruby spec also lacks specifics for error handling. The
> Bugzilla thread mentions illegal or non-text-flow elements within the
> various ruby containers. Another problem would be how to treat 'ruby-span'
> when the value is too large, or is being dynamically changed by
> JavaScript. The spec also does not address cases that may well occur in
> HTML outside of simple text layout, e.g., handling 'ruby-overhang' when
> the adjacent elements are of different sizes, positioned, animated, etc.
> (Indeed I believe that 'ruby-overhang', while typographically very nice,
> will prove to be the hardest part to implement properly).

The spec may lack some specifics, but surely that's one reason we're writing
the HTML5 specification - to more clearly specify things where needed?  If
you mean that the markup definition prevents reasonable behaviour, that's a
different thing - but you should provide specific examples of that.

>
> In general, the use cases put forward in this thread seem to mainly ask
> for the first 2 items in above list. There does not seem to be a real need
> for item 3., although I believe any implementation that properly supports
> item 2. should rather easily be extensible to also support item 3., but
> that could be mistaken. Even then, this could really be a case to use
> nested ruby. Item 4. is addressed by the complex ruby spec, and should
> likewise be extensible to include item 3. However, both item 3. and 4.
> seem to me rather academic enough to consider foregoing them if it can
> result in simpler markup instead.
>
> In summary, I personally would rather prefer a discussion on how the HTML5
> and CSS specs could be converged without introducing unnecessary
> complexities.


I no longer believe that 4 is academic.  See above.

I'd like to note that I've had a good implementation of complex ruby running
on my Firefox browser for some years now, using an add-on.  It does both
sides, with spans.  I therefore find myself inclined to wonder whether it's
really so difficult to implement.


>
>
> o) Ruby properties
>
> As also suggested in the XSL thread, some of the ruby properties should be
> reconsidered:
>
> 'ruby-align' is a combination of several largely orthogonal parts, and
> consequently should be broken up into several properties that handle
> alignment, edge handling and spacing. See also
> http://www.w3.org/Style/XSL/Group/FO/wiki/Ruby#Treat_CSS3_.22ruby-align.22_As_Shorthand.3F

I think line end handling may well be orthogonal, but I'm not convinced yet
that the others are.


>
> Furthermore, properties for character spacing and -transformation (narrow
> Katakana, changing small subscript Kana to standard-size for better
> legibility) are not ruby-specific and would IMHO better be handled in a
> separate general text module. Character spacing also needs to address how
> to handle non-Kanji/Kana characters.

I think that is addressed in the CSS3 module already, isn't it?


>
> As everybody seems to agree, Bopomofo/Zhuyin-Fuhao is right out and will
> stay so until vertical text is properly supported by UAs.

You need more than just vertical text handling to support Bopomofo
right-sided vertical ruby.  See my notes in the latest CSS3 editor's draft.
http://dev.w3.org/csswg/css3-ruby/#rubypos



I hope those comments help the discussion.  The key point I'm trying to make
is that any implementation that supports ruby text on both sides of the base
text will typically need to enable spans.  That is the thing that the
current HTML5 spec cannot support afaict, but that the XHTML/Ruby
Annotations do.

In fact the current HTML5 doesn't support ruby text on both sides at all,
since it says that the content model for ruby is "One or more groups of:
phrasing content followed either by a single rt element, or an rp element,
an rt element, and another rp element." (note the word 'single').  It
could perhaps be extended to support <ruby>X<rt>y</rt><rt>z</rt></ruby>, but
it doesn't support it at the moment.
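
As a rough illustration of that content model, here is a small Python
sketch.  It is a simplification, not a conformance checker: it assumes the
children of a ruby element have already been reduced to the labels
'phrasing', 'rt' and 'rp', and that each group has at least one run of
phrasing content.  The name matches_html5_ruby_model is hypothetical.

```python
import re

# One group: phrasing content, then either a single rt, or rp rt rp.
# Each child is encoded as one letter: P (phrasing), T (rt), R (rp).
RUBY_MODEL = re.compile(r"(?:P+(?:T|RTR))+")

def matches_html5_ruby_model(children):
    """children: list of 'phrasing', 'rt' or 'rp' in document order."""
    code = {"phrasing": "P", "rt": "T", "rp": "R"}
    encoded = "".join(code[c] for c in children)
    return RUBY_MODEL.fullmatch(encoded) is not None

# <ruby>X<rt>y</rt></ruby>
print(matches_html5_ruby_model(["phrasing", "rt"]))        # True
# <ruby>X<rt>y</rt><rt>z</rt></ruby>
print(matches_html5_ruby_model(["phrasing", "rt", "rt"]))  # False
```

The second case fails because the second rt is not preceded by phrasing
content, so it cannot start a new group.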

RI

Received on Tuesday, 30 March 2010 20:04:14 UTC