Re: Summary of I18N discussion in HTML WG today from Martin J. Dürst on 2012-11-10 (www-international@w3.org from October to December 2012)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Sat, 10 Nov 2012 12:20:37 +0900
To: fantasai <fantasai.lists@inkedblade.net>
CC: Richard Ishida <ishida@w3.org>, Bruce Lawson <brucel@opera.com>, public-html@w3.org, www International <www-international@w3.org>
Message-ID: <509DC805.20408@it.aoyama.ac.jp>

Hello Fantasai, others,

Just read through your mail, and very much agree.

Regards,   Martin.

On 2012/11/10 3:16, fantasai wrote:
> On 11/09/2012 07:42 AM, Richard Ishida wrote:
>> On 05/11/2012 14:09, Bruce Lawson wrote:
>>> On Fri, 02 Nov 2012 10:45:36 -0000, Robin Berjon <robin@w3.org> wrote:
>>>
>>>>
>>>> ### Forward-looking ruby model
>>>> Fantasai exposed a set of issues with the current ruby markup that
>>>> make it awkward to extend in future for features that we have good
>>>> reasons to believe will become increasingly common as HTML is used for
>>>> books, scientific publishing, and pretty much everything in the world
>>>> in general. These involve jukugo ruby, fallback, and double-sided ruby.
>>>
>>> is this set of issues written up anywhere?
>>
>>
>> Bruce, see http://www.w3.org/TR/ruby-use-cases/. Fantasai also wrote
>> something in a blog post that I tried to represent in the
>> aforementioned doc.
>
> Here's the blog post:
> http://fantasai.inkedblade.net/weblog/2011/ruby/
>
> A key point that's not in the blog post is that there are two fundamentally
> different models for doing ruby:
>
> row-based model
> This is the XHTML Ruby approach, where all the base text is given,
> followed by all the annotations, row by row.
>
> column-based model
> This is the HTML Ruby approach, where each base is given followed
> immediately by its annotation(s), column by column.
>
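To make the two models concrete, here is a minimal Python sketch. Python is only my choice for illustration, and neither markup proposal defines such a data structure: the class names `ColumnRuby` and `RowRuby` and both conversion helpers are hypothetical. The conversion assumes every annotation lines up with exactly one base, which is the simple case where the two models carry the same information.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ColumnRuby:
    """Column-based: each base is followed immediately by its annotation(s)."""
    columns: list[tuple[str, list[str]]]


@dataclass
class RowRuby:
    """Row-based: all the base text first, then each row of annotations."""
    bases: list[str]
    rows: list[list[str]]


def columns_to_rows(ruby: ColumnRuby) -> RowRuby:
    """Regroup per-base annotations into annotation rows."""
    bases = [base for base, _ in ruby.columns]
    depth = max((len(anns) for _, anns in ruby.columns), default=0)
    rows = [
        # A base with fewer annotations than `depth` gets an empty cell.
        [anns[level] if level < len(anns) else "" for _, anns in ruby.columns]
        for level in range(depth)
    ]
    return RowRuby(bases, rows)


def rows_to_columns(ruby: RowRuby) -> ColumnRuby:
    """Regroup annotation rows into per-base columns (rows must match bases in length)."""
    columns = [
        (base, [row[i] for row in ruby.rows])
        for i, base in enumerate(ruby.bases)
    ]
    return ColumnRuby(columns)


# The word 東京 (Tokyo), annotated character by character.
tokyo_columns = ColumnRuby([("東", ["とう"]), ("京", ["きょう"])])
tokyo_rows = columns_to_rows(tokyo_columns)
```

In this aligned case the round trip is lossless. Annotations that span several bases have no direct representation in `ColumnRuby` as sketched, so the sketch says nothing about that case.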
> The column-based model has several flaws:
>
> 1. It doesn't handle inlining gracefully. As an example, the word
> Tokyo is written 東京 in kanji and とうきょう in kana. The base-annotation
> pairs are 東-とう 京-きょう, and the ruby markup must create those
> associations accordingly. However, when rendered inline, the
> correct rendering is
> 東京(とうきょう)
> with the word kept together as one unit, not
> 東(とう)京(きょう)
>
> There are various use cases for inlining:
> * fallback, for implementations that don't support ruby.
> * compacting the layout, because ruby requires higher inter-line
> spacing. (If ruby is rare enough in the document, it's more
> efficient to present it inline, and this has been a desired
> option on phones.)
> * small fonts -- in order to fit above the base text, ruby is
> typically set at about half the size of the base text. If
> the base font size is too small it can become unreadable,
> especially for older people. Inlined annotations on the other
> hand are the same size as the base text.
>
> The author and the UA should have the choice of proper inlining
> without changing the markup. Doing that with the current markup
> requires special box-reordering support in the layout engine,
> which is doable but not trivial and certainly does not solve the
> fallback use case.
>
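The inlining problem in point 1 can be sketched in a few lines of Python. Both function names are hypothetical, and the list of (base, annotation) pairs is only an assumed stand-in for whatever the markup actually produces.

```python
def inline_per_pair(pairs: list[tuple[str, str]]) -> str:
    """Naive inlining: each base is followed by its own annotation."""
    return "".join(f"{base}({annotation})" for base, annotation in pairs)


def inline_grouped(pairs: list[tuple[str, str]]) -> str:
    """Inlining that keeps the word together: all bases, then all annotations."""
    bases = "".join(base for base, _ in pairs)
    annotations = "".join(annotation for _, annotation in pairs)
    return f"{bases}({annotations})"


tokyo = [("東", "とう"), ("京", "きょう")]
naive = inline_per_pair(tokyo)   # 東(とう)京(きょう)
wanted = inline_grouped(tokyo)   # 東京(とうきょう)
```

The grouped form has to collect every base before it emits any annotation. That is the reordering a layout engine would need when the markup interleaves bases and annotations.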
> 2. It doesn't handle spanning gracefully, i.e. the case where there
> are multiple annotations and their boundaries don't line up.
> See http://fantasai.inkedblade.net/weblog/2011/ruby/#double for
> examples.
>
> Hixie recently added the ability to do two types of double-sided
> ruby to try to address this use case, but used completely different
> markup models: one case would be done with nested <ruby> tags, and
> the other with multiple adjacent <rt> elements. The problem with
> this is that
> * it forces the author to learn (and style) two very different
> markup models for things that are fundamentally the same
> * it forces the UA to implement two very different layout models
> for things that are fundamentally the same
>
> One complexity of ruby layout that is often overlooked is that
> adjacent ruby on a single line need to negotiate space with each
> other. In the simple case, they are black boxes of a particular
> size: if the annotation text is wider than the base text, the
> inline is treated as having the size of its annotation. But this
> is not always the desired rendering. In many cases it's desired
> for a long annotation to overhang adjacent text *if that text is
> not itself annotated* and there is therefore sufficient room for
> the overhang. So inline layout needs to negotiate space for
> annotations among ruby structures on the same line, across inline
> element boundaries, etc.
>
> Another, of course, is negotiating line breaks within the ruby among
> the base text and its annotations.
>
> So not only does this approach require the author to learn two
> different models, it also requires the layout engine to implement
> two different models and handle their interactions.
>
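The space negotiation described in point 2 can be illustrated with a deliberately simplified Python sketch. Everything here is an assumption for illustration: the `Segment` type, the abstract width units, the rule that an annotation overhangs equally on both sides, and the first-come-first-served way neighbouring space is claimed. It is not how any particular layout engine works.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_width: float
    annotation_width: float = 0.0  # 0 means the segment carries no ruby


def reserved_widths(segments: list[Segment]) -> list[float]:
    """Return the inline width each segment occupies once overhang is applied."""
    # Width of each unannotated segment that a neighbour's annotation may still cover.
    free = [s.base_width if s.annotation_width == 0 else 0.0 for s in segments]
    widths = []
    for i, seg in enumerate(segments):
        excess = max(seg.annotation_width - seg.base_width, 0.0)
        per_side = excess / 2  # assumed: the annotation is centred over its base
        remaining = excess
        for j in (i - 1, i + 1):
            if 0 <= j < len(segments):
                take = min(per_side, free[j])
                free[j] -= take      # this space cannot be claimed twice
                remaining -= take
        # Excess that could not overhang must widen the segment itself.
        widths.append(seg.base_width + remaining)
    return widths


# A long annotation between two unannotated runs overhangs both of them.
overhang = reserved_widths([Segment(4), Segment(2, 4), Segment(4)])
# Two annotated neighbours cannot overhang each other: the black-box case.
black_box = reserved_widths([Segment(2, 4), Segment(2, 4)])
```

Even this toy version needs state shared across segments (the `free` list). That is the kind of cross-boundary negotiation that becomes harder to keep consistent when two different ruby layout models must interact.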
> Personally, I don't see why we are insisting on this approach when
> there is a sensible alternative that puts all forms of ruby on the
> same track and allows for whatever extensions we might want from
> now through 2025 to be handled within the same basic architecture.
>
> Note, I'm not advocating that the current model for single-sided ruby,
> which is implemented in WebKit and Trident already, should be abandoned.
> It's fairly easy to incorporate that into a box model that extends it
> into a row-based system. I'm saying we shouldn't shoehorn additional
> requirements into that model as Hixie has done, dropping some of them
> on the floor as necessary, but instead extend in the direction of a
> model that satisfies all the requirements with a single unified model.
> I think this is less complex and more satisfying than the current
> approach.
>
> ~fantasai
>
>
Received on Saturday, 10 November 2012 03:21:18 UTC