Re: Summary of I18N discussion in HTML WG today

On 11/9/2012 7:20 PM, "Martin J. Dürst" wrote:
> Hello Fantasai, others,
>
> Just read through your mail, and very much agree.

Excellent summary.

A./
>
> Regards,   Martin.
>
> On 2012/11/10 3:16, fantasai wrote:
>> On 11/09/2012 07:42 AM, Richard Ishida wrote:
>>> On 05/11/2012 14:09, Bruce Lawson wrote:
>>>> On Fri, 02 Nov 2012 10:45:36 -0000, Robin Berjon <robin@w3.org> wrote:
>>>>
>>>>>
>>>>> ### Forward-looking ruby model
>>>>> Fantasai exposed a set of issues with the current ruby markup that
>>>>> make it awkward to extend in future for features that we have good
>>>>> reasons to believe will become increasingly common as HTML is used for
>>>>> books, scientific publishing, and pretty much everything in the world
>>>>> in general. These involve jukugo ruby, fallback, and double-sided ruby.
>>>>
>>>> Is this set of issues written up anywhere?
>>>
>>>
>>> Bruce, see http://www.w3.org/TR/ruby-use-cases/. Fantasai also wrote
>>> something in a blog post that I tried to represent in the
>>> aforementioned doc.
>>
>> Here's the blog post:
>> http://fantasai.inkedblade.net/weblog/2011/ruby/
>>
>> A key point that's not in the blog post is that there are two
>> fundamentally different models for doing ruby:
>>
>> row-based model
>> This is the XHTML Ruby approach, where all the base text is given,
>> followed by all the annotations, row by row.
>>
>> column-based model
>> This is the HTML Ruby approach, where each base is given followed
>> immediately by its annotation(s), column by column.
>>
>> The column-based model has several flaws:
>>
>> 1. It doesn't handle inlining gracefully. As an example, the word
>> Tokyo is written 東京 in kanji and とうきょう in kana. The base-annotation
>> pairs are 東-とう and 京-きょう, and the ruby markup must create those
>> associations accordingly. However, when rendered inline, the
>> correct rendering is
>> 東京(とうきょう)
>> with the word kept together as one unit, not
>> 東(とう)京(きょう)
>>
>> There are various use cases for inlining:
>> * fallback, for implementations that don't support ruby.
>> * compacting the layout, because ruby requires higher inter-line
>> spacing. (If ruby is rare enough in the document, it's more
>> efficient to present it inline, and this has been a desired
>> option on phones.)
>> * small fonts -- in order to fit above the base text, ruby is
>> typically set at about half the size of the base text. If
>> the base font size is too small it can become unreadable,
>> especially for older people. Inlined annotations on the other
>> hand are the same size as the base text.
>>
>> The author and the UA should have the choice of proper inlining
>> without changing the markup. Doing that with the current markup
>> requires special box-reordering support in the layout engine,
>> which is doable but not trivial and certainly does not solve the
>> fallback use case.
>>
>> 2. It doesn't handle spanning gracefully, i.e. the case where there
>> are multiple annotations and their boundaries don't line up.
>> See http://fantasai.inkedblade.net/weblog/2011/ruby/#double for
>> examples.
>>
>> Hixie recently added the ability to do two types of double-sided
>> ruby to try to address this use case, but used completely different
>> markup models: one case would be done with nested <ruby> tags, and
>> the other with multiple adjacent <rt> elements. The problem with
>> this is that
>> * it forces the author to learn (and style) two very different
>> markup models for things that are fundamentally the same
>> * it forces the UA to implement two very different layout models
>> for things that are fundamentally the same
>>
>> One of the complexities of ruby layout that is overlooked is that
>> adjacent ruby on a single line need to negotiate space from each
>> other. In the simple case, they are black boxes of a particular
>> size: if the annotation text is wider than the base text, the
>> inline is treated as having the size of its annotation. But this
>> is not always the desired rendering. In many cases it's desired
>> for a long annotation to overhang adjacent text *if that text is
>> not itself annotated* and there is therefore sufficient room for
>> the overhang. So inline layout needs to negotiate space for
>> annotations among ruby structures on the same line, across inline
>> element boundaries, etc.
>>
>> Another, of course, is negotiating line breaks within the ruby among
>> the base text and its annotations.
>>
>> So not only does this approach require the author to learn two
>> different models, it also requires the layout engine to implement
>> two different models and handle their interactions.
>>
>> Personally, I don't see why we are insisting on this approach when
>> there is a sensible alternative that puts all forms of ruby on the
>> same track and allows for whatever extensions we might want from
>> now through 2025 to be handled within the same basic architecture.
>>
>> Note, I'm not advocating that the current model for single-sided ruby,
>> which is implemented in WebKit and Trident already, should be abandoned.
>> It's fairly easy to incorporate that into a box model that extends it
>> into a row-based system. I'm saying we shouldn't shoehorn additional
>> requirements into that model as Hixie has done, dropping some of them
>> on the floor as necessary, but instead extend in the direction of a
>> single unified model that satisfies all the requirements.
>> I think this is less complex and more satisfying than the current
>> approach.
>>
>> ~fantasai
>>
>>
>
>
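
For readers following the thread, the two markup models and the inlining
problem from point 1 of fantasai's message can be sketched in a few lines of
Python. This is only an illustration: the function names are hypothetical,
the column-based output assumes the <ruby>/<rt> pattern, and the row-based
output assumes the <rbc>/<rtc> containers of XHTML Ruby. It does not
reproduce any spec text or any implementation.

```python
from typing import List, Tuple

# One (base, annotation) pair per ruby column. Hypothetical helper type.
Pair = Tuple[str, str]

# The example from the message: 東京 annotated as とう + きょう.
TOKYO: List[Pair] = [("東", "とう"), ("京", "きょう")]


def column_markup(pairs: List[Pair]) -> str:
    """Column-based model: each base is followed at once by its annotation."""
    body = "".join(f"{base}<rt>{text}</rt>" for base, text in pairs)
    return f"<ruby>{body}</ruby>"


def row_markup(pairs: List[Pair]) -> str:
    """Row-based model: all bases first, then all annotations, row by row."""
    bases = "".join(f"<rb>{base}</rb>" for base, _ in pairs)
    texts = "".join(f"<rt>{text}</rt>" for _, text in pairs)
    return f"<ruby><rbc>{bases}</rbc><rtc>{texts}</rtc></ruby>"


def inline_per_pair(pairs: List[Pair]) -> str:
    """Naive inlining: parenthesize after every base (the undesired result)."""
    return "".join(f"{base}({text})" for base, text in pairs)


def inline_grouped(pairs: List[Pair]) -> str:
    """Desired inlining: keep the word together, then one parenthetical."""
    bases = "".join(base for base, _ in pairs)
    texts = "".join(text for _, text in pairs)
    return f"{bases}({texts})"


if __name__ == "__main__":
    print(column_markup(TOKYO))
    print(row_markup(TOKYO))
    print(inline_per_pair(TOKYO))  # 東(とう)京(きょう)
    print(inline_grouped(TOKYO))   # 東京(とうきょう)
```

Reading the row-based string left to right with the tags ignored already
yields the bases followed by the annotations, which is the grouped inline
form. The column-based string read the same way gives the per-pair order,
which is why the message says it needs box reordering to inline correctly.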

Received on Saturday, 10 November 2012 06:33:24 UTC