
RE: Planning to update the IncludeRB change proposal [Was: Letter-by-letter (or syllable-by-syllable) (was RE: HTML5 and ruby]

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 23 Jan 2012 12:17:31 +0100
To: Koji Ishii <kojiishi@gluesoft.co.jp>
Cc: Richard Ishida <ishida@w3.org>, CJK discussion <public-i18n-cjk@w3.org>
Message-ID: <20120123121731428198.ffb3da56@xn--mlform-iua.no>
Koji Ishii, Sun, 22 Jan 2012 23:16:37 -0500:
> Thank you for writing this up, Leif.
> 
> While I agree that rb/rb/rt/rt pattern is good, I don't know 
> prohibiting rb/rt/rb/rt is a good idea. What are the motivation and 
> benefits to prohibit that?

The motivation is to solve - or at least minimize - the problems 
related to the gap between what the visual, human parser sees on 
screen and what the computer parser sees [in the source code and in 
the DOM]. Since authors/designers are often visually oriented, this 
gap may often go unnoticed unless the conformance checker whines 
about it.

> I think it purely depends on how author wants to break words/letters 
> in his mind, and therefore the less constrains, the better, unless 
> there're good reasons.

By making the order optional, one would have to leave this issue 
completely up to authoring advice. 

What about mixed order? rb/rt/rb/rb/rb/rt/rt/rt

It strikes me as a bit unclear how to put it into the spec that the 
author can do as he/she wishes, depending on how he/she perceives it. 
Does the author really know how he/she perceives it until he/she has 
tested the result of the code in a translation service, in 
find-in-page, in a spell checker, in a screen reader?

Can you mention an example of when it would be *objectively* wrong to 
do rb/rb/rt/rt?

By, instead, making it a conformance issue, authors are helped to do 
the right thing. 
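
To make concrete what a conformance checker could whine about, here is a minimal sketch in Python. It is only one reading of the "single adjacent pair of <rb><rt>" rule, not spec text: it counts the places where an <rb> is directly followed by an <rt> and accepts a <ruby> only if there is at most one such place. The names RubyOrderCollector, count_rb_rt_pairs and is_conforming are made up for this illustration, and the sketch assumes that <ruby> elements are not nested.

```python
from html.parser import HTMLParser


class RubyOrderCollector(HTMLParser):
    """Record the order of <rb>/<rt> start tags inside each <ruby>.

    Assumption for this sketch: <ruby> elements are not nested.
    <rp> and <rbc> are ignored, so they do not break adjacency.
    """

    def __init__(self):
        super().__init__()
        self.sequences = []    # one list of tag names per <ruby>
        self._in_ruby = False

    def handle_starttag(self, tag, attrs):
        if tag == "ruby":
            self._in_ruby = True
            self.sequences.append([])
        elif tag in ("rb", "rt") and self._in_ruby:
            self.sequences[-1].append(tag)

    def handle_endtag(self, tag):
        if tag == "ruby":
            self._in_ruby = False


def count_rb_rt_pairs(sequence):
    """Count positions where an 'rb' is immediately followed by an 'rt'."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if (a, b) == ("rb", "rt"))


def is_conforming(markup):
    """True if every <ruby> has at most one adjacent rb/rt pair."""
    collector = RubyOrderCollector()
    collector.feed(markup)
    collector.close()
    return all(count_rb_rt_pairs(seq) <= 1 for seq in collector.sequences)
```

Under this reading, rb/rb/rt/rt passes, while both rb/rt/rb/rt and the mixed order rb/rt/rb/rb/rb/rt/rt/rt are flagged, since each of them contains two adjacent rb/rt pairs.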


Note also that if we have

    foo <ruby><rb>W</rb><rb>W</rb><rb>W</rb>
               <rt>World</rt><rt>Wide</rt><rt>Web</rt>
         </ruby> bar

then a find-in-page search for 'foo WWW' will locate 'foo' plus the 
ruby base above. By which I want to emphasize that find-in-page 
considers the ruby *base* as the text that 'sits on the line', 
together with 'foo' and 'bar'.
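
A rough way to model this behaviour, as a sketch only: collect all text except the content of <rt> and <rp>, then collapse whitespace. This is not how any browser actually implements find-in-page. The names BaseTextExtractor and line_text are hypothetical, and the sketch assumes that <rt> and <rp> carry explicit end tags, as in the example above.

```python
from html.parser import HTMLParser


class BaseTextExtractor(HTMLParser):
    """Collect the text that 'sits on the line': all text outside <rt>/<rp>.

    Assumption for this sketch: <rt> and <rp> have explicit end tags.
    """

    SKIPPED = ("rt", "rp")

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0   # >0 while inside an <rt> or <rp>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def line_text(markup):
    """Return the on-the-line text with runs of whitespace collapsed."""
    extractor = BaseTextExtractor()
    extractor.feed(markup)
    extractor.close()
    return " ".join("".join(extractor.parts).split())


sample = """foo <ruby><rb>W</rb><rb>W</rb><rb>W</rb>
               <rt>World</rt><rt>Wide</rt><rt>Web</rt>
         </ruby> bar"""

print(line_text(sample))  # foo WWW bar
```

Because the three <rb> elements are adjacent in the source, their text joins into 'WWW' without the annotations getting in between, which is why a search for 'foo WWW' matches.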

Leif Halvard Silli



> Regards,
> Koji
> 
> -----Original Message-----
> From: Leif Halvard Silli [mailto:xn--mlform-iua@målform.no] 
> Sent: Monday, January 23, 2012 12:09 PM
> To: Koji Ishii
> Cc: Richard Ishida; CJK discussion
> Subject: Planning to update the IncludeRB change proposal [Was: 
> Letter-by-letter (or syllable-by-syllable) (was RE: HTML5 and ruby]
> 
> I have updated my IncludeRB Change Proposal [1]. Until now, it mostly 
> focused on word-by-word related issues that support the inclusion of 
> <rb> in HTML5. Namely: compatibility with existing code and tools, 
> ability to identify the ruby base [or base 'word'] via CSS 2.1, 
> problems related to use of ad-hoc wrappers such as <span> and 
> metadata issues related to accessibility, language tagging etc. 
> 
> However, now I have updated the IncludeRB CP to also solve the 
> letter-by-letter related details:
> 
> [A] Letter-by-letter conformance: The IncludeRB proposal now says 
> that there can only be a single adjacent pair of <rb><rt> inside a 
> <ruby>. 
> Thus
> 
> NOTE: An alternative solution could be to simply say that <ruby> 
> should not be used for letter-by-letter ruby unless one also uses 
> <rbc>. 
> 
>     Comments on this detail?
> 
> [B] Complex ruby: <rbc> and <rtc> should be permitted. However, due 
> to the parser differences [2], my CP reinstates only <rbc>. As a 
> matter of fact, in Internet Explorer, <rbc> creates zero problems - 
> the ruby looks fine, even if you wrap the ruby base inside <rb>. And 
> in WebKit, the change of content model - [A] - creates a need for 
> <rbc>, as it tends to fall apart otherwise. The introduction of <rbc> 
> does not allow us to create double sided ruby - but at least it 
> allows us to create ruby where the letter-by-letter should be 
> possible to avoid.
>     Example of what the suggestion implies:
> <ruby>
>    <rbc>
>       <rb>W</rb><rb>W</rb><rb>W</rb>
>    </rbc>
>       <rp>[</rp><rt>World</rt><rt>Wide</rt><rt>Web</rt><rp>]</rp>
> </ruby>
>     Question: Should <rbc> be obligatory? Or should it be allowed to 
> omit it? Omitting it works in IE. In my CP, I made it optional.
> 
>     Comments?
> 
> [C] <rtc> can be introduced in HTML6, and for that reason, my CP says 
> that the HTML5 *parser* should be updated to not auto-close the <rtc> 
> element when the parser sees <rt> or <rp>.
> 
> Feedback of any kind is welcome. If you think you can write a better 
> and/or more realistic CP and don't want to cooperate with me in 
> making this one better, then please feel free to 'steal' - but it 
> would be nice if you tell in your CP that you borrowed some ideas.
> 
> [1] http://www.w3.org/html/wg/wiki/IncludeRB
> [2]
> http://www.w3.org/mid/20120122134024833859.3dc4f444@xn--mlform-iua.no

> --
> leif h silli 
Received on Monday, 23 January 2012 11:18:09 GMT
