[Bug 10830] i18n comment : Please add support for rb from bugzilla@jessica.w3.org on 2011-11-30 (public-i18n-core@w3.org from October to December 2011)

From: <bugzilla@jessica.w3.org>
Date: Wed, 30 Nov 2011 06:18:32 +0000
To: public-i18n-core@w3.org
Message-Id: <E1RVdV2-0004B8-0c@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10830

--- Comment #70 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-11-30 06:17:56 UTC ---
(In reply to comment #68)
> (In reply to comment #62)

> The <rb> is not styled and the page would render *exactly* the same if the
> <rb>s were simply omitted. But they don't need to be omitted, because if
> browsers continue to ignore <rb> as they do now and as the spec requires, the
> page will *also* continue to work exactly the same.

You express yourself confusingly when you claim that HTML5-compatible UAs
ignore <rb>: elsewhere you say that it is treated like <xxx>. (Only IE6/7/8
lack native support even for unknown elements.)


> This bug in no way negatively affects Yomiuri Online. In fact the spec as it
> stands now would slightly positively affect them, by making it possible to
> strip the elements from their markup and thus saving some bandwidth.

Including <rb> in HTML5 in no way takes away this slight advantage: you are
only being asked to add <rb> to HTML5 as an optional element.

In fact, if <rb> is permitted to be left unclosed (it gets autoclosed anyway
when the parser sees <rt> or <rp>, as suggested by Simon and as implemented by
Firefox and Webkit), then, in those cases where there *is* a need for a ruby
base text element, <rb> would save more bandwidth than <span>, which the
authoring conformance rules require you to close manually.


> > Automatic ruby programs
> > http://mt.adaptive-techs.com/httpadaptor/servlet/HttpAdaptor?.h0.=fp&.ui.=trial&.up.=&.ro.=kh&.st.=rb
> 
> The only thing this page does with <rb> is style it to not render as native
> ruby!
>
> > http://www.hiragana.jp/
> 
> Again, the only style applied to <rb> here is to not style it as native ruby.

Both of the above sites use inline-table CSS in order to *make* ruby markup
work in legacy user agents. On adaptive-techs.com this works as intended, while
on hiragana.jp the styling seemingly makes things fall apart.

But you make it sound as if the problems on www.hiragana.jp are linked to its
styling of <rb>. That insinuation is false. The problem lies with the *overall*
styling that the site uses to compensate for the lacking ruby support in legacy
browsers: removing the <rb> *tags* without also removing the CSS that tries to
fix the legacy rendering does not cure the site's problem.

Thus it is futile to single out the styling of <rb> as the problem. The real
problem is the lack of support for ruby markup in legacy browsers, combined
with the site not having been updated to reflect today's much improved support
for ruby markup (in Firefox and Safari/Chrome).


> For both the adaptive-techs.com site and the hiragana.jp site, this bug would
> have no effect whatsoever on the sites. They would continue to work as they do
> today whether we added <rb> or not. The sites right now are simply relying on
> the element being treated as an unknown element in browsers.

When the site launched, it probably relied on *all* the ruby markup elements
being treated as unknown elements. Today there is, to varying degrees, native
support for ruby markup in Firefox, Webkit and IE.

So the fact that the site relies on how unknown elements are treated seems to
be no argument either way. If anything, it suggests that adding <rb> to HTML5
is technically rather uncomplicated.


> > Koji Ishii and I believe that the rb tag is widely implemented.
> 
> You are wrong. Browsers all treat <rb> the same as <xxx>. [ snip ]

How UAs treat <rb> and <xxx>, and ruby markup in general, can be seen in this
test case:

 http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1264

Results:

(1) IE6/7/8 treat <rb> the way they treat the imaginary, unknown <xxx>
element. **However**, IE6/7/8 do not support unknown elements the way HTML5
requires them to be supported, so they are not comparable to other browsers.
(But one can enable HTML5 'unknown element' parsing via the HTML5 shiv method,
in which case IE6/7/8 treat <rb> like current Opera and IE9 do.) IE6/7/8 apply
ruby CSS to <ruby>, <rp> & <rt> but do not support ruby parsing (autoclosing
when the parser sees <rt> or <rp>); see (3) and (4) below.

(2) Opera and IE9 treat <rb> as an unknown element. IE9 applies ruby CSS to
<ruby>, <rp> & <rt> but does not support ruby parsing (see (3) and (4) below).
Opera applies neither ruby parsing nor ruby CSS.

(3) Firefox doesn't apply any styling to <ruby> markup, **but** it does apply
ruby *parsing*: if you forget to close the <rb> element (or if you use <span>
instead and forget to close it), then the element will be closed once the
parser sees the <rp> or <rt> element. Firefox seems to have applied ruby
parsing since at least version 3.

(4) Webkit (Safari/Chrome) applies ruby parsing, as Firefox does. In addition,
it applies ruby CSS.


> > This point appears to be subjective
> 
> It's not subjective. You can list the use cases quite easily, and then see if
> you need <rb> to do them. You'll find you don't. I've demonstrated this
> numerous times in this bug already.


Actually, you will find that there are good use cases for <rb>:


   (A)   *When* you want or need to mark up the base text with an element
(rather than just placing it directly inside <ruby>), *then* you need an
element that identifies the term being translated/rubified as such, so that
when you, or someone else, later look at the page's source code, you know that
the <rb> is there to identify the annotated word. If the page used <span>
instead, it would be hard to know why the <span> was there: was it used to work
around a browser bug, or did it actually play an important role in the ruby
"microformat"? With <rb> in place, there is no question about why it is there.

   (B)   To facilitate simplified authoring based on ruby parsing:
          An unclosed element inside <ruby> gets "autoclosed" when the parser
sees <rt> or <rp>. This is the case in Firefox and Webkit, and it is what we
want the HTML5 parser to do as well [I did not check whether HTML5 already says
that it should work that way]. Such parsing allows us to say that authors do
not need to close the <rb> element: they can rely on automatic closing. For a
"new" element like <rb>, we can without much trouble introduce the rule that
authors *may* omit the closing tag. But if authors were required to use <span>,
then we would have to abide by the general authoring rule for <span>, which
says that the closing tag is obligatory. (Thus, as noted above, <rb> would save
us a tiny bit of bandwidth compared to using <span>, or even <i> or <b>.)
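To make the parsing behaviour in (B) concrete, here is a minimal Python sketch. It is a toy model, not the HTML5 tree-construction algorithm: it assumes the simplified rule described above (an <rt> or <rp> start tag closes whatever elements are still open inside <ruby>), and the names Node, RubyTreeBuilder and parse are hypothetical. It uses only the standard library's html.parser module.

```python
from html.parser import HTMLParser


class Node:
    """A minimal element node: tag name, attributes and children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # Node instances or (stripped) text strings


class RubyTreeBuilder(HTMLParser):
    """Toy tree builder that models only the ruby autoclosing rule:
    an <rt> or <rp> start tag closes any elements left open inside <ruby>."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]  # stack of open elements

    def handle_starttag(self, tag, attrs):
        if tag in ("rt", "rp") and any(n.tag == "ruby" for n in self.stack):
            # Pop open elements (e.g. an unclosed <rb> or <span>)
            # until <ruby> is the current node again.
            while self.stack[-1].tag != "ruby":
                self.stack.pop()
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].children.append(text)


def parse(markup):
    builder = RubyTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


# The <rb> is never closed, yet it ends up as a sibling of the <rt>.
tree = parse("<ruby><rb lang=en>USA<rt>Sambandstatane</rt></ruby>")
ruby = tree.children[0]
print([child.tag for child in ruby.children])  # ['rb', 'rt']
```

An unclosed <span> is closed by the same rule, so in this model the difference between <rb> and <span> lies in the authoring conformance rules rather than in parsing.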

   (C)   Depending on the script used to write the language, there can be a
technical need to identify the language, and the script, of the translated
word with a language tag, without letting the language inheritance rules
affect the rest of the content of the <ruby> element. This is most simply
achieved by placing the @lang attribute on the <rb> element rather than on the
<ruby> element itself. In fact, it is *core* to the idea of ruby markup that
the base text **differs** from the annotation text with regard to language
and/or script. And in order to operate with such a separation, both <rb> and
<rt> are needed, so that you can place the @lang attribute on each of them
separately rather than on the <ruby> element, which would affect both base
text and annotation text.

         Examples - (C):

         1) <ruby lang="foo">, as opposed to <rb lang="foo">, can cause the
font styling associated with language "foo" to be applied to both the base
text and the annotation text, even when they should have different fonts
because they use different writing scripts. This, I'm told, is the case in
Firefox.

         2) The lack of <rb> tempts authors to rely solely on <ruby> and <rt>,
which in turn invites them to set the language on <ruby> without also
overriding the inherited language with a @lang on <rt>. This could badly
affect what e.g. screen readers and spell checkers consider the language,
script, etc. of the content to be. Permitting <rb>, in contrast, gives authors
a natural way to set the language of the base text.

<body lang=nn >Obama, president i
     <ruby lang=en >
         USA
        <rt>Sambandstatane</rt>
     </ruby>
 <!--Above the language of <rt> is set to English due to the use of @lang on
the <ruby> element. Whereas in the following element, in the same document, its
language is as it should be - nn (Norwegian Nynorsk): -->
     <ruby>
        <rb lang=en> USA
        <rt>Sambandstatane</rt>
     </ruby>
</body>
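The language inheritance in example 2) can be checked mechanically. The following Python sketch is a simplification, assuming that each text run takes the lang value of its nearest open ancestor that has one (it ignores xml:lang, the HTTP Content-Language header and empty lang values), and that the simplified ruby autoclosing rule from (B) applies. LangResolver and language_of_runs are hypothetical names; only the standard library's html.parser is used.

```python
from html.parser import HTMLParser


class LangResolver(HTMLParser):
    """Records (text, effective language) pairs. Each text run takes the lang
    of the nearest open element that carries a lang attribute. Also models a
    simplified ruby rule: <rt>/<rp> close elements left open inside <ruby>."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open elements as (tag, lang-or-None)
        self.runs = []   # collected (text, lang) pairs

    def current_lang(self):
        for _tag, lang in reversed(self.stack):
            if lang is not None:
                return lang
        return None  # language unknown

    def handle_starttag(self, tag, attrs):
        if tag in ("rt", "rp") and any(t == "ruby" for t, _ in self.stack):
            while self.stack[-1][0] != "ruby":
                self.stack.pop()  # e.g. autoclose an unclosed <rb>
        self.stack.append((tag, dict(attrs).get("lang")))

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((text, self.current_lang()))


def language_of_runs(markup):
    resolver = LangResolver()
    resolver.feed(markup)
    resolver.close()
    return resolver.runs


# The markup from example 2), without the explanatory comment.
doc = """<body lang=nn >Obama, president i
     <ruby lang=en >
         USA
        <rt>Sambandstatane</rt>
     </ruby>
     <ruby>
        <rb lang=en> USA
        <rt>Sambandstatane</rt>
     </ruby>
</body>"""

for text, lang in language_of_runs(doc):
    print(text, lang)
```

With @lang on <ruby>, "Sambandstatane" comes out as en; with @lang on <rb>, it comes out as nn. The second result depends on the unclosed <rb> being autoclosed by <rt>: without that rule, the <rt> would end up inside the <rb> and inherit en.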

         3) The content of <rt> will often be in the same language as the text
surrounding the <ruby> element (because <rt> is often used to explain the
content of the <rb>). In that case, once you add @lang to the <rb> element,
you are done: the <rt> inherits the language from the surrounding text:

<body lang=en >Learn some Mandarin:
     <ruby>
        <rb  lang=cmn >&#x6c49;
        <rt>Chinese</rt>
        <rb  lang=cmn >&#x5b57; 
        <rt>character</rt><!--
<rt> inherits language from <body lang=en >.
Thus, authors don't need to restate that <rt> is in English.
 --></ruby>
</body>

SUMMARY:

To *not* allow <rb> means that HTML5 ruby mark-up primarily would cater for the
cases when the **base text** should have the same language as the parent
element of <ruby>, but not as elegantly for the cases when the **annotation
text** should have the same language tag classification as the parent element
of the <ruby> have. 

This means that HTML5 ruby markup would cater relatively well only for the
"classical" use case in Chinese or Japanese text, where the ruby base text
characters occur inside text of the same language and <rt> is used to offer a
reading/pronunciation aid: <rt> can then be language-tagged as needed, while
the base text automatically gets the correct language. As an example, here is
a <ruby> element inside a Chinese document, where it is used to transcribe the
Chinese characters with the bopomofo phonetic script:

<body lang="cmn">&#x6c49;:
     <ruby>
         &#x6c49;
        <rt lang=cmn-Bopo >&#x310f;&#x3122;&#x2cb;</rt>
         &#x5b57;
         <rt lang=cmn-Bopo >&#x3117;&#x2cb;</rt>
     <!--<ruby> inherits language from <body lang="cmn">.
     Thus, authors don't need to restate that it is Chinese.
     The <rt> is in the same language but a different script.-->
     </ruby>
</body>

In a sentence: without the <rb> element, HTML5 ruby markup is easy to use for
same-language annotations, but more difficult to use for annotating text
segments in another language.

Received on Wednesday, 30 November 2011 06:18:47 UTC