Re: HTML5 and ruby from Leif Halvard Silli on 2012-01-19 (public-i18n-cjk@w3.org from January to March 2012)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Thu, 19 Jan 2012 08:05:07 +0100
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Cc: fantasai <fantasai.lists@inkedblade.net>, public-i18n-cjk@w3.org, 'WWW International' <www-international@w3.org>
Message-ID: <20120119080507581988.fd31cf3b@xn--mlform-iua.no>

"Martin J. Dürst", Thu, 19 Jan 2012 10:50:52 +0900:

> You wrote (at the end of your mail):
> 
>> The XHTML Ruby module thus allows spell checkers and screen readers to
>> perceive base word[s] without having to behave as if the <rt> did not
>> exist. This seems like a feature from the XHTML Ruby module that it
>> would be worth keeping.
>>
>> Comments?
> 
> I don't remember the Ruby Annotation spec saying anything about
> what's a word and what's not.

The Ruby Annotation spec does not allow <rp> in complex ruby because,
it says, it is unclear how to render it in fallback mode. In fallback
mode, the parentheses would become visible, and parentheses are
word/compound splitters. HTML5, as it stands, does allow <rp> in its
version of complex ruby. So you could end up with this:

<ruby>
  <rb>W</rb><rp>[</rp><rt>World</rt><rp>]</rp>
  <rb>W</rb><rp>[</rp><rt>Wide </rt><rp>]</rp>
  <rb>W</rb><rp>[</rp><rt>Web  </rt><rp>]</rp>
</ruby>

A screen reader would read the above as: W - World, W - Wide, W - Web. 
So it seems to me that Ruby Annotation has some 'built-in' thoughts 
about what's a word and what not.

> A sophisticated spell checker could 
> deduce that WWW is one word in both cases, based on the fact that 
> it's Latin letters and there are no spaces.

I don't think that screen readers and spell checkers should see
different words depending on whether <rp> is present, or even on
whether the parser supports ruby. So I question whether even
sophisticated spell checkers should perceive 'WWW' in the above
markup.
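
To make the difference concrete, here is a rough Python sketch of the
two views of the W/World, W/Wide, W/Web markup above. It is only an
illustration, not how any real screen reader or spell checker works.
The names visible_text and words are made up for this example, and it
assumes that whitespace-only text between the elements inside <ruby>
is ignored and that a "word" is simply a run of word characters.

```python
import re
from html.parser import HTMLParser

MARKUP = """<ruby>
  <rb>W</rb><rp>[</rp><rt>World</rt><rp>]</rp>
  <rb>W</rb><rp>[</rp><rt>Wide </rt><rp>]</rp>
  <rb>W</rb><rp>[</rp><rt>Web  </rt><rp>]</rp>
</ruby>"""


class _TextCollector(HTMLParser):
    """Collects text runs, optionally skipping text inside given elements."""

    def __init__(self, skip=()):
        super().__init__()
        self.skip = set(skip)   # element names whose text is dropped
        self.open_tags = []     # stack of currently open elements
        self.runs = []          # collected text runs, in document order

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        # Assumption: whitespace-only runs between elements are ignored.
        if text and not self.skip.intersection(self.open_tags):
            self.runs.append(text)


def visible_text(markup, skip=()):
    """Hypothetical helper: concatenated text, minus the skipped elements."""
    collector = _TextCollector(skip)
    collector.feed(markup)
    collector.close()
    return "".join(collector.runs)


def words(text):
    """Naive tokenizer: anything that is not a word character splits words."""
    return re.findall(r"\w+", text)


# Fallback view: the tags are ignored, so the <rp> parentheses become text.
fallback = visible_text(MARKUP)
# Ruby-aware view: only the base text, with <rp> and <rt> left out.
base_only = visible_text(MARKUP, skip=("rp", "rt"))

print(fallback, words(fallback))
print(base_only, words(base_only))
```

In the fallback view the brackets from <rp> split the text into six
separate words, whereas the base-only view yields the single word
'WWW'. That is the discrepancy I mean.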

> A dumb spell checker 
> would not treat it as a word because there's markup in between, again 
> in both cases. On top of that, the main use case for Ruby is East 
> Asian languages, where the concept of a word is somewhat unclear 
> anyway because there are no spaces (except in modern Korean, but then 
> there Hanja aren't very popular these days).
> 
> So I think this is an extremely marginal argument, if any.

Well, it does seem that there would at least be a need for some
authoring advice, based on how dumb and smart parsers end up treating
these things ...

> I completely agree with Fantasai and many, many others that accepting 
> <rb> for HTML5 would make things way more straightforward. It also 
> makes significant existing content okay rather than invalid, and it's 
> what the main user community (Japanese) wants.

+1
-- 
Leif Halvard Silli

Received on Thursday, 19 January 2012 07:05:44 UTC