Re: HTML5 and ruby from Leif Halvard Silli on 2012-01-19 (public-i18n-cjk@w3.org from January to March 2012)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Thu, 19 Jan 2012 07:44:13 +0100
To: Eric Muller <emuller@adobe.com>
Cc: "public-i18n-cjk@w3.org" <public-i18n-cjk@w3.org>
Message-ID: <20120119074413309666.b538b3ed@xn--mlform-iua.no>
Eric Muller, Wed, 18 Jan 2012 15:19:31 -0800:
> On 1/18/2012 1:27 PM, Leif Halvard Silli wrote:

>> My claim, however, is that that is not enough: The <rt> between each 
>> <rb> splits the compound up, preventing e.g. spellcheckers from 
>> recognizing the compound as a compound. 
> 
> Clearly, spell-checking, just like any other text processing of an 
> HTML document, needs to interpret the markup. Ideally, the 
> spell-checking of "<i>un</i>fortunate" should not complain that "un" 
> is not a word. I don't see that each <rt> splits the compound any 
> more than the <i> splits "unfortunate".

Though I understand that a spell checker tool could be made to handle 
the fact that an <rt/> is inserted into the word, these two things that 
you mention seem rather different: In the <rt/> case, it is an element 
in the middle of the word, whereas in the '<i>un</i>fortunate' case, 
the correct thing would be to ignore that there is a <i>. A screen 
reader like NVDA has no problems recognizing "<i>un</i>fortunate" as a 
single word.

>> You see, the underlying claim of my question is that it
>> doesn't matter whether you write
>> 
>> 1.<ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby>,
>> 2.<ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby>  or
>> 3.<ruby><rb>東</rb><rt>とう</rt></ruby><ruby><rb>京</rb><rt>きょう
>> </rt></ruby>
> 
> I understand that you don't want to treat the three forms as distinct.
> 
> I am pointing out that it seems natural to assign different 
> interpretations to 1&2 on the one hand and 3 on the other, mostly for 
> the purpose of constraining the possible renderings (jukugo on one 
> hand, just adjacent "ordinary" ruby on the other).

So then you have the rendering of the ruby text in mind. I understand 
that argument. My focus is perception of the ruby base/bases.
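
To illustrate what I mean by perception of the bases, here is a minimal 
Python sketch of how a text-processing tool might recover the base text 
from the three forms above. It is only an assumption about how such a 
tool could work, not a description of any existing spell checker or 
screen reader. The names RubyBaseExtractor and base_text are made up for 
the example.

```python
from html.parser import HTMLParser


class RubyBaseExtractor(HTMLParser):
    """Collect ruby base text only, skipping <rt> and <rp> content.

    Limitation of this sketch: it relies on explicit </rt> end tags and
    does not implement HTML5's rules for omitted end tags.
    """

    SKIPPED = {"rt", "rp"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <rt> or <rp>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # <rb> and <ruby> are transparent here: their text is kept as-is.
        if self.skip_depth == 0:
            self.parts.append(data)


def base_text(markup):
    """Return the text a reader of the base (not the annotation) would see."""
    parser = RubyBaseExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts).strip()


form1 = "<ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby>"
form2 = "<ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby>"
form3 = ("<ruby><rb>東</rb><rt>とう</rt></ruby>"
         "<ruby><rb>京</rb><rt>きょう</rt></ruby>")

for form in (form1, form2, form3):
    print(base_text(form))
```

Under these assumptions, all three forms reduce to the same base string, 
which is the sense in which I say the choice of form should not matter 
for the perception of the compound.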

> Without a difference of interpretation, then one needs some other way 
> of indicating that the two ruby in 3. form a group for jukugo 
> treatment.  I was under the impression it would be easier to assign 
> meaning to constructs made of the existing HTML elements, rather than 
> request the addition of new elements.

I do try to work with the 'existing HTML' as well: Trying to consider 
how screen readers and spell checkers actually interpret mark-up is a 
perspective that is focused on 'existing HTML'. I am not aware of any 
other structure in HTML where an element - without the use of CSS - can 
suddenly pop up inside the middle of the word. However, I admit that, 
as Martin hinted, I may have misjudged how much of a problem that 
really is.

> Of course, one could always invent a CSS property that is the 
> functional equivalent of a group id attribute, which in turn is the 
> functional equivalent of a container element. Besides being 
> intuitively awkward, it would also be painful in practice: some 
> publishing workflows involve the mechanical injection of ruby and are 
> much easier to implement if that does not involve CSS processing.
> 
> As for the underline, I did not invent it. The current draft makes 
> <u>a</u><u>bc</u> render differently than <u>abc</u> when 
> text-decoration='edges'.

Do you have a link?

> Finally, it has long been the case that 
> <span>a</span><span>bc</span> can render differently than 
> <span>abc</span>, e.g. when there is a visible border. If there is 
> a difference for <u> and <span>, I don't quite understand the value 
> of insisting on no difference for <ruby>.

Did you by 'no difference for <ruby>' also mean 'no difference for 
<rb>'?

Some screen readers, at least VoiceOver, do have problems if you 'split' 
a word up using inline elements: the word is read as several words 
instead of as a single word. Others, like NVDA, do not have that 
problem. I think that the logical thing would be that it does not 
create problems. That said: The CSS display property affects what is 
perceived as a word as well. For instance, for this example:

   <u style='display:inline-block'>un</u>fortunate

then even NVDA will read it as two words. NVDA will, btw, not let 
elements with display:none affect the reading. So this:

   un<u style='display:none'>super</u>fortunate

would be read as 'unfortunate' by NVDA.
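
The behaviour described above can be mimicked with a toy model. The 
Python sketch below is not NVDA's actual algorithm, just an assumption 
that reproduces the three observations: plain inline elements are 
transparent, display:inline-block introduces word boundaries, and 
display:none content is dropped. The names display_of, WordModel and 
words are hypothetical, and only inline style attributes are looked at.

```python
from html.parser import HTMLParser


def display_of(style):
    """Return the 'display' value from an inline style string, else 'inline'."""
    for declaration in style.split(";"):
        name, _, value = declaration.partition(":")
        if name.strip().lower() == "display":
            return value.strip().lower()
    return "inline"


class WordModel(HTMLParser):
    """Toy model of word perception; void elements are not handled."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.stack = []  # display value of each currently open element

    def handle_starttag(self, tag, attrs):
        display = display_of(dict(attrs).get("style") or "")
        self.stack.append(display)
        if display == "inline-block":
            self.chunks.append(" ")  # boundary before the box

    def handle_endtag(self, tag):
        if not self.stack:
            return
        if self.stack.pop() == "inline-block":
            self.chunks.append(" ")  # boundary after the box

    def handle_data(self, data):
        # Text inside any display:none ancestor is ignored entirely.
        if "none" not in self.stack:
            self.chunks.append(data)


def words(markup):
    """Return the list of words the model would perceive in the markup."""
    model = WordModel()
    model.feed(markup)
    model.close()
    return "".join(model.chunks).split()


print(words("<i>un</i>fortunate"))
print(words("<u style='display:inline-block'>un</u>fortunate"))
print(words("un<u style='display:none'>super</u>fortunate"))
```

In this model, the first and third inputs come out as the single word 
'unfortunate', while the inline-block case comes out as 'un' and 
'fortunate'.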
-- 
Leif Halvard Silli
Received on Thursday, 19 January 2012 06:44:49 UTC