- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 18 Jan 2012 22:27:42 +0100
- To: Eric Muller <emuller@adobe.com>
- Cc: public-i18n-cjk@w3.org
Eric Muller, Wed, 18 Jan 2012 10:54:00 -0800:
> On 1/12/2012 5:03 PM, Leif Halvard Silli wrote:
>>> Both
>>>
>>> <ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby> (may
>>> be with a different interleaving of rbs and rts)
>>>
>>> and
>>>
>>> <ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby>
>>>
>>> capture the list of pairs {東, とう}, {京, きょう} equally well.
>>
>> Why is *any* of the two examples above any better than this:
>>
>> <ruby><rb>東</rb><rt>とう</rt></ruby><ruby><rb>京</rb><rt>きょう
>> </rt></ruby>
... snip ...
> Given the same characters, and in fact the same pairs, the decision
> to treat those pairs as jukugo or not is based on the semantics of
> the text. It seemed obvious to me that using a single <ruby> vs
> multiple <ruby> was the only way to go, but you are right that I did
> not make that clear.
So, you say that a compound jukugo needs to be kept together as a
compound, and that the wrapper for that compound is <ruby>. My claim,
however, is that this is not enough: The <rt> between each <rb> splits
the compound up, preventing e.g. spellcheckers from recognizing the
compound as a compound.
> Note that this is not entirely different from the underline problem
> which was discussed on www-style not too long ago: <u>A</u><u>BC</u>
> is considered distinct from <u>ABC</u> (and a fortiori from three
> successive <u>), especially in CJK world. The underline is used on
> names, and reflecting the parts of the name, as in (A)(BC) is deemed
> important.
Just as <u> is not intended for alphabetic usage only, <ruby> is not
intended for Japanese only. W.r.t. <u>, then: for e.g. Latin text,
what's the difference between <u>f</u><u>oo</u> and <u>foo</u>? Would
the spell checker not recognize both of them as one and the same word?
OTOH, if one did this: <u lang=en>f</u><u lang=ru>oo</u>, then a
sensitive spellchecker would not see it as the word 'foo' but as two
words: The English word 'f' and the Russian word 'oo'... And this
allows me to jump back to the initial question: Provided I understood
you correctly, then, as already stated above, I am not convinced by
your argument. You see, the underlying claim of my question is that it
doesn't matter whether you write
1. <ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby>,
2. <ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby> or
3. <ruby><rb>東</rb><rt>とう</rt></ruby><ruby><rb>京</rb><rt>きょう
</rt></ruby>
I believe that, for the spellchecker to perceive *words*, including
compound 'jukugos', the word must EITHER be kept inside a single
<rb> or inside two or more adjacent <rb>s. If there needs to be a
wrapper, then it could be <ruby> or <rbc>.
The problem, however, is that HTML5 allows us to do:
<ruby>
<rb>f</rb><rt>A</rt>
<rb>o</rb><rt>fake</rt>
<rb>o</rb><rt>word</rt>
</ruby>
But this permission does not seem meaningful: A spell checker would
not be able to detect the word 'foo' in the above example.
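A rough sketch of why, assuming a checker that simply reads the text
nodes in document order and ignores the tags. The names TextInOrder and
naive_tokens are hypothetical, and real spell checkers may well work
differently:

```python
from html.parser import HTMLParser


class TextInOrder(HTMLParser):
    """Collect every text node in document order, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def naive_tokens(markup):
    """Return the whitespace-separated tokens a tag-blind reader would see."""
    parser = TextInOrder()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks).split()


interleaved = """<ruby>
<rb>f</rb><rt>A</rt>
<rb>o</rb><rt>fake</rt>
<rb>o</rb><rt>word</rt>
</ruby>"""

print(naive_tokens(interleaved))  # ['fA', 'ofake', 'oword']
```

Each base letter ends up glued to its annotation, so 'foo' never
appears as a token.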
The meaningful answer would be, first and foremost, to *forbid* the
above construct, meaning that <ruby> would only permit a single pair of
<rb><rt>: <ruby><rb><rt></ruby>. Two or more pairs [such as this:
<ruby><rb><rt><rb><rt></ruby>] would be forbidden. Secondly, one would
need to EITHER bring back <rbc> OR permit two or more adjacent <rb>
as direct children of <ruby>. Thus, either this:
<ruby>
<rbc>
<rb>f</rb><rb>o</rb><rb>o</rb>
</rbc>
<rt>A</rt><rt>fake</rt><rt>word</rt>
</ruby>
And/Or this:
<ruby>
<rb>f</rb><rb>o</rb><rb>o</rb>
<rt>A</rt><rt>fake</rt><rt>word</rt>
</ruby>
It also seems to me that <rt> by definition should be seen as a word
separator: Each <rt> includes one or more words, whereas <rb> is not
seen as a word separator.
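A minimal sketch of that rule, assuming a checker that treats every
<rt> boundary (and ordinary whitespace) as a word break but ignores
<rb> boundaries. The names RubyWords and split_ruby are hypothetical,
and this is only one possible reading of the rule:

```python
from html.parser import HTMLParser


class RubyWords(HTMLParser):
    """Split ruby markup into base words and annotation words.

    Assumed rule: <rt> boundaries separate words, <rb> boundaries do not.
    """

    def __init__(self):
        super().__init__()
        self.base_words = []
        self.annotations = []
        self.buffer = ""
        self.in_rt = False

    def flush(self):
        # Whitespace also ends a word, so split whatever has accumulated.
        target = self.annotations if self.in_rt else self.base_words
        target.extend(self.buffer.split())
        self.buffer = ""

    def handle_starttag(self, tag, attrs):
        if tag == "rt":
            self.flush()  # the base text collected so far ends here
            self.in_rt = True

    def handle_endtag(self, tag):
        if tag == "rt":
            self.flush()  # the annotation ends here
            self.in_rt = False
        elif tag == "ruby":
            self.flush()

    def handle_data(self, data):
        # <rb> tags are not handled at all, so adjacent bases simply join.
        self.buffer += data


def split_ruby(markup):
    """Return (base_words, annotations) for a piece of ruby markup."""
    parser = RubyWords()
    parser.feed(markup)
    parser.close()
    parser.flush()
    return parser.base_words, parser.annotations


grouped = ("<ruby><rb>f</rb><rb>o</rb><rb>o</rb>"
           "<rt>A</rt><rt>fake</rt><rt>word</rt></ruby>")
interleaved = ("<ruby><rb>f</rb><rt>A</rt><rb>o</rb><rt>fake</rt>"
               "<rb>o</rb><rt>word</rt></ruby>")

print(split_ruby(grouped))      # (['foo'], ['A', 'fake', 'word'])
print(split_ruby(interleaved))  # (['f', 'o', 'o'], ['A', 'fake', 'word'])
```

Under this rule, adjacent <rb>s yield the single word 'foo', while the
interleaved form yields three one-letter words.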
Comments?
--
Leif Halvard Silli
Received on Wednesday, 18 January 2012 21:28:32 UTC