- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 18 Jan 2012 22:27:42 +0100
- To: Eric Muller <emuller@adobe.com>
- Cc: public-i18n-cjk@w3.org

Eric Muller, Wed, 18 Jan 2012 10:54:00 -0800:
> On 1/12/2012 5:03 PM, Leif Halvard Silli wrote:
>>> Both
>>>
>>> <ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby> (may
>>> be with a different interleaving of rbs and rts)
>>>
>>> and
>>>
>>> <ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby>
>>>
>>> capture the list of pairs {東, とう}, {京, きょう} equally well.
>>
>> Why is *any* of the two examples above any better than this:
>>
>> <ruby><rb>東</rb><rt>とう</rt></ruby><ruby><rb>京</rb><rt>きょう</rt></ruby>

... snip ...

> Given the same characters, and in fact the same pairs, the decision
> to treat those pairs as jukugo or not is based on the semantics of
> the text. It seemed obvious to me that using a single <ruby> vs
> multiple <ruby> was the only way to go, but you are right that I did
> not made that clear.

So, you say that a compound jukugo needs to be kept together as a
compound, and that the wrapper for that compound is <ruby>. My claim,
however, is that this is not enough: the <rt> between each <rb> splits
the compound up, preventing e.g. spellcheckers from recognizing the
compound as a compound.

> Note that this is not entirely different from the underline problem
> which was discussed on www-style not too long ago: <u>A</u><u>BC</u>
> is considered distinct from <u>ABC</u> (and a fortiori from three
> successive <u>), especially in CJK world. The underline is used on
> names, and reflecting the parts of the name, as in (A)(BC) is deemed
> important.

Just as <u> is not intended for alphabetic usage only, <ruby> is not
intended for Japanese only. With regard to <u>: for e.g. Latin text,
what is the difference between <u>f</u><u>oo</u> and <u>foo</u>? Would
the spell checker not recognize both of them as one and the same word?
OTOH, if one did this: <u lang=en>f</u><u lang=ru>oo</u>, then a
sensitive spellchecker would not see it as the word 'foo' but as two
words: the English word 'f' and the Russian word 'oo'...
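
To make the segmentation argument concrete, here is a minimal Python sketch of how a naive spellchecker might pull base-text runs out of markup. It assumes, purely as a hypothesis and not as any specified behaviour, that <rt> content is skipped and ends the current word, while every other tag (<rb>, <ruby>, <u>) is transparent. The names `BaseTextSegmenter` and `base_runs` are invented for illustration, and the `lang` attribute case is not modelled.

```python
from html.parser import HTMLParser


class BaseTextSegmenter(HTMLParser):
    """Collect base-text runs, treating <rt> as a word separator.

    Hypothetical model of a spellchecker's view of the markup: text
    inside <rt> is ignored and ends the current run; all other tags are
    transparent, so adjacent elements join into a single run.
    """

    def __init__(self):
        super().__init__()
        self.runs = []        # finished base-text runs
        self._current = []    # pieces of the run being built
        self._in_rt = False   # True while inside an <rt> element

    def _flush(self):
        # Close the run being built, if it holds any visible text.
        text = "".join(self._current).strip()
        if text:
            self.runs.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag == "rt":
            self._flush()     # an <rt> ends the current base-text run
            self._in_rt = True

    def handle_endtag(self, tag):
        if tag == "rt":
            self._in_rt = False

    def handle_data(self, data):
        if not self._in_rt:   # annotation text is not base text
            self._current.append(data)

    def close(self):
        super().close()
        self._flush()         # emit whatever run is still open


def base_runs(markup):
    """Return the base-text runs found in a markup string."""
    parser = BaseTextSegmenter()
    parser.feed(markup)
    parser.close()
    return parser.runs


interleaved = "<ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby>"
print(base_runs(interleaved))          # ['東', '京']
print(base_runs("<u>f</u><u>oo</u>"))  # ['foo']
```

Under these assumptions the interleaved <rb>/<rt> form yields two separate runs, while adjacent elements with no <rt> between them yield one.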

And this allows me to jump back to the initial question: provided I
understood you correctly, then, as already stated above, I am not
convinced by your argument. You see, the underlying claim of my
question is that it doesn't matter whether you write

1. <ruby><rb>東</rb><rt>とう</rt><rb>京</rb><rt>きょう</rt></ruby>,
2. <ruby>東<rt>とう</rt>京<rt>きょう</rt></ruby> or
3. <ruby><rb>東</rb><rt>とう</rt></ruby><ruby><rb>京</rb><rt>きょう</rt></ruby>

I believe that, for the spellchecker to perceive *words*, including
compound 'jukugos', the word must be kept EITHER inside a single <rb>
OR inside two or more adjacent <rb>s. If there needs to be a wrapper,
then it could be <ruby> or <rbc>.

The problem, however, is that HTML5 allows us to do:

  <ruby>
    <rb>f</rb><rt>A</rt>
    <rb>o</rb><rt>fake</rt>
    <rb>o</rb><rt>word</rt>
  </ruby>

But this permission does not seem meaningful: a spell checker would
not be able to detect the word 'foo' in the above example.

The meaningful answer would be, first and foremost, to *forbid* the
above construct, meaning that <ruby> would only permit a single pair
of <rb><rt>: <ruby><rb><rt></ruby>. Two or more pairs [such as this:
<ruby><rb><rt><rb><rt></ruby>] would be forbidden.

Secondly, one would need to EITHER take back <rbc> OR permit two or
more adjacent <rb> as direct children of <ruby>. Thus, either this:

  <ruby>
    <rbc>
      <rb>f</rb><rb>o</rb><rb>o</rb>
    </rbc>
    <rt>A</rt><rt>fake</rt><rt>word</rt>
  </ruby>

And/Or this:

  <ruby>
    <rb>f</rb><rb>o</rb><rb>o</rb>
    <rt>A</rt><rt>fake</rt><rt>word</rt>
  </ruby>

It also seems to me that <rt> should by definition be seen as a word
separator: each <rt> includes one or more words. Whereas <rb> is not
seen as a word separator.

Comments?
-- 
Leif Halvard Silli

Received on Wednesday, 18 January 2012 21:28:32 UTC