RE: diacritic marks from Charles McCathieNevile on 2004-02-06 (w3c-wai-gl@w3.org from January to March 2004)

From: Charles McCathieNevile <charles@w3.org>
Date: Fri, 6 Feb 2004 05:30:04 -0500 (EST)
To: Reuven Nisser <rnisser@ofek-liyladenu.org.il>
Cc: W3C Accessibility Guidelines <w3c-wai-gl@w3.org>
Message-ID: <Pine.LNX.4.55.0402060511410.20747@homer.w3.org>

On Fri, 6 Feb 2004, Reuven Nisser wrote:

>Hello Charles
>> For a large number of common kanji, like in Hebrew, there is enough to build
>> effective lookup tables (the glossary approach) so they can be pronounced
>> correctly by a text to speech engine. But for a large number of other common
>> kanji, and more particularly for less common ones, this isn't feasible.
>> Something that makes it possible to provide clear interpretation is therefore
>> important. One technique is to use clear characters, as used to write simple
>> documents pitched at a broad audience. It is preferable in a way that they
>> need not be always visible - think of the differences between closed and open
>> captioning on television.
>
>One question though regarding the attached paragraph. When you look at a
>single Japanese word, is there always one and only one way to pronounce it?

No, although in practice there is a restricted number of possibilities. One
difficulty is that Japanese doesn't use word breaks, so you don't get to look
at a single word very often.

>If so, then the problem in Japanese is "only" a lookup table for each word
>and the phonetic representation.

True, but the lookup table for "kiddy Japanese" is extremely large. If
Moore's law continues to apply we could expect it to work in computers at
some point, but like doing grammar analysis by brute force, it isn't
currently feasible as I understand it.
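
To make the lookup-table idea concrete, here is a minimal sketch in Python.
It rests on assumptions of my own: the tiny READINGS table and the segment()
helper are hypothetical, and greedy longest match is just one simple way to
cope with the missing word breaks, not how a real speech engine necessarily
works.

```python
# Hypothetical toy table: written form -> possible readings (romanised).
# A real table would need to be vastly larger.
READINGS = {
    "今日": ["kyou", "konnichi"],
    "日本": ["nihon", "nippon"],
    "日": ["hi", "nichi"],
    "本": ["hon"],
    "は": ["wa", "ha"],
}


def segment(text, table=READINGS):
    """Split text by greedy longest match against the table.

    Returns a list of (chunk, readings) pairs. Characters that match
    nothing in the table are returned alone with an empty reading list.
    """
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append((chunk, table[chunk]))
                i += size
                break
        else:
            out.append((text[i], []))  # unknown character, no reading
            i += 1
    return out


result = segment("今日は日本")
for chunk, readings in result:
    print(chunk, readings)
```

Even in this toy case every chunk comes back with more than one candidate
reading, so the table narrows the choice but does not make it.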

It would be nice to have some people more versed in Japanese than I am in
this discussion. But the culture of this list can be intimidating by American
standards, and from other perspectives it is considered bullying and abusive.
This explains some of the reluctance of people to participate even when they
have important contributions to make. Other things have an impact too:
limited time, a general fear of being ridiculed or ignored for incorrect
language (exacerbated by the fact that this happens unchecked here), and not
being able to keep up with the pace of the discussion.

>In Hebrew and in Arabic, if you look at a single word, there are on average
>2.3 ways to pronounce it. There are even words with 13 pronunciations.
>Each pronunciation has a different meaning. To eliminate several
>possibilities you need to analyze the sentence grammatically. To eliminate
>more you need to get to text semantics.
>For example, S-F-R could be SEFER (book) or SAPAR (barber).
>Regards,
>Reuven Nisser
>Ofek Liyladenu
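
The S-F-R case can be sketched the same way in Python. The CANDIDATES table
and both helper functions are hypothetical; the only data is the pair of
readings given in the message quoted above.

```python
# Hypothetical table: consonant skeleton -> (pronunciation, meaning) pairs.
CANDIDATES = {
    "SFR": [("SEFER", "book"), ("SAPAR", "barber")],
}


def pronunciations(skeleton, table=CANDIDATES):
    """Return every known pronunciation for an unpointed consonant skeleton."""
    key = skeleton.replace("-", "").upper()
    return [spoken for spoken, _meaning in table.get(key, [])]


def is_ambiguous(skeleton, table=CANDIDATES):
    """True when the table alone cannot choose a single pronunciation."""
    return len(pronunciations(skeleton, table)) > 1


print(pronunciations("S-F-R"))
print(is_ambiguous("S-F-R"))
```

The lookup can only list the candidates. Choosing between them is the part
that needs the grammatical and semantic analysis described in the quoted
text.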

Cheers

Chaals

Received on Friday, 6 February 2004 05:30:19 UTC