- From: Charles McCathieNevile <charles@w3.org>
- Date: Thu, 5 Feb 2004 19:37:32 -0500 (EST)
- To: Reuven Nisser <rnisser@ofek-liyladenu.org.il>
- Cc: W3C Accessibility Guidelines <w3c-wai-gl@w3.org>
Perhaps Joe knows all about Japanese, too, and will tell us. In case he's busy, or not omniscient, I can pass on what was explained by several people at the face to face meeting that took place in Japan last year - essentially, that there are similar problems. You could always look over the minutes to find out who was there and get a summary version of what they said. Additional detail may have come from Keio-based members of the W3C Team, and some is from Masafumi Nakane, a friend and former colleague in Japan who is blind (he and I are part of the very small group of "former WAI staff").

To go into a little more detail:

Japanese has 3 scripts. Two are called "kana" - roughly, characters. They look different, sound the same, and have a fixed number of letters with strict phonetic rules. One, hiragana, is used for Japanese words in "kiddie script", for little grammatical things like prepositions, and wherever there isn't a kanji, or you decided not to use a kanji. The other, katakana, is used to write words that are imported into the language. As Lisa was saying before about Hebrew, but perhaps more reliably, it is easy to identify the imported words in Japanese, because they spend a long time being written in katakana...

But the "grown-ups" use kanji - characters that can be ambiguous in both sound and meaning, and whose pronunciation is more or less impossible to determine from their form. A university graduate is expected to know some thousands of these. In adult books there is a mix of perhaps 50-70% kanji and the rest kana - less kanji for kids' books.

Braille, in Japan, is written using an alphabet that corresponds to hiragana. So creating any braille content actually requires being able to address the issue of getting a "kiddie version" of it.

It's possible to create new kanji in Japan, just as it is possible to create new words in English. Except that there is no sure guide to their pronunciation from the component parts.
(A former colleague at W3C has a unique kanji in his name, created in this way.)

Given all of this, the W3C, primarily at the instigation of East Asian members (Chinese has some similar characteristics, with apparently important differences), developed the ruby specification, a module for HTML whose specific purpose was to allow explanatory content to be placed alongside "primary content". Common use cases include business cards, newspaper articles, and textbooks.

For a large number of common kanji, as in Hebrew, there is enough information to build effective lookup tables (the glossary approach) so they can be pronounced correctly by a text to speech engine. But for a large number of other common kanji, and more particularly for less common ones, this isn't feasible.

Something that makes it possible to provide clear interpretation is therefore important. One technique is to use clear characters, as used to write simple documents pitched at a broad audience. In a way it is preferable that these need not always be visible - think of the differences between closed and open captioning on television.

I do not want to make any claims now about the priority of such a requirement - I think that it is premature to assess priority without having looked at the techniques available, the people who benefit, the type of benefit, the alternatives, and then worked out what a rational scheme might be based on.

In the continuum between "authors should provide pre-recorded versions of their documents, complete with powerful interactive VoiceXML navigation, in several accents" and "wait for speech technology to be perfect", or between "authors should provide captions, sign language interpretation, subtitles and credits" and "make better speech recognition stuff so the user can watch their own technology generate the version they want", I think we need to look for some pragmatic solutions.

I am not a great fan of many technologies around now - I think they could easily be improved a lot.
Others, despite being really hard to work with, are impressive because of the technical complexity of what they do, or because of the ingenuity required to develop some kind of workable system at all. But I believe, as I did in 2000, that finding some baseline that can be updated, and that takes into account what is available in the real world, is important.

Without some kind of agreed baseline, it seems to me premature to rule out techniques for solving existing problems, whether those problems are caused by the fact that people don't know what technology is available, can't afford it, can't be bothered installing it, or cannot use it unless they have 9 different hardware and software set-ups to read a common website. Let alone where the technology (speech recognition good enough to take to the movies and have on-the-fly captions generated) does not exist yet.

Naturally, I have some ideas about this - I tend to favour (as Joe seems to) standards-conformant implementations before looking at ways to cope with obsolete technology which does not conform to existing standards where other systems do. I tend to favour forward-looking solutions over ones that break future compatibility for the sake of backwards compatibility. But I don't have firm rules on this yet, and I note that there is still no consensus 3 years after I last raised it. It isn't an easy topic.

By comparison, adding important accents might be fairly straightforward. It's clearly the practice in Arabic that some of these marks are added (it has essentially the same approach as Hebrew).

cheers

Chaals

On Fri, 6 Feb 2004, Reuven Nisser wrote:

>
>Hello Joe,
>
>>> Nikkud. I know all about them and they are rarely used in adult Hebrew.
>>> Your proposal to force authors everywhere to use kiddie Hebrew
>>> ain't gonna
>>> cut it, Reuven.
>
>What I am saying is that Nikud is not childish.
>You are right, people will
>not read a book with all Nikud, but they will accept a book with small
>amount of Nikud which will help them read the word correctly especially with
>strange names.
>
>>> Fix your adaptive technology. Don't try to tell people how to write.
>The reason that you people have so good text to speech synthesizer in
>English is because there is a large market for it in telephony, games and so
>on. Text to phonetic in English is quite simple, but nobody thought about
>blind people when created very good voices for English phonetic to speech.
>Money makes the world go round.
>
>Israel is small, Hebrew language has a limitation both in text to phonetic
>and the regular money solving problems with the phonetic to speech. So,
>nobody invests in text to speech in Hebrew. This is the best or almost the
>best that can be done with the current efforts. We will be able to work much
>harder and get a percent more and so on but we will never reach the 100%.
>
>To the best of my knowledge, the problems in Arabic are the same. Maybe
>someone here on the list which worked with Arabic text to speech can speak
>about his experience.
Received on Thursday, 5 February 2004 19:37:33 UTC