- From: Al Gilman <asgilman@iamdigex.net>
- Date: Wed, 28 Aug 2002 08:39:18 -0400
- To: <w3c-wai-gl@w3.org>
- Cc: er@uts.cc.utexas.edu
[I have dropped PF from the thread. We can point PF and AU at the GL archives. - Al]

At 01:50 AM 2002-08-28, Lisa Seeman wrote:
>An interesting perspective from Esther Raizen, controversial but interesting

[a - romanize, write the alternate text in romanization; b - don't try to get the right vowels for classical spelling, just capture something that is phonetically sufficient. Don't try to distinguish homophone variants.]

Well, she is right that if the purpose is to drive text-to-speech, we don't have to distinguish between cases that are pronounced alike.

If we combine the two things she suggested, that approach is already, IMHO, in the concept set in the form of providing phonetic equivalents in the 'sampa' roman-alphabet variant on IPA. It is just a standards proposal to the world-wide TTS industry as to whether to parse sampa or IPA text for phonetics, and we may not get them to agree. Since they already do separate pronunciation libraries for each natural language, including both IPA and sampa as just two more language models is not necessarily a big deal. The models would be very simple, because these 'scripts', as the internationalization people (Unicode) call them, are phonetic to begin with. The library structure for distributing the variant models is already there in the software distribution pipeline.

And it is not hard to get the pronunciation from an author, even if "vocalization" in the sense of inserting the literarily correct vowel characters is a scarce skill. The basic tool here is the alignment tool that aligns a voice recording with its transcript. Get the author to type the text in standard, consonants-only spelling. Then get them to read it to the recorder. Then the alignment tool matches up the sound record with the text record. This gives you not only, for voice browsing, the sound samples to go with prompts for the dialog, but also a validated sound/text relationship, and it is easy at this point to extract an IPA or sampa transcription of how each word was produced. The validation check is that the system reads the text back from the by-exception phoneticized version, and the author edits by hand the words that get spoken wrong in the machine reconstruction of the utterance.

To get only the ambiguous words marked up with phonetics, it is enough to take the words in the text, compare them with the dictionary to see which are ambiguous as to their pronunciation, and mark up the ambiguous ones with their phonetics extracted from the speech sample.

There is a second approach for teaching the language and other applications where correct vocalized spelling is required. This is to construct a sentence-level semantic model of the utterance, so as to have contextual cues aiding the discrimination of when a given consonant-spelled word is a noun vs. when it is a verb, and, when it is, which sense as broken out in the WordNet sense nodes. If we can get it bound to a unique WordNet node, I expect that the correct vowelization will be a matter of record, will be unique for that node, and we can do it all with reference data in the dictionary. This runs like the tool above, except that the semantic-model building is in the form of a binding to WordNet rather than to the dictionary, with an added layer of sentence structure as an aid to reduce the number of ambiguities that have to be taken back to the author for interactive resolution. We are building an interpretation of the consonants-only text as sentence structures populated with WordNet nodes.
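A rough sketch of that by-exception markup step, assuming the alignment tool has already delivered a per-word SAMPA transcription of the author's recording. The toy dictionary entries, the romanized placeholder spellings, and the `<phoneme>` wrapper (loosely modeled on the SSML phoneme element) are illustrative assumptions, not an existing API:

```python
# Sketch: mark up only pronunciation-ambiguous words with phonetics.
# Assumes the aligner has already produced (word, SAMPA-as-spoken) pairs;
# the data and the markup vocabulary are illustrative.

from typing import Dict, List, Set, Tuple

# Toy pronunciation dictionary: consonants-only spelling -> possible SAMPA
# readings.  Real entries would come from a Hebrew lexicon; "gnb" stands in
# for the consonantal spelling of ganav (thief) / ganav (he stole).
PRON_DICT: Dict[str, Set[str]] = {
    "gnb": {"ganav", "gannav"},   # ambiguous: verb vs. noun reading
    "shlwm": {"shalom"},          # unambiguous
}

def is_ambiguous(word: str) -> bool:
    """A word needs phonetic markup if the dictionary lists more than one reading."""
    return len(PRON_DICT.get(word, set())) > 1

def markup_by_exception(aligned: List[Tuple[str, str]]) -> str:
    """Emit text where only ambiguous words carry the phonetics heard in the recording.

    `aligned` is the aligner's output: (consonants-only word, SAMPA as spoken).
    """
    out = []
    for word, sampa in aligned:
        if is_ambiguous(word):
            out.append(f'<phoneme alphabet="sampa" ph="{sampa}">{word}</phoneme>')
        else:
            out.append(word)
    return " ".join(out)

if __name__ == "__main__":
    # Pretend the aligner heard the verb reading of "gnb" in the recording.
    print(markup_by_exception([("shlwm", "shalom"), ("gnb", "ganav")]))
```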
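And a similar sketch of the second approach: if the reference data gives one vocalized form per (consonants-only spelling, sense) pair, recovering the vowels reduces to a lookup once the word is bound to a sense node. The sense identifiers, the lookup table, and the crude context rule below are invented stand-ins for a real WordNet binding and sentence model:

```python
# Sketch: recover vocalized spelling from a sense binding.
# Sense ids, reference table, and the context heuristic are illustrative
# stand-ins for a WordNet binding plus a vocalization lexicon; romanized
# placeholders are used instead of Hebrew characters.

from typing import Dict, Optional, Tuple

# (consonants-only spelling, sense id) -> fully vocalized spelling.
VOCALIZED: Dict[Tuple[str, str], str] = {
    ("gnb", "thief.n.01"): "gannav",   # noun: thief
    ("gnb", "steal.v.01"): "ganav",    # verb: he stole
}

def bind_sense(word: str, context: str) -> Optional[str]:
    """Toy disambiguator: pick a sense id from crude contextual cues.

    A real system would build a sentence-level model and take unresolved
    cases back to the author interactively.
    """
    if word != "gnb":
        return None
    tokens = set(context.split())
    return "steal.v.01" if tokens & {"he", "she", "they"} else "thief.n.01"

def vocalize(word: str, context: str) -> str:
    """Return the vocalized form for the bound sense, or the word unchanged."""
    sense = bind_sense(word, context)
    if sense is None:
        return word
    return VOCALIZED.get((word, sense), word)

if __name__ == "__main__":
    print(vocalize("gnb", "the gnb ran away"))   # -> gannav (noun reading)
    print(vocalize("gnb", "he gnb the money"))   # -> ganav  (verb reading)
```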
The interactive authoring techniques question is how to quickly present to the author, on the fly, what their choices are as to WordNet nodes. In this case it is cheap to present vowelized spellings, but as Professor Raizen says, the author probably doesn't know from vowel spelling anyway. This is where the picture thesaurus could be used. Present the likeliest options as to WordNet bindings in a little fly-out display attached to the ambiguous word in the text, with a thumbnail image and a brief phrasal identification of each variant, in some form where it is easy for the author to indicate which choice they meant, or to ask for more options or more information on the options. This is the sort of interaction that would get the by-exception vocalized text captured in less than ten times the time it takes to type it.

Al

At 01:50 AM 2002-08-28, Lisa Seeman wrote:
>An interesting perspective from Esther Raizen, controversial but interesting
>
>Her second idea could be modified to require not full vowels but just
>enough to differentiate separate meanings; this is supported by the
>current wording of the guideline.
>Yes, the academics will be somewhat upset.
>
>If we do consider modifying the ancient and historical Hebrew vowel system,
>then may I recommend that we add a header setting which allows most
>renderings to not show the vowels encoded. Otherwise we may find ourselves
>in an area of religious sensitivities.
>Such a header setting (through PF) would also make people more comfortable
>about making mistakes in their vowels.
>
>Forwarded with permission
>
>All the best,
>
>Lisa Seeman
>UnBounded Access
>Widen the World Web
>http://www.UBaccess.com
>
>-----Original Message-----
>From: esther raizen [mailto:er@uts.cc.utexas.edu]
>Sent: Tuesday, August 27, 2002 6:05 AM
>To: Lisa Seeman
>Subject: RE: Hebrew screen readers
>
>Lisa,
>This is what I expected, more or less. Development for commercial
>purposes will always require a high return on the investment, which
>in itself is very high. An affordable solution will probably have to
>come from academia, and even there you will rarely find the
>willingness to absorb the cost and create a tool that is available to
>the public at no cost or at a nominal cost. I hope that the Technion
>takes it on in some way.
>
>As for vowels: there are two reasons why vocalization is difficult.
>One is technical, that is, how to physically insert vowels into a
>text (for which, as you write, a number of very reasonable solutions
>exist; I have been doing it for years, and although I still view it
>as a daunting task, it is not impossible at all). The second, which
>is more significant, is the simple fact that very few people know how
>to vocalize Hebrew. This is normally done by a professional, which
>again involves high costs. People will not vocalize their texts
>because they will not know how to do that, and even if they can do it
>technically, most people will not want to appear ignorant by entering
>sequences of wrong vowels. This is enough to kill the whole
>endeavor, I am afraid.
>
>Unless we advocate one of two things:
>
>Writing Hebrew in Latin characters (people will jump through the
>roof, but this is not such a terrible idea if one uses it for special
>purposes).
>If, as I wrote to you in an earlier message, this is
>somehow done as hidden text behind the Hebrew presentation, and the
>screen reader reads the hidden rather than the visible text, the only
>work involved on the part of the author would be to write the passage
>twice, once in Hebrew and once in Latin characters. This is much
>easier than vocalizing.
>
>Deciding on a reduced and, therefore, non-grammatical vocalization
>system, where you have one symbol for each vowel irrespective of its
>historical origin. Thus, the noun ganav, thief, written
>gimel-patach-nun-dagesh-kamatz-bet, and the verb ganav, he stole,
>written gimel-kamatz-nun-patach-bet, will both be written
>gimel-a-nun-a-bet. Again, people will jump through the roof, but
>that is the reality of all script reforms, and as long as one views
>it as a solution to a specific problem, which does not have to apply
>to the language in general, there is a chance that with good reasoning
>it will be accepted. Accessibility is the best reasoning I can think
>of.
>
>Feel free to forward this message to people -- I will be glad to
>communicate with others on the matter.
>
>Best,
>Esther
>--
>Esther Raizen, Ph.D.
>Assistant Professor
>Coordinator of Hebrew language program, Dept. of Middle Eastern
>Languages and Cultures
>Graduate Advisor, Center for Middle Eastern Studies
>The University of Texas at Austin
>Austin, TX 78712
>Phone: 512-475-6654
>e-mail: er@uts.cc.utexas.edu
>web page: http://www.lamc.utexas.edu/hebrew
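Professor Raizen's reduced system amounts to a simple character mapping: one symbol per vowel quality, irrespective of the sign's historical origin. A minimal sketch, using an illustrative (and linguistically simplified) grouping of the niqqud marks, reproduces her ganav example:

```python
# Sketch: collapse historically distinct niqqud signs to one symbol per
# vowel quality, as in the ganav example above.  The grouping below is a
# simplification for illustration; dagesh and sheva are simply dropped.

REDUCED = {
    "\u05B7": "a",  # patach
    "\u05B8": "a",  # kamatz
    "\u05B6": "e",  # segol
    "\u05B5": "e",  # tsere
    "\u05B4": "i",  # hiriq
    "\u05B9": "o",  # holam
    "\u05BB": "u",  # kubutz
    "\u05B0": "",   # sheva (dropped here)
    "\u05BC": "",   # dagesh (dropped here)
}

def reduce_vowels(text: str) -> str:
    """Replace each niqqud mark with its reduced symbol, leaving letters alone."""
    return "".join(REDUCED.get(ch, ch) for ch in text)

# Noun ganav (thief): gimel-patach-nun-dagesh-kamatz-bet
noun = "\u05D2\u05B7\u05E0\u05BC\u05B8\u05D1"
# Verb ganav (he stole): gimel-kamatz-nun-patach-bet
verb = "\u05D2\u05B8\u05E0\u05B7\u05D1"

# Both spellings collapse to the same gimel-a-nun-a-bet form.
assert reduce_vowels(noun) == reduce_vowels(verb)
```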
Received on Wednesday, 28 August 2002 08:39:27 UTC