- From: Richard Ishida <ishida@w3.org>
- Date: Mon, 22 Dec 2003 11:50:38 -0000
- To: "'lisa seeman'" <seeman@netvision.net.il>, "'WAI-GL'" <w3c-wai-gl@w3.org>
- Cc: Martin J. Durst <duerst@w3.org>, "Richard Ishida" <ishida@w3.org>
Hi Lisa,

> From: w3c-wai-gl-request@w3.org
> [mailto:w3c-wai-gl-request@w3.org] On Behalf Of lisa seeman
> Sent: 22 December 2003 05:55

<snip>

> passages or fragments of text occurring within the content
> that are written in a language other than the primary natural
> language of the content as a whole, are identifiable, either
> through the character encoding used or through direct
> including specification of the language of the passage or
> fragment. [X]

Character encoding information helps you know the script, which may be
useful for font selection or some other rendering considerations, but it
doesn't help you select the right voice for pronouncing the text. For
example, ASCII text could just as easily be Indonesian or Malay as
English, and text using Latin-1 characters could represent a very wide
range of languages. So 'either through the character encoding used' would
be inappropriate, unfortunately. (A small sketch at the end of this
message illustrates the point.)

To help me better understand the issue, could you briefly characterise
for me the type of content that causes the problem? Is it English? How
much of it is there (as a very rough average)? Is much of it acronyms,
proper names, technical words, etc.?

Exploring solutions: can one assume that Israeli text-to-speech systems
can deal pretty well with the embedded non-Hebrew material? Does that
apply to the TTS systems dealing with other languages? If Hebrew systems
deal with English OK, maybe you would only have to label text that was,
say, Indonesian or Malay?

RI
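To make the encoding point concrete, here is a minimal Python sketch; the
sample words ('air', 'Band') are just assumed illustrations of byte
sequences shared across languages, and only an explicit language label
(e.g. lang or xml:lang in the markup) would tell a synthesiser which voice
to use.

    # Minimal sketch: the same bytes decode cleanly as Latin-1 (or ASCII)
    # whichever language the text happens to be in, so the character
    # encoding alone cannot identify the language for voice selection.
    # The sample words are only assumed illustrations.

    samples = [
        ("English",    b"air"),    # an English word
        ("Indonesian", b"air"),    # identical bytes; Indonesian/Malay for "water"
        ("German",     b"Band"),   # German for "ribbon" / "volume"
        ("English",    b"Band"),   # identical bytes again, different language
    ]

    for language, raw in samples:
        text = raw.decode("latin-1")   # succeeds for every entry
        print(f"{language:>10}: {text!r} decoded fine, but the language is still unknown")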
Received on Monday, 22 December 2003 06:51:13 UTC