- From: Masafumi NAKANE/中根雅文 <max@wide.ad.jp>
- Date: Fri, 07 Nov 1997 23:09:35 +0900
- To: w3c-wai-ig@w3.org
- Cc: max@wide.ad.jp
- Message-Id: <199711071409.XAA29752@access.sfc.wide.ad.jp>
Hi,

I've written a short proposal on a mechanism for presenting phonetic information. This has been a concern of the Japanese blind computer users' community. I hope it will be taken into consideration as we discuss phonetic markup on the list and/or during the upcoming meeting(s).

I would appreciate any comments, although I won't be able to respond to them promptly, as I'm leaving for Austin in about 12 hours.

Thanks,
Max

-----------------------------------------------------------------------
Masafumi NAKANE, Keio Univ., Dept. of Environmental Information
E-Mail : max@wide.ad.jp / max@FreeBSD.ORG
[URL] : http://www.sfc.wide.ad.jp/~max/

ABSTRACT:
This memo describes the necessity of phonetic markup in HTML from the standpoint of accessibility.

REQUIREMENT:
There should be some means for web page authors to convey the pronunciation of words to so-called self-voicing web browsers and to users of visual user agents.

BACKGROUND:
In some languages, there are occasions where it is impossible to determine the pronunciation of words or phrases from the written text alone. Many of these cases can be resolved with proper context analysis, while in the remaining cases no one but the author can determine the pronunciation. Although the former may be solved by technical improvements in the near future, good context analysis is difficult at present. The latter will probably never be solved. These facts create the need for a mechanism by which web page authors can convey phonetic information to users.

JAPANESE-SPECIFIC POINTS:
Printed Japanese is written using a mixture of ideographic characters called kanji and phonographic characters called kana. In Japanese braille, only kana is used. Thus, the braille translation process requires conversion of kanji into phonetic form. The same is true for speech output.

Most kanji text can be translated into kana without much difficulty if the user agent, or perhaps the access agent, has a good dictionary. However, there are many cases where it is impossible for readers to determine the pronunciation of a certain combination of kanji characters. There are even characters whose pronunciation cannot be determined from how and/or where they are used. This is common with proper nouns.

To convey the correct phonetic representation of kanji, a mechanism for carrying that information is essential.

PROPOSAL:
An attribute for this purpose should be added to the SPAN element. (I expect an appropriate attribute name to be chosen as we discuss. In this document, however, I use PHONETIC for convenience.) This attribute takes a character string as its value.
The string should describe the pronunciation of the word inside the SPAN element.

EXAMPLES:
    I <SPAN phonetic="red">read</SPAN> the book.             --- [1]
    <SPAN lang="ja" phonetic="higashi">HIGASHI</SPAN>        --- [2a]
    <SPAN lang="ja" phonetic="azuma">HIGASHI</SPAN>          --- [2b]

Example [1] is straightforward: the attribute adjusts the way a self-voicing UA reads the text.

Examples [2a] and [2b] need some explanation. In these examples, assume that lowercase characters represent kana and uppercase characters represent kanji, and that ``HIGASHI'' is a single kanji character. This character has several different pronunciations, and ``azuma'' and ``higashi'' are two of them. If this character is used as a person's last name, it is impossible for anyone but the person who bears the name to determine how the character should be read.

IMPLEMENTATION CONSIDERATION:
Cases like example [1] are simple. A self-voicing UA should just use the value of the attribute to adjust the speech output.

More care is needed when processing languages like Japanese. A UA with voice and/or braille output can use this information directly. However, it is questionable how it should be treated in visual browsers. In printed Japanese material, when the pronunciation needs to be made clear, kana representing the pronunciation of the kanji character(s) is placed beside the kanji in a smaller font. The simplest implementation would be to unconditionally present the value of the PHONETIC attribute in a font of appropriate size if the LANG is ``ja'' (or another language that has the same convention).

However, this can lead to misuse/abuse of the attribute. In Japanese print, there is something called rubi, whose original purpose was to give the phonetic representation of kanji. Rubi is usually written beside the corresponding kanji in a smaller font. In spite of that original purpose, kanji is also put in the place of rubi, in a font of the same size as that used for rubi. This is common practice in Japanese literature.
With this fact in mind, it is easy to imagine that more than just a few people would use the PHONETIC attribute for presentational purposes if the UA displayed the content of the attribute in the way rubi is displayed.

PROBLEM:
How is it possible to limit the characters used for this attribute to the phonographic characters of the language inside the element?

What method, character set, etc. should be used to represent the pronunciation? Kana is probably the best for Japanese, but what about other languages?

Is adding this attribute to the SPAN element enough? Don't any other inline elements need it?

Is this the best way anyway?

RELATED DOCUMENTATION:
The following Internet-Draft discusses a similar issue:

    <draft-duerst-ruby-01.txt>
    Martin J. Duerst, University of Zurich
    Ruby in the Hypertext Markup Language
    (ftp://ftp.ds.internic.net/internet-drafts/draft-duerst-ruby-01.txt)
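
ILLUSTRATIVE SKETCH:
The behaviour described under IMPLEMENTATION CONSIDERATION for a self-voicing UA can be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the proposal: PHONETIC is the placeholder attribute name used in this memo, the names PhoneticSpeechText, speech_text and is_kana are hypothetical, and the kana check is just one rough possible answer to the first question under PROBLEM (it tests Unicode block membership and nothing more).

```python
from html.parser import HTMLParser


class PhoneticSpeechText(HTMLParser):
    """Collect the text a self-voicing UA might speak (hypothetical sketch).

    Text inside a SPAN carrying the placeholder PHONETIC attribute is
    replaced by the attribute value; all other text is kept as written.
    """

    def __init__(self):
        super().__init__()
        self.parts = []
        # One entry per open SPAN: True if that SPAN had a phonetic value.
        self.span_stack = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "span":
            return
        phonetic = dict(attrs).get("phonetic")
        # Only the outermost annotated SPAN contributes its pronunciation.
        if phonetic and not any(self.span_stack):
            self.parts.append(phonetic)
        self.span_stack.append(bool(phonetic))

    def handle_endtag(self, tag):
        if tag == "span" and self.span_stack:
            self.span_stack.pop()

    def handle_data(self, data):
        # Skip the written form while inside an annotated SPAN.
        if not any(self.span_stack):
            self.parts.append(data)


def speech_text(html):
    """Return the document text with annotated words replaced by PHONETIC."""
    parser = PhoneticSpeechText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def is_kana(value):
    """Rough check: non-empty and every character lies in the Unicode
    Hiragana or Katakana blocks (U+3040 to U+30FF)."""
    return bool(value) and all("\u3040" <= ch <= "\u30ff" for ch in value)
```

With example [1], speech_text('I <SPAN phonetic="red">read</SPAN> the book.') yields "I red the book.". A visual UA would need a different policy, since the sketch discards the written form entirely.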
Received on Friday, 7 November 1997 09:10:07 UTC