- From: Kurt Fuqua <kfuqua@vailsys.com>
- Date: Mon, 20 Nov 2006 16:42:05 -0600
- To: "VoiceBrowser" <www-voice@w3.org>
- Message-ID: <001f01c70cf5$1ad92df0$8981a8c0@vail>
Sirs,

I offer these comments on the working draft of the Pronunciation Lexicon Specification. Vail Systems uses VoiceXML widely, has a certified VoiceXML 2.0 browser, and hopes to employ the PLS. Progress has been made on the specification, but in my opinion it is not as sophisticated as existing lexicon standards and does not solve some basic issues in the normalization of pronunciations. My comments are included below.

Kurt Fuqua
Director of Speech Sciences
Vail Systems, Inc.

Comments on the Pronunciation Lexicon Specification, October 26th Working Draft

First, I appreciate the effort put into creating this draft. It will be valuable to have a clear W3C standard for lexicons. I have several concerns with the current draft specification. In summary, they are:

* Phonetic vs Phonemic Representation
* Lexicon Integrity Checks
* IPA Ambiguities
* Using IPA for Representation of Phonemes
* Part of Speech

I briefly address each of these below.

Phonetic vs Phonemic Representation

The document does not explicitly state whether the pronunciations in the lexicon are to be phonemic or phonetic. The difference is significant. The tag names (<phoneme>) and the examples in the document imply that the representation is phonemic. In one example (4.6), the word “huge” is transcribed with the phonemic transcription “hjuːʤ” rather than the phonetic transcription “çjuːʤ”.

I believe that all pronunciations should be given phonemically rather than phonetically. First, a phonetic transcription is generally beyond the capability of those not trained in linguistics; even a trained linguist would have a difficult time creating consistent phonetic transcriptions. There is a second and far more compelling theoretical reason why it has been standard practice for lexicons to be transcribed phonemically: phonology rules must be applied to a phonemic transcription in order to render a phonetic transcription of a sentence to be synthesized or spoken.
This requires the underlying phoneme representation. If the phonemic representation were not given, one would first have to work backwards from the phonetic transcription to the phonemic one before the phonology rules could be applied. Moreover, several of the phonology rules apply across word boundaries, so a phonetic transcription of individual words is counter-productive. Thus, I recommend that the specification be explicit that the transcriptions are to be phonemic, not phonetic, and that this be required.

Lexicon Integrity Checks

Lexicons are notorious for containing inconsistent information. It is therefore very useful to include integrity checks within the lexicon, which allow for automated consistency checking. For more than a decade, the lexicons of the Scalable Language API (SLAPI) have used a phonemic key. The lexicon contains a phonemic key for each language of the lexicon; the key is simply a list of all the phonemes of that language. If any pronunciation contains a phoneme which is not in that phoneme set, there is a consistency error. The concept is very simple, and it catches many errors immediately after an edit. Analogously, there is also a grapheme key, which contains every grapheme used by that language. Several other integrity checks are possible, and SLAPI implements most of them; for the sake of brevity, I will emphasize only these two keys. I recommend that both a phonemic key and a graphemic key be included for each language in the lexicon, and that these keys be required.

IPA Ambiguities

The IPA is a well-developed and useful representation. However, it does contain some significant ambiguities, and I believe the standard should recommend certain normalized forms. Several consonant phonemes can be represented using alternate symbols under the official guidelines. This ambiguity means that comparing IPA symbols becomes quite complex.
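The phonemic-key check reduces to simple set membership, and it is also exactly where symbol comparison matters. The sketch below is illustrative only: the data layout, function name, and the phoneme inventory subset are my assumptions, not structures from SLAPI or the PLS draft.

```python
# Phonemic key: the complete phoneme inventory for one language.
# This subset (enough for "huge" and "heat") is illustrative only.
PHONEME_KEY = {"h", "j", "uː", "dʒ", "iː", "t"}

def key_violations(phonemes):
    """Return the symbols in a pronunciation missing from the phonemic key.

    Note: membership here is a raw codepoint comparison, so /ɡ/ (x0261)
    and /g/ (x0067) would NOT match unless symbols are normalized first.
    """
    return [p for p in phonemes if p not in PHONEME_KEY]

# "huge" segmented into phonemes -- passes the check.
print(key_violations(["h", "j", "uː", "dʒ"]))   # []

# A stray symbol is flagged immediately after an edit.
print(key_violations(["h", "j", "uː", "x"]))    # ['x']
```

The same loop over a grapheme key gives the orthographic check; both run in a single pass over the lexicon after each edit.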
For example, the very common IPA symbol #110 is represented as /ɡ/ (x0261) and is logically equivalent to #210 /g/ (x0067). There are many such ambiguous IPA symbols. There are also several obsolete IPA symbols which are still frequently used (e.g. #214); this obsolete symbol is even included as an example in the draft. I recommend that a recommendation be made for the normalization of these ambiguous and obsolete IPA symbols within the PLS.

Using IPA for Representation of Phonemes

As its name implies, the IPA was created primarily as a phonetic alphabet, not a phonemic alphabet. It can be used for the representation of phonemes, but unfortunately linguists often transcribe the identical language with slightly different IPA symbols and diacritics. There should be recommendations to normalize the transcription of phonemes using IPA symbols.

The central problem is that a phoneme is a set of allophones. IPA can transcribe allophones in a way that is generally unambiguous; the difficulty is selecting which symbol to use to represent the set. For example, some linguists would transcribe the English word ‘heat’ as /hiːt/, others would use /hit/. In English, vowel length is not contrastive, although the vowel /i/ is always long. The question is whether to include a diacritic or suprasegmental, such as length, if that feature is not contrastive in the language. This issue was resolved under SLAPI with the following three normalization guidelines:

1) No modifier should be used in a phoneme symbol which does not constitute a phonemic contrast for the phoneme in that language.
2) When a phoneme is phonetically rendered in allophones involving different base symbols, the symbol chosen to represent the phoneme should be the one from which the others are arguably derived by phonological rule.
3) The phoneme symbol must be unique within the particular language.
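To make the normalization problem concrete: Unicode canonical normalization (NFC/NFD) does not merge these symbols, because ɡ (U+0261) and g (U+0067) have no canonical equivalence, so an explicit substitution table is needed. The sketch below covers only the two cases discussed above; the choice of canonical targets (the official #110 codepoint for the velar stop, a tie-bar sequence for the obsolete affricate ligature) is my assumption, not something prescribed by the draft.

```python
import unicodedata

# Illustrative substitution table; target forms are assumptions.
IPA_NORMALIZE = {
    "g": "\u0261",             # U+0067 -> U+0261, the official IPA #110 symbol
    "\u02A4": "d\u0361\u0292", # obsolete ligature ʤ (as in the draft's example) -> d͡ʒ
}

def normalize_ipa(transcription):
    return "".join(IPA_NORMALIZE.get(ch, ch) for ch in transcription)

# NFC leaves the two g's distinct -- they are not canonically equivalent.
assert unicodedata.normalize("NFC", "g") != "\u0261"

# The draft's transcription of "huge", normalized:
print(normalize_ipa("hju\u02D0\u02A4"))  # hjuːd͡ʒ
```

A real table would enumerate every ambiguous and obsolete symbol; the point is that the PLS could publish exactly such a table so that two conforming lexicons compare equal codepoint-for-codepoint.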
While these guidelines do not resolve every issue of phonemic representation, they do resolve most such issues and allow for a standard normalization. I recommend that the same phoneme normalization guidelines be adopted.

Part of Speech

I understand that a decision was made not to include part of speech information. I think that the lack of this most basic form of grammatical information will fundamentally handicap the standard. Part of speech information is used to differentiate pronunciations: English and other languages have many pairs of words that are spelled identically but pronounced differently depending on part of speech. The draft itself includes such an example (4.9.3, example 2). The lexicon is the source of all word-level information for the applications; without part of speech information in the lexicon, there is simply no way to determine which pronunciation to use. I do recognize that this introduces another level of complexity, in that the specification must include the possible parts of speech. However, this has been addressed in other existing standards such as LexiconXML, the Scalable Language API and OSIS. I recommend that part of speech information be included as a tag.
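The disambiguation becomes trivial once part of speech is available at lookup time. A minimal sketch, assuming a hypothetical lexicon layout and POS tag names (the transcriptions are approximate British English; none of this is PLS syntax):

```python
# Hypothetical lexicon: grapheme -> part of speech -> phonemic form.
# Tag names and transcriptions are illustrative, not from the draft.
LEXICON = {
    "record": {
        "noun": "ˈɹɛkɔːd",  # stress on the first syllable
        "verb": "ɹɪˈkɔːd",  # stress on the second syllable
    },
}

def pronounce(grapheme, pos):
    """Select a pronunciation; without pos there is no way to choose."""
    return LEXICON[grapheme][pos]

print(pronounce("record", "noun"))  # ˈɹɛkɔːd
print(pronounce("record", "verb"))  # ɹɪˈkɔːd
```

Without the inner POS layer, a synthesizer reaching this entry can only guess between the two forms; that is the gap the recommended tag would close.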
Received on Wednesday, 22 November 2006 00:09:19 UTC