- From: Kurt Fuqua <kfuqua@vailsys.com>
- Date: Tue, 27 Feb 2007 17:30:04 -0600
- To: "VoiceBrowser" <www-voice@w3.org>
- Cc: "Baggia Paolo" <paolo.baggia@loquendo.com>
- Message-ID: <001201c75ac7$359f1bc0$a0dd5340$@com>
Thank you for the opportunity to reply to your reaction to my comments. It is my sincere hope that the PLS can be made to be excellent.

Kurt Fuqua
Director of Speech Sciences, Vail Systems, Inc.

Issue R106-5
I recommend that part of speech information be included as a tag.

I am pleased to hear that you partially accept the proposal to include part of speech information in entries. You have referred me to the 'roles' of Section 4.4. It is important that the parts of speech (POS) be standardized so that lexicons can be merged or distributed. This is not a trivial task, but it has been done well in other standards. I would encourage the use of the parts of speech used under the SLAPI standard. These parts of speech are:

Open POS: v (verb), adv (adverb), adj (adjective), n (common noun), propn (proper noun)
Closed POS: pron (pronoun), conj (conjunction), prep (preposition or postposition), cardn (cardinal number), ordn (ordinal number), det (determiner), quant (quantifier), interj (interjection), contrc (contraction or portmanteau), symb (symbol)

These POS have been used successfully across many different language families, and there are existing grammars and lexicons based upon them. This set provides sufficient high-level distinctions which are grammatically meaningful and universal across languages. Any POS can be subdivided using language-specific grammatical features.

Issue R106-4
I recommend that the same phoneme normalization guidelines be adopted.

I am pleased that you would accept this proposal, although you did not communicate the modifications that you envision. The issue is that the IPA itself does not lay out specific guidelines for the use of IPA symbols in specifying phonemes; it is primarily geared toward allophones. Consequently, there are no guidelines when linguists attempt to choose a single IPA symbol as the phoneme which represents a set of allophones.
Simply specifying that IPA symbols are used is not adequate by itself to ensure consistency in the representation of phonemes. This is why various publications on the same language so commonly use different symbols for its phonemes. The problem is specific to the representation of phonemes; the representation of allophones is far less ambiguous. The latest SLAPI specification contains a set of three guidelines for selecting which IPA symbol (and suprasegmentals) to use. This is not a modification of the IPA; it is simply a normalization of how to select an IPA symbol for a phoneme. The same guidelines will work for other phonetic symbol sets as well:

1) No modifier should be used in a phoneme symbol which does not constitute a phonemic contrast for that phoneme in that language.
2) When a phoneme is phonetically rendered in allophones involving different base symbols, the symbol chosen to represent the phoneme should be the one from which the others are arguably derived by phonological rule.
3) The phoneme symbol must be unique within the particular language.

By adopting these three normalization guidelines, lexicons for the same language are more likely to use consistent notations.

Issue R106-3
I recommend that a recommendation be made for the normalization of these ambiguous and obsolete IPA symbols within the PLS.

I believe that the IPA is the best-developed and most widely adopted standard. But there are a couple of problems which significantly complicate the processing of IPA symbols:

1. Obsolete IPA symbols are still used frequently.
2. The IPA standard occasionally allows two different symbols to represent the identical sound.

The first is not actually a problem with the IPA standard itself. In the PLS proposal there is an example which purports to be in IPA but uses a composed symbol that is no longer part of the IPA standard.
Either the symbol (IPA #214, Section 4.6, first example, "huge") should be removed from the example, or there should be a note that this is actually a proprietary alphabet, not IPA. We should not encourage people to deviate from the IPA standard; this only exacerbates the complexity of supporting it. The fact that the standards document itself uses an obsolete symbol highlights the complexities associated with transcription standards. Many will continue to use obsolete symbols unknowingly (there are about 28 such symbols in the IPA). A further complication is that most commercial rendering engines will not correctly render several properly coded IPA symbols (those involving #509), so many people resort to non-standard, precomposed representations. The document should use only standard symbols and highlight the problem of obsolete symbols.

The IPA encodings are ambiguous; there are alternate acceptable means to represent several allophones (e.g. #210 is equivalent to #110). These ambiguities make string comparisons complex. If the committee does not wish to recommend a preferred coding, it should at least point out that the IPA coding is ambiguous, so that users will properly implement comparisons involving equivalences. If users are not aware of the ambiguities, many will attempt to use standard Unicode string comparison functions, which will not work reliably for the ambiguous symbols.

Issue R106-2
I recommend that both a phonemic key and a graphemic key be included for each language in the lexicon, and that these keys be required.

Lexicon integrity checks are vitally important. There are several ways that integrity checks are performed on modern lexicons, yet the PLS document does not mention integrity issues. Each language has a defined alphabet of grapheme symbols and a defined set of phonemes. A very simple, effective integrity check is to specify these sets up front, within the lexicon. This is existing technology and has been used in lexicons for more than a decade.
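To make the key-based check concrete, here is a minimal Python sketch. It assumes an entry is just a spelling plus a transcription string and that the grapheme and phoneme keys are plain sets of symbols; the names `check_entry`, `segment`, `canonicalize`, and `EQUIVALENTS` are hypothetical, not part of the PLS. Because of the encoding ambiguity described under R106-3, the sketch canonicalizes the key and the transcription before comparing them. Unicode NFD normalization alone does not equate distinct code points such as ASCII g and IPA ɡ (U+0261), or the withdrawn ligature ʤ (U+02A4) and d͡ʒ, so a small equivalence table (an illustrative assumption, not a normative list) handles those.

```python
import unicodedata

# Hypothetical equivalence table (an illustrative assumption, not a normative list):
# alternate or withdrawn encodings are rewritten to one canonical spelling.
EQUIVALENTS = {
    "g": "\u0261",              # ASCII g -> IPA opentail g (U+0261); the IPA accepts either form
    "\u02a4": "d\u0361\u0292",  # withdrawn ligature dezh (U+02A4) -> d + tie bar + ezh
}


def canonicalize(text: str) -> str:
    """Decompose to NFD, then rewrite known alternate symbols code point by code point."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(EQUIVALENTS.get(ch, ch) for ch in decomposed)


def segment(transcription: str, phonemes: set[str]) -> tuple[list[str], list[str]]:
    """Greedy longest-match split into declared phonemes; also returns unmatched characters."""
    symbols = sorted(phonemes, key=len, reverse=True)  # try multi-character phonemes first
    found: list[str] = []
    unknown: list[str] = []
    i = 0
    while i < len(transcription):
        for sym in symbols:
            if transcription.startswith(sym, i):
                found.append(sym)
                i += len(sym)
                break
        else:  # no declared phoneme starts at position i
            unknown.append(transcription[i])
            i += 1
    return found, unknown


def check_entry(spelling: str, transcription: str,
                grapheme_key: set[str], phoneme_key: set[str]) -> list[str]:
    """Return the integrity problems for one lexicon entry (an empty list means it passed)."""
    problems = [f"unknown grapheme {ch!r}" for ch in spelling if ch not in grapheme_key]
    canonical_key = {canonicalize(p) for p in phoneme_key}
    _, unknown = segment(canonicalize(transcription), canonical_key)
    problems += [f"unknown phoneme symbol {ch!r}" for ch in unknown]
    return problems


# Toy keys for a fragment of English (assumed for illustration only).
GRAPHEMES = set("abcdefghijklmnopqrstuvwxyz")
PHONEMES = {"h", "j", "u\u02d0", "d\u0361\u0292", "\u0261", "t"}

# The withdrawn ligature passes once canonicalized; naive string equality does not.
print("hju\u02d0\u02a4" == "hju\u02d0d\u0361\u0292")                # False
print(check_entry("huge", "hju\u02d0\u02a4", GRAPHEMES, PHONEMES))  # []
print(check_entry("huge", "hju\u02d0\u0292", GRAPHEMES, PHONEMES))  # flags the bare ezh
```

A real checker would also have to account for suprasegmentals, syllable marks, and spacing, which this sketch ignores.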
If the PLS would optionally allow these keys to be embedded, these checks could be done automatically.

Issue R106-1
Thus, I recommend that the specification be explicit that the transcriptions are to be phonemic, not phonetic, and that this be required.

I understand that you have already rejected this proposal. It is impossible to properly process a sequence without knowing whether phonemes or allophones are represented. If the committee does not wish to specify that they are phonemes, then it should at least allow the lexicon author to specify what he has chosen to represent: phonemes or allophones.

During speech synthesis, phonological rules must be applied to the sequence of phonemes which represent a word. Some of these rules apply only within a word; others apply both within a word and across word boundaries. The phonological rules are ordered and must be applied in that order. These rules transform phonemes into allophones. (During speech recognition the processing is inverted.)

The simplest case is that the lexicon contains phonemes (this is the norm). In this situation, the phonological rules are simply applied to the concatenated sequence of words. If the lexicon contains allophones, the processing is far more complex. First, the process must be reversed to find the underlying phonemes; then the process can be run forward. If we do not find the underlying phonemes first, the phonological rules could be applied in the wrong order, and complex mistakes are made at the word boundaries.

This is not just an academic issue. Without proper phonological processing, significant errors will occur in both recognition and synthesis. It is a very practical problem that we encounter regularly in the development of VoiceXML applications.

Consistency with SSML
I agree that the PLS should maintain compatibility with SSML. The issue is what is to be represented at what stage: phonemes or phones (i.e. allophones).
Remember that the lexicon pronunciations will serve as input to a synthesis processor; the output of the synthesis processor is SSML. As pointed out in SSML 1.0 section 3.1.9, phonemes are the unrealized form; "phones represent realized distinctions". I agree. The difference between the two is the processing by phonological rules. A sentence is not simply the concatenation of the phonemes of its constituent words; that is linguistically incorrect. In English, the phonemes are transformed into allophones through the application of 33 phonological rules. Moreover, some phoneme pairs will be coalesced into a single allophone. The symbols on the input side of the synthesis processor cannot be the same as the symbols on the output; they must represent distinct entities. This is not an implementation issue. This is a fundamental linguistic issue.

You point out that the same XML element is used in both specs (<phoneme>). I would still allow the element to represent either phonemes or allophones; I would simply additionally allow the author of the lexicon to specify that, for his purposes, he has specified phonemes. This is not an issue of which alphabet is used, since IPA can represent either phonemes or allophones. The difficulty is that with IPA it is impossible to know which is represented.
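The point that rules must run in order over the concatenated phoneme sequence, not word by word, can be illustrated with a toy Python sketch. The two rules below (yod coalescence and flapping, grossly simplified from American English) and their ordering are assumptions chosen for illustration, not the 33-rule set mentioned above, and `#` is an assumed word-boundary marker. Applying the ordered rules to the joined phonemic string produces cross-word effects, including a phoneme pair coalesced into one allophone, that word-by-word application misses.

```python
import re

VOWELS = "aeiou\u00e6\u0251\u0254\u0259\u025b\u026a\u028a"  # a e i o u æ ɑ ɔ ə ɛ ɪ ʊ

# Toy ordered rule list (illustrative assumptions, heavily simplified American English).
# "#" marks a word boundary; each rule operates on the output of the previous rule.
RULES = [
    # 1. Yod coalescence: /t/ + /j/ -> [t͡ʃ], also across a word boundary ("hit you").
    (re.compile(r"t#?j"), "t\u0361\u0283"),
    # 2. Flapping: /t/ between vowels -> [ɾ], also across a word boundary ("let it").
    (re.compile(rf"(?<=[{VOWELS}])t(?=#?[{VOWELS}])"), "\u027e"),
]


def realize(phonemic: str) -> str:
    """Apply the ordered rules to a phonemic string and return the resulting phones."""
    for pattern, replacement in RULES:
        phonemic = pattern.sub(replacement, phonemic)
    return phonemic


def realize_phrase(words: list[str]) -> str:
    """Concatenate the phonemic words first, then apply the rules across the whole phrase."""
    return realize("#".join(words))


def realize_word_by_word(words: list[str]) -> str:
    """Shortcut for comparison: realize each word in isolation, then concatenate."""
    return "#".join(realize(w) for w in words)


let_it = ["l\u025bt", "\u026at"]  # /lɛt/ /ɪt/
hit_you = ["h\u026at", "ju"]      # /hɪt/ /ju/

print(realize_phrase(let_it), realize_word_by_word(let_it))
print(realize_phrase(hit_you), realize_word_by_word(hit_you))
```

In the second phrase the coalesced [t͡ʃ] belongs to neither word's phonemic form; it appears only when the rules run on the concatenated sequence.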
Received on Tuesday, 27 February 2007 23:20:55 UTC