Re: Vail Comments on PLS Draft

Dear Kurt Fuqua,

I'd like to thank you again for the interesting comments you sent on November 10, 2006 [1]. We are at an advanced stage for PLS 1.0, because this is the second edition of a Last Call Working Draft. I am adding this note to help you understand the context in which we evaluated your proposals.

The PLS subgroup discussed your comments. To simplify tracking, we broke your email into five comments, numbered from R106-1 to R106-5.

See below for each resolution and the motivation that guided us to reach the resolution.

Please indicate by email whether you are satisfied with the VBWG's resolutions, whether you think there has been a misunderstanding, or whether you wish to register an objection.

Paolo Baggia, editor PLS spec.



Issue R106-1 (Clarification / Typo / Editorial) From Kurt Fuqua (2006-11-20):

The document does not explicitly state whether the pronunciations in the lexicon are to be phonemic or phonetic. The difference is significant. The tag names (<phoneme>) and the examples in the document imply that the representation is phonemic. In an example (4.6), the word "huge" is transcribed using the phonemic transcription "hjuːʤ" rather than the phonetic transcription "çjuːʤ".

I believe that all the pronunciations should be given phonemically rather than phonetically. First, a phonetic transcription is generally beyond the capability of those not trained in linguistics. Even a trained linguist would have a difficult time creating consistent phonetic transcriptions. There is a second and far more compelling theoretical reason why it has been standard practice for lexicons to be transcribed phonemically. Phonology rules need to be applied to a phonemic transcription in order to render a phonetic transcription for a sentence to be synthesized or spoken. This requires the underlying phoneme representation. If the phoneme representation were not given, one would first have to work backwards to determine the phonemic transcription before the phonology rules could be applied. Several of the phonology rules apply across word boundaries. Therefore a phonetic transcription of individual words is counter-productive.

Thus, I recommend that the specification be explicit that the transcriptions are to be phonemic, not phonetic, and that this be required.

*** Resolution: Rejected

This element is the same as that used in SSML 1.0, where the element may be used with either phonemic or phonetic pronunciation alphabets. The default alphabet (IPA) can be used for both, although it is mainly phonetic. However, synthesis processors may support alternative pronunciation alphabets that are phonetic, phonemic, syllable stress marking, or any other kind of alphabet that is useful for enhancing low-level descriptions of acoustic details. We believe it is important to permit this variety in pronunciation alphabets to support the expected uses of PLS and SSML.

This motivation may not be clear from the existing text. We will make it more explicit in the text if possible. If you have a precise suggestion, it would be welcome.

Issue R106-2 (Feature Request) From Kurt Fuqua (2006-11-20):

Lexicons are notorious for containing inconsistent information. It is therefore very useful to include integrity checks within the lexicon. The integrity checks allow for automated consistency checking. For more than a decade, the lexicons of the Scalable Language API have used a phonemic key. The lexicon contains a phonemic key for each language of the lexicon. The phonemic key is simply a list of all the phonemes for that language. If any pronunciation contains a phoneme which is not in that phoneme set, there is a consistency error. The concept is very simple, and it catches many errors immediately after an edit. Analogously there is also a grapheme key; this contains every grapheme used by that language. There are several other integrity checks possible and SLAPI implements most of them. For the sake of brevity, I will emphasize only these two keys.
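For illustration, the phonemic-key check described above can be sketched as follows. This is a minimal sketch, not SLAPI's implementation: the function name, the toy key, and the toy lexicon are all hypothetical, and pronunciations are assumed to be already split into phoneme symbols.

```python
def check_phonemic_key(lexicon, phoneme_key):
    """Return (word, unknown_symbols) for every entry whose pronunciation
    uses a symbol that is not listed in the phonemic key."""
    errors = []
    for word, phonemes in lexicon.items():
        unknown = [p for p in phonemes if p not in phoneme_key]
        if unknown:
            errors.append((word, unknown))
    return errors


# Hypothetical toy data: a deliberately incomplete key for one language.
ENGLISH_KEY = {"h", "j", "uː", "dʒ", "iː", "t"}
LEXICON = {
    "huge": ["h", "j", "uː", "dʒ"],
    "heat": ["h", "iː", "t"],
    "hut": ["h", "ʌ", "t"],  # "ʌ" is not in the key, so this entry is flagged
}

for word, unknown in check_phonemic_key(LEXICON, ENGLISH_KEY):
    print(f"{word}: symbols not in phonemic key: {unknown}")
```

A grapheme key could be checked the same way, by passing the characters of each spelling instead of the phoneme symbols.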

I recommend that both a phonemic key and a graphemic key be included for each language in the lexicon, and that these keys be required.

*** Resolution: Deferred

We think your proposal is very interesting but beyond our current scope. 
We will be happy to reconsider this request for a future version of PLS.

Issue R106-3 (Clarification / Typo / Editorial) From Kurt Fuqua (2006-11-20):

The IPA is a well developed and useful representation. However it does contain some significant ambiguities. I believe that the standard should recommend certain normalized forms.

Several consonant phonemes can be represented using alternate symbols under the official guidelines. This ambiguity means that comparing IPA symbols becomes quite complex. For example, the very common IPA symbol #110 is represented as /ɡ/ (x0261) and is logically equivalent to #210 /g/ (x0067). There are many such ambiguous IPA symbols. There are also several obsolete IPA symbols which are still frequently used (e.g. #214). (This obsolete symbol is even included as an example in the draft.)
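For illustration, the comparison problem and one possible normalization can be sketched as below. This is only a sketch under an assumed policy: the choice to map x0067 onto x0261 (rather than the reverse) and the names NORMALIZE and normalize_ipa are hypothetical, and the table covers only the one pair mentioned above.

```python
# Assumed policy: fold the ASCII letter g (U+0067) onto the IPA
# "script g" (U+0261) so both spellings compare equal.
NORMALIZE = str.maketrans({"\u0067": "\u0261"})


def normalize_ipa(transcription):
    """Map equivalent IPA code points onto a single canonical form."""
    return transcription.translate(NORMALIZE)


a = "\u0261\u025bt"  # transcription typed with U+0261
b = "\u0067\u025bt"  # same transcription typed with U+0067

print(a == b)                                # False: raw strings differ
print(normalize_ipa(a) == normalize_ipa(b))  # True: equal after normalization
```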

I recommend that a recommendation be made for the normalization of these ambiguous and obsolete IPA symbols within the PLS.

*** Resolution: Rejected

We are aware that IPA contains ambiguities. This WG has neither the experience nor the mandate to standardize more precisely than IPA has done. Note that the ability for vendors to support alternative pronunciation alphabets can significantly mitigate this problem.

Issue R106-4 (Clarification / Typo / Editorial) From Kurt Fuqua (2006-11-20):

As its name implies, the IPA was created primarily as a phonetic alphabet, not a phonemic alphabet. It can be used for the representation of phonemes but unfortunately linguists often transcribe the identical language with slightly different IPA symbols and diacritics. There should be recommendations to normalize the transcription of phonemes using IPA symbols.

The central problem is that a phoneme is a set of allophones. IPA can transcribe allophones in a way that is generally unambiguous. The difficulty is selecting which symbol to use to represent the set. For example, some linguists would transcribe the English word 'heat' as /hiːt/ others would use /hit/. In English, vowel length is not contrastive, although the vowel /i/ is always long. The question is whether to include a diacritic, or suprasegmental such as length, if that feature is not contrastive in the language. This issue was resolved under SLAPI with the following three normalization guidelines:

1) No modifier should be used in a phoneme symbol which does not constitute a phonemic contrast for the phoneme in that language.

2) When a phoneme is phonetically rendered in allophones involving different base symbols, the symbol chosen to represent the phoneme should be the one from which the others are arguably derived by phonological rule.

3) The phoneme symbol must be unique within the particular language.

While these do not resolve every issue of phonemic representation, they do resolve most such issues and allow for a standard normalization.

I recommend that the same phoneme normalization guidelines be adopted.

*** Resolution: Accepted (w/modifications)

In general we are not phoneticians, but we rely on the work of groups such as the IPA. Such groups are better placed than we are to define normalization guidelines.

Could you suggest a URI that refers to this information?

If you provide one, we can review it and possibly add an informative note.

Issue R106-5 (Feature Request) From Kurt Fuqua (2006-11-20):

I understand that a decision was made to not include part of speech information. I think that the lack of this most basic form of grammatical information will fundamentally handicap the standard. Part of speech information is used to differentiate pronunciations. English and other languages have many pairs of words that are pronounced differently. The draft includes such an example (4.9.3 example 2). The lexicon is the source of all word-level information for the applications. Without part of speech information in the lexicon, there is simply no way to differentiate which pronunciation to use.

I do recognize that this introduces another level of complexity in that the specification must include the possible parts of speech. However this has been addressed in other existing standards such as LexiconXML, the Scalable Language API and OSIS.

I recommend that part of speech information be included as a tag.

*** Resolution: Accepted (w/modifications)

We partially accepted your proposal to add an attribute in the PLS as a way of uniquely matching homographs to pronunciations.

This new attribute is called "role" and it is discussed in Section 4.4 [1].
It addresses the problem of assigning a part of speech to specific lexemes.
Within the SSML 1.1 effort there is a related activity to add the role attribute to SSML, to allow the selection of a more specific lexeme if needed.
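
For illustration only, the following sketch shows how a processor might use such an attribute to choose between homographs. It is not taken from the specification: the "ex" prefix, its namespace URI, the sample word, and the pronunciations() helper are hypothetical, and role values are compared as plain strings instead of being resolved as qualified names.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Hypothetical lexicon: one spelling, two lexemes distinguished by role.
PLS_DOC = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:ex="http://www.example.com/pos"
    alphabet="ipa" xml:lang="en">
  <lexeme role="ex:verb">
    <grapheme>refuse</grapheme>
    <phoneme>rɪˈfjuːz</phoneme>
  </lexeme>
  <lexeme role="ex:noun">
    <grapheme>refuse</grapheme>
    <phoneme>ˈrɛfjuːs</phoneme>
  </lexeme>
</lexicon>"""


def pronunciations(root, word, role=None):
    """Collect phoneme strings for `word`, optionally restricted to lexemes
    whose whitespace-separated role list contains `role`."""
    ns = {"pls": PLS_NS}
    found = []
    for lexeme in root.findall("pls:lexeme", ns):
        graphemes = [g.text for g in lexeme.findall("pls:grapheme", ns)]
        roles = (lexeme.get("role") or "").split()
        if word in graphemes and (role is None or role in roles):
            found.extend(p.text for p in lexeme.findall("pls:phoneme", ns))
    return found


root = ET.fromstring(PLS_DOC)
print(pronunciations(root, "refuse"))             # both pronunciations
print(pronunciations(root, "refuse", "ex:noun"))  # only the noun reading
```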




Received on Thursday, 4 January 2007 16:22:01 UTC