Notes for PLS from Kurt Fuqua on 2007-07-20 (www-voice@w3.org from July to September 2007)

From: Kurt Fuqua <kfuqua@vailsys.com>
Date: Fri, 20 Jul 2007 10:59:59 -0500
To: "Baggia Paolo" <paolo.baggia@loquendo.com>, "VoiceBrowser" <www-voice@w3.org>
Message-ID: <002901c7cae7$065b4790$1311d6b0$@com>

Paolo,

Here are the proposed forms for the 2 notes that you requested.

Kurt Fuqua

Dir of Speech Sciences, Vail Systems

Proposed Notes For W3C PLS

R106-4

When IPA symbols are used to represent the phonemes of a language, it can be ambiguous which allophonic symbol should be selected to represent a phoneme. This results in inconsistencies between lexicons composed for the same language. In order to maximize consistency, we recommend following the guidelines found in the Scalable Language API.

1) No modifier should be used in a phoneme symbol which does not constitute a phonemic contrast for the phoneme in that language.

2) When a phoneme is phonetically rendered in allophones involving different base symbols, the symbol chosen to represent the phoneme should be the one from which the others are arguably derived by phonological rule.

3) The phoneme symbol must be unique within the particular language.
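
Guidelines 1 and 2 call for linguistic judgment, but guideline 3 can be checked mechanically. The following is a minimal Python sketch of such a check, not part of any specification. The function name duplicate_symbols and the sample inventory are hypothetical and exist only for illustration.

```python
from collections import Counter

def duplicate_symbols(inventory: dict[str, str]) -> list[str]:
    """Return every symbol that is assigned to more than one phoneme."""
    counts = Counter(inventory.values())
    return sorted(symbol for symbol, n in counts.items() if n > 1)

# Hypothetical inventory: phoneme description -> chosen symbol.
# Two different vowels were mistakenly given the same symbol.
inventory = {
    "close front vowel": "i",
    "near-close front vowel": "i",
    "voiceless velar plosive": "k",
}

print(duplicate_symbols(inventory))  # ['i']
```

An empty result means that every phoneme in the inventory has its own symbol.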

Here are some illustrative examples from English.

‘key’ = /ki/
The symbol /i/ is selected for the vowel rather than /iː/. The rationale is that although the vowel is always long and therefore would be phonetically represented as /iː/, length is never phonemically contrastive in English. Thus, in accord with guideline #1, the length modifier is not used for the phonemic symbol.

‘symphony’ = /sInfoni/
Phonetically the first nasal should be represented by /ɱ/ rather than /n/. However, the allophone is arguably derived from /n/ by phonological rule, so guideline #2 is invoked and /n/ is chosen as the base symbol.

R106-3

There are some complications in the IPA standard which have significant implications for implementers. There are some sounds which can be represented by two different symbols. There are also several commonly used but technically obsolete symbols. Because of these ambiguities, simple string comparison functions are inadequate to perform comparisons of IPA symbols.

The most common ambiguity in IPA is for the voiced velar plosive – IPA symbol #110 which is used in many languages. IPA symbol #110 is represented by Unicode 0261 /ɡ/. However, IPA symbol #210 is equivalent to #110. IPA symbol #210 is represented by Unicode 0067 /g/. This very common substitution will cause simple string matches to fail.
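
The failed match is easy to reproduce. The following Python sketch is only an illustration: the helper name fold_script_g is hypothetical, and it assumes that every U+0067 in a string declared as IPA is meant as the voiced velar plosive.

```python
IPA_G = "\u0261"    # IPA symbol #110, LATIN SMALL LETTER SCRIPT G
ASCII_G = "\u0067"  # IPA symbol #210, LATIN SMALL LETTER G

def fold_script_g(text: str) -> str:
    """Replace every U+0067 with U+0261 so both spellings compare equal."""
    return text.replace(ASCII_G, IPA_G)

a = "\u0261o"  # written with U+0261
b = "go"       # written with U+0067

print(a == b)                                # False
print(fold_script_g(a) == fold_script_g(b))  # True
```

Folding in the other direction (U+0261 to U+0067) would work equally well for comparison purposes; what matters is that both sides are folded the same way.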

There are more than two dozen symbols which were withdrawn, superseded, or never approved for IPA usage but are commonly used. Technically these are not part of the IPA, and their use within an alphabet declared as IPA is discouraged. Implementers should be aware of these. The most commonly used obsolete symbols are the ligatures for affricates and most double articulations. These precomposed symbols are still part of the Unicode standard but are no longer part of the IPA standard. Instead, according to the IPA standard, these should be decomposed into discrete constituents and joined with a tie bar. For example, the precomposed affricate symbols /ʧ/ (02A7) and /ʤ/ (02A4) should be represented by /t͡ʃ/ (0074+0361+0283) and /d͡ʒ/ (0064+0361+0292) respectively. (See Handbook of the International Phonetic Association, Appendix 2.) The tie bar #433 (Unicode 0361) is a non-spacing double diacritic. To further complicate matters, double diacritics are frequently rendered improperly by existing software.
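
As an illustration, an implementation could decompose the obsolete ligatures before comparing strings. The Python sketch below covers only the two affricates named above; the table name LIGATURES and the function name decompose_ligatures are hypothetical, and a complete table would need further entries taken from the IPA Handbook.

```python
TIE_BAR = "\u0361"  # IPA #433, COMBINING DOUBLE INVERTED BREVE

# Only the two ligatures mentioned in the text are listed here.
LIGATURES = {
    "\u02A7": "t" + TIE_BAR + "\u0283",  # U+02A7 -> 0074+0361+0283
    "\u02A4": "d" + TIE_BAR + "\u0292",  # U+02A4 -> 0064+0361+0292
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(LIGATURES)

def decompose_ligatures(text: str) -> str:
    """Replace precomposed affricate ligatures with tie-bar sequences."""
    return text.translate(_TABLE)

decomposed = decompose_ligatures("\u02A7")
print([f"{ord(ch):04X}" for ch in decomposed])  # ['0074', '0361', '0283']
```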

Several IPA symbols involve multiple Unicode codepoints. IPA diacritics are non-spacing codepoints; some symbols could involve multiple diacritics. As with any composed Unicode symbols, normalization guidelines apply. IPA symbols that are combining sequences should be normalized before comparisons are made. (See Unicode Normalization Forms, UAX #15.) Case is always significant for IPA symbols. The IPA suprasegmental length mark #503 (02D0) is a spacing codepoint but must be considered semantically part of the preceding vowel since it modifies it.
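
The following Python sketch shows one way these points could be applied. It is an assumption-laden illustration, not a reference implementation: the function names ipa_equal and segments are hypothetical, NFD is chosen arbitrarily (any single normalization form applied to both sides would do), and the grouping rule treats only combining marks, the tie bar, and the length mark specially.

```python
import unicodedata

LENGTH_MARK = "\u02D0"  # IPA #503, a spacing codepoint
TIE_BAR = "\u0361"      # IPA #433, joins the symbols on either side

def ipa_equal(a: str, b: str) -> bool:
    """Compare after normalization; case is deliberately left untouched."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def segments(text: str) -> list[str]:
    """Group codepoints so diacritics and length marks stay with their base."""
    out: list[str] = []
    join_next = False  # True right after a tie bar
    for ch in unicodedata.normalize("NFD", text):
        is_mark = unicodedata.combining(ch) != 0
        if out and (is_mark or ch == LENGTH_MARK or join_next):
            out[-1] += ch   # attach to the preceding symbol
        else:
            out.append(ch)  # start a new symbol
        if ch == TIE_BAR:
            join_next = True
        elif not is_mark:
            join_next = False
    return out

print(ipa_equal("\u00E3", "a\u0303"))     # True: precomposed vs. decomposed
print(ipa_equal("b", "B"))                # False: case is significant
print(len(segments("t\u0361\u0283i\u02D0")))  # 2
```

In the last call the affricate (three codepoints) and the long vowel (two codepoints) each come back as a single symbol.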

Received on Friday, 20 July 2007 15:55:40 UTC