Re: Notes for PLS

From: Kazuyuki Ashimura <ashimura@w3.org>
Date: Thu, 26 Jul 2007 00:14:47 +0900
Message-ID: <46A768E7.3080908@w3.org>
To: Kurt Fuqua <kfuqua@vailsys.com>
CC: Baggia Paolo <paolo.baggia@loquendo.com>, VoiceBrowser <www-voice@w3.org>

Dear Kurt Fuqua,

The working group will review the text and get back to you
shortly.

Sincerely,

Kazuyuki


Kurt Fuqua wrote:
>
> Paolo,
>
>  
>
> Here are the proposed forms for the 2 notes that you requested.
>
>  
>
> Kurt Fuqua
>
> Dir of Speech Sciences, Vail Systems
>
>  
>
> Proposed Notes For W3C PLS
>
>  
>
> R106-4
>
>  
>
> When IPA symbols are used to represent the phonemes of a language, 
> it can be ambiguous which allophonic symbol should be selected 
> to represent a phoneme.   This leads to inconsistencies between 
> lexicons composed for the same language.  In order to 
> maximize consistency, we recommend following the guidelines found in 
> the Scalable Language API.
>
>  
>
> 1) No modifier should be used in a phoneme symbol which does not constitute a phonemic contrast for the phoneme in that language.  
>    
> 2) When a phoneme is phonetically rendered in allophones involving different base symbols, the symbol chosen to represent the phoneme should be the one from which the others are arguably derived by phonological rule.  
>    
> 3) The phoneme symbol must be unique within the particular language.  
>    
> Here are some illustrative examples from English.  
>    
> ‘key’ = /ki/  
> The symbol /i/ is selected for the vowel rather than /iː/.  The rationale is that although the vowel is always long and therefore would be phonetically represented as /iː/, length is never phonemically contrastive in English.  Thus, in accord with guideline #1, the length modifier is not used for the phonemic symbol.  
>    
> ‘symphony’ = /sInfoni/  
> Phonetically the first nasal should be represented by /ɱ/ rather than /n/.  However, the allophone is arguably derived from /n/ by phonological rule; therefore guideline #2 is invoked for the base symbol.  
>
>  
>
>
>  
>
> R106-3
>
>  
>
> There are some complications in the IPA standard which have 
> significant implications for implementers.  There are some sounds 
> which can be represented by two different symbols.  There are also 
> several commonly used but technically obsolete symbols.  Because of 
> these ambiguities, simple string comparison functions are inadequate 
> to perform comparisons of IPA symbols.
>
>  
>
> The most common ambiguity in IPA is for the voiced velar plosive – IPA 
> symbol #110 which is used in many languages.  IPA symbol #110 is 
> represented by Unicode  0261 /ɡ/.  However,  IPA symbol #210 is 
> equivalent to #110.  IPA symbol #210 is represented by Unicode  0067 
> /g/.  This very common substitution will cause simple string matches 
> to fail.
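A minimal sketch of one way to handle the #110/#210 substitution described above, assuming the comparison may simply fold U+0067 onto U+0261 before matching. The names G_FOLD and ipa_equal are hypothetical and do not come from PLS or any library.

```python
# Hypothetical helper: treat ASCII "g" (U+0067) and the IPA voiced velar
# plosive (U+0261) as the same symbol when comparing transcriptions.
G_FOLD = str.maketrans({"\u0067": "\u0261"})


def ipa_equal(a: str, b: str) -> bool:
    """Compare two IPA strings after folding U+0067 onto U+0261."""
    return a.translate(G_FOLD) == b.translate(G_FOLD)


if __name__ == "__main__":
    # A plain string comparison fails, the folded comparison succeeds.
    print("\u0261o" == "go")
    print(ipa_equal("\u0261o", "go"))
```

Folding toward U+0261 rather than away from it keeps the stored form aligned with IPA symbol #110.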
>
>  
>
> There are more than two dozen symbols which were withdrawn, 
> superseded, or not approved for IPA usage but are commonly used.  
> Technically these are not part of the IPA and their use within an 
> alphabet declared as IPA is discouraged.  Implementers should be aware 
> of these.  The most commonly used obsolete symbols are the ligatures 
> for affricates and most double articulations.  These precomposed 
> symbols are still part of the Unicode standard but are no longer part 
> of the IPA standard.  Instead, according to the IPA standard, these 
> should be decomposed into discrete constituents and joined with a tie 
> bar.  For example, the precomposed affricate symbols /ʧ/ (02A7) and /ʤ/ 
> (02A4) should be represented by /t͡ʃ/ (0074+0361+0283) and /d͡ʒ/ 
> (0064+0361+0292) respectively.  (See Handbook of the International 
> Phonetic Association, Appendix 2)  The tie bar #433 (Unicode 0361) is 
> a non-spacing double diacritic.  To further complicate matters, double 
> diacritics are frequently improperly rendered by existing software.
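The decomposition above could be sketched as a small lookup table. This is only an illustration under the assumption that an implementation rewrites the obsolete ligatures before comparing; the table covers just the two affricates named in the note, and LIGATURES and expand_ligatures are hypothetical names.

```python
# Hypothetical table: obsolete precomposed affricates mapped to their
# tie-bar sequences, limited to the two examples given in the note.
LIGATURES = {
    "\u02A7": "\u0074\u0361\u0283",  # t + tie bar + esh
    "\u02A4": "\u0064\u0361\u0292",  # d + tie bar + ezh
}


def expand_ligatures(ipa: str) -> str:
    """Replace each known ligature with its decomposed tie-bar form."""
    return "".join(LIGATURES.get(ch, ch) for ch in ipa)


if __name__ == "__main__":
    # Print the code points of the expanded form of U+02A7.
    print([f"{ord(c):04X}" for c in expand_ligatures("\u02A7")])
```

A fuller table would need the remaining withdrawn symbols from the Handbook, which are not listed here.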
>
>  
>
> Several IPA symbols involve multiple Unicode codepoints.  IPA 
> diacritics are non-spacing codepoints; some symbols could involve 
> multiple diacritics.  As with any composed Unicode symbols, 
> normalization guidelines apply.  Combining sequence IPA symbols should 
> be normalized before comparisons are made.  (See Unicode Normalization 
> Forms UAX#15)  Case is always significant for IPA symbols.  The IPA 
> suprasegmental length mark #503 (02D0) is a spacing codepoint but must 
> be considered semantically part of the preceding vowel since it 
> modifies it.
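A rough sketch of the normalization and grouping described above, assuming NFC is the chosen normalization form (the note does not say which form to use). It relies on Python's standard unicodedata module; the function name segments and the grouping rules are illustrative assumptions, not part of PLS.

```python
import unicodedata

LENGTH_MARK = "\u02D0"  # IPA #503, a spacing code point
TIE_BAR = "\u0361"      # IPA #433, a non-spacing double diacritic


def segments(ipa: str) -> list[str]:
    """Normalize to NFC, then group each base symbol with its modifiers."""
    text = unicodedata.normalize("NFC", ipa)
    out: list[str] = []
    join_next = False  # True right after a tie bar
    for ch in text:
        # Combining marks and the length mark attach to the previous symbol.
        attaches = unicodedata.combining(ch) != 0 or ch == LENGTH_MARK
        if out and (attaches or join_next):
            out[-1] += ch
        else:
            out.append(ch)
        if ch == TIE_BAR:
            join_next = True
        elif not attaches:
            join_next = False
    return out


if __name__ == "__main__":
    print(segments("ki\u02D0"))
    print(segments("t\u0361\u0283i"))
```

Comparing the lists returned by segments for two transcriptions, rather than the raw strings, keeps the length mark and tie-bar sequences attached to the symbols they modify. No case folding is applied, since case is significant.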
>
>  
>


-- 
Kazuyuki Ashimura / W3C MMI & Voice Activity Lead
mailto: ashimura@w3.org
voice: +81.466.49.1170 / fax: +81.466.49.1171
Received on Wednesday, 25 July 2007 15:14:30 GMT
