
Vail Reply to Comments on PLS

From: Kurt Fuqua <kfuqua@vailsys.com>
Date: Tue, 27 Feb 2007 17:30:04 -0600
To: "VoiceBrowser" <www-voice@w3.org>
Cc: "Baggia Paolo" <paolo.baggia@loquendo.com>
Message-ID: <001201c75ac7$359f1bc0$a0dd5340$@com>
Thank you for the opportunity to reply to your reaction to my comments.

It is my sincere hope that the PLS can be made to be excellent.

 

Kurt Fuqua

Director of Speech Sciences, Vail Systems, Inc.

 

Issue R106-5

I recommend that part of speech information be included as a tag.

 

I am pleased to hear that you partially accept the proposal to include part-of-speech information in entries. You have referred me to the 'roles' of Section 4.4. It is important that the parts of speech (POS) be standardized so that lexicons can be merged or distributed. This is not a trivial task, but it has been done well in other standards. I would encourage the use of the parts of speech defined in the SLAPI standard. These parts of speech are:

 

(Open POS)

v (verb), adv (adverb), adj (adjective), n (common noun), propn (proper
noun)

 

(Closed POS)

pron (pronoun), conj (conjunction), prep (preposition or postposition),
cardn (cardinal number), ordn (ordinal number), det (determiner), quant
(quantifier), interj (interjection), contrc (contraction or portmanteau),
symb (symbol).

 

These POS have been used successfully across many different language
families and there are existing grammars and lexicons based upon these POS.
This set provides sufficient high-level distinctions which are grammatically
meaningful and universal across languages.  Any POS can be subdivided using
language-specific grammatical features.
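As a sketch of how a lexicon tool could enforce this (the function names, the colon-based subdivision syntax, and the validation logic are my illustrations, not anything defined by SLAPI or the PLS draft), validating POS values against the closed set above is straightforward:

```python
# Validate part-of-speech (POS) tags against the SLAPI-style set listed
# above. The sets transcribe the abbreviations from this message; the
# functions and the "base:subdivision" convention are hypothetical.

OPEN_POS = {"v", "adv", "adj", "n", "propn"}
CLOSED_POS = {"pron", "conj", "prep", "cardn", "ordn", "det",
              "quant", "interj", "contrc", "symb"}
SLAPI_POS = OPEN_POS | CLOSED_POS

def validate_pos(tag: str) -> bool:
    """Return True if `tag` is one of the standardized POS values."""
    return tag in SLAPI_POS

def base_pos(tag: str) -> str:
    """Strip a language-specific subdivision, e.g. 'n:mass' -> 'n'."""
    return tag.split(":", 1)[0]
```

Merging two lexicons would then reduce to checking that every entry's base POS validates, rather than reconciling two ad-hoc tag sets.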

 

 

Issue R106-4

I recommend that the same phoneme normalization guidelines be adopted.

 

I am pleased that you would accept this proposal, although you did not communicate the modifications that you envision. The issue is that the IPA itself does not lay out specific guidelines for the use of IPA symbols in specifying phonemes; it is primarily geared toward allophones. Consequently there are no guidelines when linguists attempt to choose a single IPA symbol as the phoneme which represents a set of allophones. Simply specifying that IPA symbols are used is not by itself adequate to assure consistency in the representation of phonemes. This is why it is so common for various publications on the identical language to have different symbolic representations of the phonemes. The problem is unique to the representation of phonemes; the representation of allophones is far less ambiguous. The latest SLAPI specification contains a set of three guidelines for selecting which IPA symbol (and suprasegmentals) to use. This is not a modification of the IPA. It is simply a normalization of how to select an IPA symbol for a phoneme. The same guidelines will work for other phonetic symbol sets as well.

 

1) No modifier should be used in a phoneme symbol unless it marks a phonemic contrast in that language.

2) When a phoneme is phonetically realized in allophones involving different base symbols, the symbol chosen to represent the phoneme should be the one from which the others are arguably derived by phonological rule.

3) The phoneme symbol must be unique within the particular language.
 

By adopting these three normalization guidelines, it is more likely that
lexicons for the identical language will use consistent notations.
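Guideline 3, at least, is mechanically checkable. As a rough illustration (the phoneme symbols and the function are hypothetical examples, not drawn from any actual lexicon), a uniqueness check over a declared phoneme inventory might look like:

```python
from collections import Counter

# Guideline 3: each phoneme symbol must be unique within a language.
# The inventory is simply the list of symbols a lexicon declares; the
# example symbols used in the test are hypothetical.

def duplicate_symbols(inventory: list[str]) -> list[str]:
    """Return the phoneme symbols that appear more than once."""
    counts = Counter(inventory)
    return [sym for sym, n in counts.items() if n > 1]
```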

 

 

Issue R106-3

I recommend that a recommendation be made for the normalization of these
ambiguous and obsolete IPA symbols within the PLS.

 

I believe that the IPA is the best-developed standard and the most widely adopted. But there are a couple of problems which significantly complicate the processing of IPA symbols.

1.    Obsolete IPA symbols are still used frequently.

2.    The IPA standard occasionally allows two different symbols to
represent the identical sound.

 

The first is not actually a problem with the IPA standard itself. In the PLS proposal there is an example which purports to be in IPA but uses a composed symbol which is no longer part of the IPA standard. Either the symbol (IPA #214, Section 4.6, first example, "huge") should be removed from the example, or there should be a notation that this is actually a proprietary alphabet, not IPA. We should not encourage people to deviate from the IPA standard; this only exacerbates the complexities of supporting it. The fact that the standards document uses an obsolete symbol highlights the complexities associated with transcription standards. Many will continue to use obsolete symbols unaware that they are obsolete (there are about 28 such symbols in the IPA). A further complication is that most commercial rendering engines will not correctly render several properly coded IPA symbols (those involving #509), so many people resort to non-standard, pre-composed representations. The document should use only standard symbols and highlight the problem of obsolete symbols.
 
The second problem is that the IPA encodings are ambiguous; there are alternative acceptable means to represent several allophones (e.g. #210 is equivalent to #110). The ambiguities make string comparisons complex. If the committee does not wish to recommend a preferred coding, it should at least point out that the IPA coding is ambiguous, so that users will properly implement comparisons involving equivalences. If users are not aware of the ambiguities, many will attempt to use standard Unicode string comparison functions. These will not work reliably for the ambiguous symbols.
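To make the comparison problem concrete (the symbol pair is my own example, and the equivalence table is a hypothetical sketch): an obsolete pre-composed affricate ligature and its current two-symbol IPA spelling are distinct codepoint sequences with no canonical equivalence, so Unicode normalization alone will never equate them. An application needs its own equivalence mapping:

```python
import unicodedata

# Obsolete pre-composed ligature for the voiced postalveolar affricate
# versus the current IPA spelling: 'd' + combining tie bar (U+0361) + ezh.
OBSOLETE = "\u02A4"        # 'ʤ'
CURRENT = "d\u0361\u0292"  # 'd͡ʒ'

# Standard Unicode normalization does NOT unify these: U+02A4 has no
# canonical or compatibility decomposition in the Unicode database.
assert unicodedata.normalize("NFKD", OBSOLETE) != unicodedata.normalize("NFKD", CURRENT)

# A hypothetical application-level equivalence table is needed instead.
IPA_EQUIV = {OBSOLETE: CURRENT}

def canonical_ipa(s: str) -> str:
    """Map obsolete pre-composed symbols to their current IPA spellings."""
    return "".join(IPA_EQUIV.get(ch, ch) for ch in s)
```

Only after such a mapping is applied do ordinary string comparisons become reliable for these symbols.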
 
 

Issue R106-2

I recommend that both a phonemic key and a graphemic key be included for
each language in the lexicon, and that these keys be required.

 

Lexicon integrity checks are vitally important. There are several ways that integrity checks are performed on modern lexicons. The PLS document does not mention integrity issues. Each language has a defined alphabet of grapheme symbols and a defined set of phonemes. A very simple, effective integrity check is to specify these sets up front, within the lexicon. This is existing technology and has been used in lexicons for more than a decade. If the PLS optionally allowed these keys to be embedded, these checks could be performed automatically.
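A minimal sketch of such a check (the key contents and the entry structure here are invented for illustration; the PLS draft defines no such elements):

```python
# Integrity check: every symbol in an entry must come from the declared
# grapheme alphabet or phoneme set embedded in the lexicon. Both keys
# below are hypothetical examples for a toy English-like lexicon.

GRAPHEME_KEY = set("abcdefghijklmnopqrstuvwxyz'")
PHONEME_KEY = {"p", "b", "t", "d", "k", "g", "s", "z",
               "a", "e", "i", "o", "u", "\u0283"}  # 'ʃ', etc.

def check_entry(grapheme: str, phonemes: list[str]) -> list[str]:
    """Return a list of integrity violations for one lexicon entry."""
    errors = []
    for ch in grapheme.lower():
        if ch not in GRAPHEME_KEY:
            errors.append(f"grapheme {ch!r} not in declared alphabet")
    for ph in phonemes:
        if ph not in PHONEME_KEY:
            errors.append(f"phoneme {ph!r} not in declared phoneme set")
    return errors
```

An entry using a symbol outside either key is flagged immediately, which catches most transcription typos at load time rather than at synthesis time.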

 

 

Issue R106-1

Thus, I recommend that the specification be explicit that the transcriptions
are to be phonemic, not phonetic, and that this be required.

 

I understand that you have already rejected this proposal.

It is impossible to properly process a sequence without knowing whether
phonemes or allophones are represented.  If the committee does not wish to
specify that they are phonemes, then they should at least allow the lexicon
author to specify what he has chosen to represent - phonemes or allophones.

 

During speech synthesis, phonological rules must be applied to the sequence
of phonemes which represent a word.  Some of these rules apply only within a
word; other rules apply both within a word and across word boundaries.  The
phonological rules are ordered and must be applied in that order.  These
rules transform phonemes into allophones.  (During speech recognition the
processing is inverted.)

 

The simplest case is that the lexicon contains phonemes (this is the norm). In this situation, the phonological rules are simply applied to the concatenated sequence of words. If the lexicon contains allophones, the processing is far more complex. First, the process must be reversed to find the underlying phonemes; then the process can be run forward. If we do not first recover the underlying phonemes, the phonological rules could be applied in the wrong order, and complex mistakes are made at the word boundaries.
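As a toy sketch of the forward direction, ordered rule application over a phoneme sequence (both rules are drastically simplified illustrations, not a real English rule set, and the phoneme strings are invented):

```python
# Toy ordered phonological rules. Each rule rewrites a phoneme sequence;
# the rules must be applied in the listed order. Both are simplified
# illustrations, not actual English phonology.

VOWELS = {"a", "e", "i", "o", "u"}

def flapping(phones: list[str]) -> list[str]:
    """Simplified flapping: 't' between vowels becomes a flap 'ɾ'."""
    out = list(phones)
    for i in range(1, len(out) - 1):
        if out[i] == "t" and out[i - 1] in VOWELS and out[i + 1] in VOWELS:
            out[i] = "\u027e"  # 'ɾ'
    return out

def aspiration(phones: list[str]) -> list[str]:
    """Simplified aspiration: word-initial 't' becomes 'tʰ'."""
    if phones and phones[0] == "t":
        return ["t\u02b0"] + phones[1:]
    return phones

ORDERED_RULES = [flapping, aspiration]

def phonemes_to_allophones(phones: list[str]) -> list[str]:
    """Run the rules in order, transforming phonemes into allophones."""
    for rule in ORDERED_RULES:
        phones = rule(phones)
    return phones
```

Starting from allophone output instead of phoneme input would require inverting each rule and undoing them in reverse order, which is exactly the extra complexity described above.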

 

This is not just an academic issue.  Without the proper phonologic
processing significant errors will occur in both recognition and synthesis.
It is a very practical problem that we encounter regularly in development of
VoiceXML applications.

 

Consistency with SSML

I agree that the PLS should maintain compatibility with SSML. The issue is what is to be represented at what stage: phonemes or phones (i.e. allophones). Remember that the lexicon pronunciations will serve as input to a synthesis processor; the output of the synthesis processor is SSML. As pointed out in SSML 1.0 Section 3.1.9, phonemes are the unrealized form; "phones represent realized distinctions". I agree. The difference between the two is the processing by phonological rules. A sentence is not simply the concatenation of the phonemes of its constituent words; to treat it as such is linguistically incorrect. In English, the phonemes are transformed into allophones through the application of 33 phonological rules. Moreover, some phoneme pairs will be coalesced into a single allophone. The symbols on the input side of the synthesis processor cannot be the same as the symbols on the output; they must represent distinct entities. This is not an implementation issue. It is a fundamental linguistic issue.

 

You point out that the same XML element is used in both specs (<phoneme>).
I would still allow the element to represent either phonemes or allophones;
I would simply also allow the author of the lexicon to specify that, for his
purposes, he has specified phonemes. This is not an issue of which alphabet
is used, since the IPA can represent either phonemes or allophones. The
difficulty is that with the IPA alone it is impossible to know which is
represented.

 

 
Received on Tuesday, 27 February 2007 23:20:55 GMT
