Maybe Why Speech Synthesisers Are So Difficult To Get Used To


I've recently been reading a paper published in the on-line journal Nature Neuroscience.  The subject of the paper was early language acquisition and the cognitive and neurological processes involved.  The review of the paper is at:
Unfortunately, you need a subscription to read the full article.

One of the conclusions was that we become conditioned to discriminate amongst the phonemes used in speech, based on the phonemes we are exposed to at an early age.  It is still possible to learn to discriminate amongst new phonemic blocks later in life, but the task is harder for adults than it is for infants.

Because we are conditioned to discriminate between the phonemes of our native languages, and because learning to discriminate between new phonemes becomes increasingly difficult later in life, foreign language acquisition is harder for adults than for infants.  The task is not just to become conditioned to associating meaning with the different word sounds, but also to discriminate between the different phonemic blocks that a foreign language may use.  For this reason, a foreign language is easier to learn if the speaker's speech rate is slowed down, making discrimination between the phonemes easier.

This same slowing down of speech occurs for new users of a TTS synthesiser, or for users switching to a different TTS synthesiser.  One likely cause is that TTS synthesisers haven't correctly replicated the phonemic blocks we are used to in our native languages, so we have to learn to discriminate between phonemic blocks that are slightly different from those we are used to.  Once we have learnt to distinguish between the different phonemes, it is then possible to listen to synthetic speech at quite high rates.

This hypothesis, if true, poses several usability issues for certain groups.  Firstly, the transfer of information to any new TTS user is going to be significantly slower than human-to-human speech for a while, whilst the user's brain maps to the new phonemic blocks.  Secondly, this remapping depends on exposure time.  Those who use a TTS synthesiser frequently and for long durations will likely learn to distinguish between the different phonemes more quickly than those who use a TTS engine only infrequently, which may be a problem for TTS use by sighted but "eyes free" users.


Received on Tuesday, 28 December 2004 16:16:05 UTC