W3C home > Mailing lists > Public > public-texttracks@w3.org > February 2012

Re: Using HTML5 and Javascript to Deliver Text-Based Audio Descriptions

From: Geoff Freed <geoff_freed@wgbh.org>
Date: Fri, 10 Feb 2012 17:02:50 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: "public-texttracks@w3.org" <public-texttracks@w3.org>
Message-ID: <CB5AA24A.E172%geoff_freed@wgbh.org>

Hi, Silvia:

In our tests we didn't have problems with latency with the pre-recorded
descriptions, but please let me know if you think you're hearing
descriptions that are late.  Since all this is being done server-side,
there's always the possibility that slow connections or congestion will
cause a delay.  Note that in most of the demos, the descriptions shouldn't
step on the program audio.  There are a few exceptions:  in the Ming clip
we have to dodge the chef's non-stop monolog (we specifically included
this clip to show how the use of ducking can aid in situations where you
must sacrifice some program audio); in Sintel, we play descriptions over
the fight scene.
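Ducking of this kind can be sketched in a few lines of JavaScript. This is a minimal illustration, not the code from our demos: it assumes an HTML5 <video> element for the program and an <audio> element for the description clip, and the 0.25 duck level is an arbitrary illustrative value.

```javascript
// Illustrative duck level: program audio drops to 25% while a
// description clip plays. Not a value taken from the NCAM/IBM demos.
const DUCK_LEVEL = 0.25;

// Pure helper: the volume the program track should use.
function duckedVolume(baseVolume, ducking) {
  return ducking ? baseVolume * DUCK_LEVEL : baseVolume;
}

// Browser-only wiring (element names are hypothetical):
// duck the program audio, play the clip, restore volume when it ends.
function playDescription(video, clip) {
  const base = video.volume;
  video.volume = duckedVolume(base, true);
  clip.onended = () => { video.volume = duckedVolume(base, false); };
  clip.play();
}
```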

I agree that client-side processing is definitely worth some research and
experimentation.

Geoff/NCAM


On 2/9/12 4:35 PM, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:

>Hi Geoff,
>
>That is indeed very interesting. I'd be curious how you're going with
>the pre-recorded pieces and the download speed - is it fast enough? My
>suspicion is that doing the synthesis on the client will lead to a
>much more responsive system, but it'd be good to get that confirmed
>with actual experiments.
>
>Regards,
>Silvia.
>
>On Thu, Feb 9, 2012 at 11:46 PM, Geoff Freed <geoff_freed@wgbh.org> wrote:
>>
>> Hello, everybody:
>>
>> IBM Research Tokyo recently partnered with the Carl and Ruth Shapiro
>> Family National Center for Accessible Media (NCAM) at WGBH to research
>> ways to deliver online audio descriptions using text-to-speech (TTS)
>> methods. IBM and NCAM explored two approaches which exploit the new
>> HTML5 media elements, JavaScript and TTML:
>>
>> -- Writing and time-stamping a description script, then delivering the
>> descriptions as hidden text in real time in such a way that a user's
>> screen reader will read them aloud. The descriptions remain otherwise
>> invisible and inaudible to non-screen-reader users.
>> -- Writing and time-stamping descriptions, then recording them using TTS
>> technology. At playback time, each description is individually retrieved
>> and played aloud at intervals corresponding to the time-stamped script.
>>
>> Visit http://ncamftp.wgbh.org/ibm/dvs/ to learn more about the project,
>>view
>> the demonstration models and download the code to see how it works.
>>
>> Thanks.
>> Geoff Freed
>> WGBH/NCAM
>> (with apologies for cross-posts)
>>
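The two approaches Geoff outlines can be sketched as follows. This is a hedged illustration under assumed names, not the project's actual code: `activeDescription`, `attachTextDescriptions`, `attachAudioDescriptions`, and the cue fields `start`, `end`, `text`, and `src` are all hypothetical, and the browser wiring assumes an HTML5 <video> element.

```javascript
// Find the description cue active at time t (seconds), if any.
// Pure helper; cue shape ({start, end, text, src}) is assumed.
function activeDescription(cues, t) {
  return cues.find(c => t >= c.start && t < c.end) || null;
}

// Approach 1: write each newly active cue's text into an off-screen
// aria-live region so the user's screen reader speaks it aloud while
// it stays visually hidden.
function attachTextDescriptions(video, cues, liveRegion) {
  let last = null;
  video.addEventListener('timeupdate', () => {
    const cue = activeDescription(cues, video.currentTime);
    if (cue && cue !== last) {
      liveRegion.textContent = cue.text; // announced by the screen reader
      last = cue;
    }
  });
}

// Approach 2: retrieve and play a pre-recorded TTS clip for each cue
// at the time-stamped interval.
function attachAudioDescriptions(video, cues) {
  let last = null;
  video.addEventListener('timeupdate', () => {
    const cue = activeDescription(cues, video.currentTime);
    if (cue && cue !== last) {
      new Audio(cue.src).play(); // cue.src: URL of the TTS recording
      last = cue;
    }
  });
}
```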
Received on Friday, 10 February 2012 17:03:29 UTC
