Re: [media] how to support extended text descriptions

Hi Janina,

On Thu, Jun 9, 2011 at 1:01 AM, Janina Sajka <janina@rednote.net> wrote:
> Silvia Pfeiffer writes:
>> On Wed, Jun 8, 2011 at 9:30 AM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
>> > "Therefore, you cannot rely on the video making progress at the same time as the TTS engine ".
>> > Possibly not, however in the absence of trick play (which I think would have to cancel any descriptions), one can probably assume the video won't go *faster* than expected. Therefore if you set an internal handler for the assumed end time, then even if the video hasn't reached that point yet because it stalled, no real harm is done issuing a pause.
>>
>>
>> The TTS engine might itself go slower than expected (because it, too,
>> may be starved of CPU), so the effect of the video running ahead of
>> the description would still occur.
>>
>>
>> > " I do not know how to inform the browser or a JS when the screen reader has finished reading text in a cross-browser compatible way. "
>> > Do we need to is my point?
>>
>>
>> I think we do, since the TTS engine and the video player are two
>> processes that run asynchronously and therefore synchronisation is
>> necessary.
>>
>>
>> > " Descriptions delivered as audio do not come in the TextTrack. They come in the multitrack API. "
>> > That's arguing we shouldn't change the design because the design is wrong. To the end user they are both descriptions and serve the same purpose; the user doesn't care what markup tag caused them to come into existence.
>>
>> You're assuming that the current design is wrong. Let's analyse that
>> before making such an assumption.
>>
>> When we deal with text descriptions, they have to be voiced somehow.
>> This requires a TTS somewhere in the pipeline.
>> When we deal with audio descriptions, they come directly from the
>> video element and are thus a native part of the browser and not handed
>> through to a TTS.
>> I find it hard to see how it would be possible to expose these two
>> fundamentally different types of content to the user in the same way.
>>
>> In particular: audio descriptions will go in sync with the video and
>> there is no need to pause the video to display them, while text
>> descriptions create the need for extensions of the timeline and the
>> pausing behaviour.
>>
>> I think they are inherently different and trying to fool the user into
>> thinking that they are identical will just lead to problems.
>>
> I think the only inherent difference that users will care about is
> whether the audio recording is of a real human reading the
> description or of a TTS engine voicing it. That's a distinction we
> should be sure to capture.
>
> As has been noted, the TTS-generated audio might be created in
> realtime or non-realtime.
>
> If generated non-realtime, it might be server-generated and delivered
> as recorded audio, in which case our tagging should not equate it with
> human narration, even though it can effectively "play" the same way.

We have to take the position of the browser in interpreting the data.
When the server does the TTS, the result arrives in the browser as an
audio file and is regarded as an audio track just like any other. We
have solved the handling of such tracks through the multitrack API.
Labelling is done by the authors, who can then state whether a track is
human narration or machine-generated. That is a semantic question and
sits outside the mechanics of how to deliver and display it.
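
To make the mechanics concrete, here is a minimal sketch of how a page
could switch on such a server-generated description track through the
multitrack API's audioTracks list (assuming a browser that implements
it; the kind and label values are authored metadata and the selection
logic here is purely illustrative):

    var video = document.querySelector('video');

    // Walk the audio track list and enable the described-audio track.
    // The browser does not infer these kinds; the author declares them,
    // whether the audio is human narration or TTS-generated.
    for (var i = 0; i < video.audioTracks.length; i++) {
      var track = video.audioTracks[i];
      if (track.kind === 'descriptions') {
        track.enabled = true;   // plays in sync with the main video
      }
    }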


> We also need to add to our use cases the situation where the user has
> opted for timescale-modified playback. Realtime TTS generation then
> needs to compute the actual available time, which is different from
> what is indicated for the default playback rate. Please consider that
> this is not an edge case, as it will be used frequently by students in
> general.

Absolutely. This is why I am saying we cannot rely on the TTS being
rendered within a certain timeframe, because the playback speed is
user controlled. Similarly, we cannot rely on the video playing at a
certain rate, because the user may fast forward. We simply have to
accept that they are asynchronous real-time resources that can only be
synchronized through events and flags.
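
As a rough sketch of the kind of event-based synchronisation I mean,
assuming a descriptions text track and a TTS interface that actually
fires an end event (the example uses the Web Speech API's
SpeechSynthesisUtterance purely for illustration; a screen reader's
TTS does not expose such an event to the page, which is exactly the
gap we are discussing):

    var video = document.querySelector('video');
    var descTrack = video.textTracks[0];  // assumed: the descriptions track

    descTrack.oncuechange = function () {
      var cue = descTrack.activeCues[0];
      if (!cue) return;

      // Pause the media, hand the cue text to the TTS engine, and only
      // resume once the engine signals that it has finished speaking.
      video.pause();
      var utterance = new SpeechSynthesisUtterance(cue.text);
      utterance.onend = function () {
        video.play();
      };
      speechSynthesis.speak(utterance);
    };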


>> > "So you want them displayed as well as the captions? Always or only when they are also read out?
>
> As I've tried to suggest previously, we shouldn't assume we have the
> luxury of limiting how audio and visual display of alternative content
> might need to be combined. A couple of quick use cases:
>
> *   Low vision people FREQUENTLY want to SEE content as well as HEAR it.
>     Comprehension is enhanced this way. This kind of feature is common
>     to magnification software like ZoomText, and users will expect the
>     same from media playback.
>
> *   Learning disabled users also benefit from seeing plus hearing. More
>     elaborate support software for this population might highlight
>     words as they're being spoken--a bit of a challenge, certainly.

Yes, I am very much aware of this.


> As to displaying both captions and descriptions, I suspect the users who
> will most want this aren't going to care about timelines or the actual
> video, because they can't see the video--or hear the audio. Thus,
> interleaved caption + description could well make sense.

I agree - they won't care about timelines. We have the @transcript
link for such users. Transcripts should contain both the full caption
text and the full description text and possibly more. So they are much
more useful than turning on captions and descriptions at the same
time.


> Lastly, let me reiterate we need more feedback from the wider WAI
> community on these use cases. It's good we're discussing this, but it's
> a bit premature to try and specify API requirements at this point, imho.

This group has spent a lot of time gathering requirements and asking
other communities for input. Feel free to gather more and to
contribute it as we specify things. However, I would strongly advise
against putting technical discussions and specifications on hold until
we receive that feedback: HTML5 development, browser development,
accessibility API changes - all of these are currently in flux and it
is still easy to get solutions added. If we wait even a few months, we
may have missed the boat.

Regards,
Silvia.

Received on Thursday, 9 June 2011 02:05:06 UTC