Re: [media] how to support extended text descriptions

Let me add four points on this topic, based mainly on my experience 
studying textual descriptions.
The first may have a small impact on the HTML5 spec, while the others 
are merely suggestions for a11y APIs and for the implementations of 
browsers and ATs.

First, I found that the cue processing algorithm in the current working 
draft does not appear to handle zero-duration cues.
A cue whose start time equals its end time never gets into the "current 
cues", defined as the list of cues "whose start times are less than or 
equal to the current playback position and whose end times are greater 
than the current playback position".
Typical extended descriptions pause playback for as long as a 
description is being presented, as in the WGBH sample Janina already 
mentioned; this requires a zero-duration cue, so the algorithm should 
handle this case.
A simple (but possibly problematic) modification would be to change the 
condition to something like "whose start times are less than or equal 
to the current playback position and whose end times are greater than 
the *previous* playback position, if the time was reached through the 
usual monotonic increase of the current playback position during normal 
playback".

Second, I am concerned that using pause() or pauseOnExit could produce 
an unwanted result when the user actively pauses playback (e.g., by 
clicking the "pause" button) while playback is temporarily paused by 
the AT, which I think is a common situation.
In this case the intended result is that both the video and the 
description pause, but the actual result is that the video resumes 
while the description keeps playing.
Setting playbackRate = 0, or introducing a special state such as 
"stalled", would behave better here.
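The conflict can be shown with a toy player model (not a real HTMLMediaElement; the class and its toggle button are hypothetical, assuming the common pattern where the "pause" button toggles play/pause based on the paused flag):

```javascript
// Toy model of a media element with a toggling pause button.
class Player {
  constructor() { this.paused = false; this.playbackRate = 1; }
  pause() { this.paused = true; }
  play()  { this.paused = false; }
  // Typical UI "pause" button: toggles based on the paused flag.
  togglePauseButton() { this.paused ? this.play() : this.pause(); }
}

// Case 1: the AT pauses via pause(). The user's pause click then
// toggles the element back to playing -- the opposite of the intent.
const p1 = new Player();
p1.pause();               // AT pauses for an extended description
p1.togglePauseButton();   // user presses "pause"
p1.paused;                // → false: the video resumes

// Case 2: the AT sets playbackRate = 0 instead. The element still
// reports paused === false, so the user's click pauses it as intended.
const p2 = new Player();
p2.playbackRate = 0;      // AT "stalls" playback without pausing
p2.togglePauseButton();   // user presses "pause"
p2.paused;                // → true: both video and description stop
```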

Third, even if the user has an option to decide whether playback pauses 
when the TTS is still speaking at a cue's end time, the author also 
needs a way to specify, for each cue, whether pausing is recommended, 
as described in our user requirements document.
I wonder whether the pause-on-exit flag could be reused for this 
purpose.
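One way to combine the two signals could be sketched as follows (a hypothetical decision helper, assuming the author's per-cue recommendation is mapped onto the pause-on-exit flag and the user preference is a simple boolean):

```javascript
// Hypothetical helper: should playback pause at a cue's end time?
// Combines the author's per-cue recommendation (mapped onto the
// pause-on-exit flag) with a global user preference.
function shouldPauseAtExit(cue, ttsStillSpeaking, userAllowsPausing) {
  if (!ttsStillSpeaking) return false;   // TTS finished in time; no need
  if (!userAllowsPausing) return false;  // user opted out of pausing
  return cue.pauseOnExit;                // author's per-cue recommendation
}

shouldPauseAtExit({ pauseOnExit: true },  true,  true);   // → true
shouldPauseAtExit({ pauseOnExit: false }, true,  true);   // → false
shouldPauseAtExit({ pauseOnExit: true },  false, true);   // → false
```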

Fourth, given that some modern TTS engines support SSML, I expect that 
with minimal effort and risk we could let authors improve the quality 
of narrations, as mentioned in the user requirements document, simply 
by allowing SSML in the cue text and exposing it to ATs via the a11y 
API.
For ATs that do not support SSML, the a11y API might need to expose 
both the plain text and the SSML.
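Deriving the plain-text form from the SSML could look roughly like this (a naive sketch; a real implementation should use an XML parser rather than a regular expression, and the function name is hypothetical):

```javascript
// Naive sketch: derive a plain-text fallback from SSML cue text for
// ATs that do not understand SSML.
function ssmlToPlainText(ssml) {
  return ssml
    .replace(/<[^>]+>/g, ' ')  // drop all SSML tags
    .replace(/\s+/g, ' ')      // collapse the whitespace left behind
    .trim();
}

const cueText =
  '<speak>The hero pauses.<break time="500ms"/> Rain begins to fall.</speak>';
ssmlToPlainText(cueText);  // → 'The hero pauses. Rain begins to fall.'
```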

Overall I agree that enhancing the a11y APIs could enable native 
support for extended descriptions, and that this would be desirable, 
benefiting both users and authors.

Regards,
Masatomo

Received on Tuesday, 7 June 2011 08:31:14 UTC