Re: [media] how to support extended text descriptions

Hi Masatomo,

I think you have some very valid points here. There are definitely
some bugs that we will have to register. Let me go through it inline.

On Tue, Jun 7, 2011 at 6:30 PM, Masatomo Kobayashi <MSTM@jp.ibm.com> wrote:
> Let me mention four additional points about this topic mainly based on my
> experience in studying textual descriptions.
> The first one might have small impacts on the HTML5 spec while the others
> are just a suggestion for a11y APIs and the implementation of browsers and
> ATs.
>
> First, I found that the cue processing algorithm in the current working
> draft seems not to be able to handle zero-duration cues.
> A cue whose start time is equal to its end time never gets into "current
> cues", which is defined as the list of cues "whose start times are less
> than or equal to the current playback position and whose end times are
> greater than the current playback position".
> Given that typical extended descriptions pause the playback all the while
> presenting a description as shown in the WGBH's sample Janina already
> mentioned, which will need a zero-duration cue, the algorithm should take
> care of this case.
> A simple (but possibly problematic) modification might be adding something
> like "whose start times are less than or equal to the current playback
> position and whose end times are greater than the *previous* playback
> position, if the time was reached through the usual monotonic increase of
> the current playback position during normal playback".

I think you are right: we need to allow cues of zero duration. I
wouldn't think, though, that negative duration would be allowed. So, I
think we just need to ask for that little change.
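
To make the issue concrete, here is a rough sketch of the two predicates. The names and object shape are mine, purely illustrative, not from the spec:

```javascript
// Hypothetical sketch of the cue-activation rules discussed above;
// times are in seconds.

// Current working draft: "current cues" are those whose start time is
// less than or equal to the playback position and whose end time is
// greater than it. A zero-duration cue (start === end) can never
// satisfy both conditions at once.
function isCurrentCue(cue, now) {
  return cue.startTime <= now && cue.endTime > now;
}

// Masatomo's suggested modification for normal monotonic playback:
// compare the end time against the *previous* playback position, so a
// cue whose whole interval fell between the two positions still fires.
function isCurrentCueModified(cue, prev, now) {
  return cue.startTime <= now && cue.endTime > prev;
}

const zeroCue = { startTime: 5, endTime: 5 };
isCurrentCue(zeroCue, 5);                 // false - the cue is skipped
isCurrentCueModified(zeroCue, 4.96, 5.0); // true - it activates once
```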


> Second, I am concerned that the use of pause() or pauseOnExit might cause
> an unwanted result when the user tries to actively pause the playback
> (e.g., clicking on the "pause" button) while the playback is temporarily
> paused by the AT, which I think is a common situation.
> In this case the intended result would be pausing both video and
> description, but the actual result would be resuming the video without
> pausing the description.

I don't actually think that would be the case. I envisage it working
like this: when the user pauses, the video pauses but the pauseOnExit
flag is not cleared. Also, a video pause() does not influence the AT -
it would need to be paused or skipped separately. When the video is
resumed (i.e. play() is called), the AT will also resume reading out
all the active cues. That means the currently active cue will be sent
to the AT again so users can catch up on where they are.

In particular: the enabled state of the description track would not be
influenced by a video pause().
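
A tiny state model of that envisaged behaviour (all names and the state shape are mine, purely illustrative):

```javascript
// Illustrative model of the behaviour described above: a user pause()
// pauses playback only; it neither clears pauseOnExit nor disables the
// description track, and resuming re-announces the active cues.
function userPause(player) {
  return { ...player, paused: true }; // nothing else is touched
}

function userPlay(player) {
  // On play(), the active cues are sent to the AT again for catch-up.
  return { ...player, paused: false, reannounceActiveCues: true };
}

let player = {
  paused: false,
  descriptionTrackEnabled: true,
  activeCuePauseOnExit: true,
  reannounceActiveCues: false,
};

player = userPause(player);
// descriptionTrackEnabled and activeCuePauseOnExit are unchanged
player = userPlay(player);
// reannounceActiveCues is now true
```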


> Setting playbackRate = 0 or using a special state like "stalled" would
> have a better behavior.

A zero playbackRate just changes the playback speed of the video, but
will not fire any of the callbacks that need to fire when a video
pauses. Creating an artificial network stall is also not a good idea,
because it causes stalled events etc. to be fired. If it is a proper
user pause(), then it has to be dealt with as such. As mentioned: that
has no effect on whether a text track is still enabled or not.
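
A rough event-model sketch of that argument. This is my reading of HTMLMediaElement semantics, boiled down to a lookup, not spec text:

```javascript
// Why the two alternatives fall short of a real pause(): each action
// fires a different event, and only pause() fires 'pause'.
function eventsFiredBy(action) {
  switch (action) {
    case 'pause()':
      return ['pause'];      // the callbacks a real pause needs
    case 'playbackRate = 0':
      return ['ratechange']; // speed change only; no 'pause' event
    case 'artificial stall':
      return ['stalled'];    // spurious network-stall events instead
    default:
      return [];
  }
}

eventsFiredBy('playbackRate = 0').includes('pause'); // false
```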


> Third, even if the user has an option to decide whether the playback is
> paused or not when the TTS is still speaking at the end time, the author
> would also need to specify for each cue whether it is recommended to be
> paused or not, as described in our user requirements document.
> I wonder if the pause-on-exit flag might be exploited for this purpose.

This is an authoring need. The pauseOnExit flag is currently only an
IDL attribute and cannot be set other than by script. The way to deal
with this requirement is to allow the user to skip cues that are still
being read out. If they really want to interrupt the screen reader and
continue a paused video, then this should be user controlled -
probably via a shortcut key that skips the current cue and unpauses
the video.
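
One way that user control could look. Everything here is hypothetical - the key binding, the function name, and the state shape are all my own illustration:

```javascript
// Hypothetical sketch of a shortcut that skips the cue currently being
// read out and unpauses the video; nothing like this is in the spec.
function skipCueAndResume(state) {
  return {
    ...state,
    speakingCue: null, // interrupt the screen reader's current cue
    paused: false,     // and let the video continue
  };
}

// In a browser this might be wired to a key, e.g.:
// document.addEventListener('keydown', (e) => {
//   if (e.key === 'Escape') state = skipCueAndResume(state);
// });

let state = { speakingCue: 'description-cue-3', paused: true };
state = skipCueAndResume(state);
// state.speakingCue === null, state.paused === false
```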


> Fourth, given some modern TTS engines supporting SSML, I expect that with
> minimal effort and risk we can allow the author to improve the quality of
> narrations as mentioned in the user requirements document, simply by
> allowing including SSML in the cue text and exposing it to ATs via a11y
> API.
> For ATs not supporting SSML, the a11y API might need to expose both plain
> text and SSML.

Since this all goes through HTML, we have to find a way to map SSML
into HTML. I don't have a solution for this yet, but I think this is
also a problem that can still be solved once we have support for text
descriptions in browsers. It would be hard to get first
implementations of text descriptions if we overloaded the features at
this stage. That doesn't stop us from talking about it, but I don't
think this is very urgent yet.
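
On exposing both plain text and SSML: the plain-text side of that dual exposure could be derived mechanically from the SSML. A minimal sketch - the function name and the tag-stripping approach are my own assumptions, not anything from an a11y API:

```javascript
// Hypothetical helper: derive a plain-text fallback from SSML cue text
// for ATs that do not support SSML, by stripping the markup.
function plainTextFromSSML(ssml) {
  return ssml
    .replace(/<[^>]*>/g, '') // drop all SSML tags
    .replace(/\s+/g, ' ')    // collapse whitespace
    .trim();
}

const cue =
  '<speak>The dog <emphasis level="strong">barks</emphasis> loudly.</speak>';
plainTextFromSSML(cue); // "The dog barks loudly."
```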


> Overall I agree that enhancement of a11y APIs could allow native support
> of extended descriptions and this would be desired, benefiting both users
> and authors.

Yes, agreed.

Cheers,
Silvia.

Received on Wednesday, 8 June 2011 13:53:44 UTC