Re: [media] how to support extended text descriptions from Masatomo Kobayashi on 2011-06-15 (public-html-a11y@w3.org from June 2011)

From: Masatomo Kobayashi <MSTM@jp.ibm.com>
Date: Wed, 15 Jun 2011 17:58:24 +0900
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: "public-html-a11y@w3.org" <public-html-a11y@w3.org>, public-html-a11y-request@w3.org
Message-ID: <OF278175B4.20DA34A7-ON492578B0.002D4D54-492578B0.00314C65@jp.ibm.com>

Hi Silvia,

Maybe I am misunderstanding the spec or the implementation you had in
mind, but my concern about the second problem is as follows:

1. When voicing has not finished by the end time of a cue, the video is
paused because the cue's pause-on-exit flag is true.

2. Then the user clicks on the pause button to pause the video, but this 
will resume the video unexpectedly as the video is already paused and the 
button is a toggle button.

This will confuse the user.
Because the user cannot predict when the TTS will pause or resume the
video, it is difficult to choose between the browser's pause and the TTS's
pause in time.

Perhaps this problem will be alleviated if the "pause" interface is not a 
toggle, at least in the keyboard control?
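
A minimal sketch of the two steps above, to make the difference between a
toggle and a non-toggle control concrete. It does not use the real media
element API: FakePlayer, togglePlayPause, explicitPause and simulate are
hypothetical names introduced only for this illustration, and the only
state modelled is whether the video is paused.

```typescript
// Hypothetical stand-in for a media element; it only tracks the paused state.
class FakePlayer {
  paused = false;
  play(): void { this.paused = false; }
  pause(): void { this.paused = true; }
}

// Toggle-style control: flips whatever the current state is.
function togglePlayPause(p: FakePlayer): void {
  if (p.paused) {
    p.play();
  } else {
    p.pause();
  }
}

// Non-toggle control: pausing an already paused player changes nothing.
function explicitPause(p: FakePlayer): void {
  p.pause();
}

// Step 1: the video is already paused because of the cue's pause-on-exit flag.
// Step 2: the user then tries to pause it with the given control.
// Returns whether the video is still paused afterwards.
function simulate(userAction: (p: FakePlayer) => void): boolean {
  const player = new FakePlayer();
  player.pause();
  userAction(player);
  return player.paused;
}

const pausedAfterToggle = simulate(togglePlayPause);  // video resumed
const pausedAfterExplicit = simulate(explicitPause);  // video stays paused
```

With the toggle, the user's "pause" resumes the video; with the non-toggle
control, it stays paused.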

Regards,
Masatomo

Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote on 2011/06/08 22:52:55:
> 
> Hi Masatomo,
> 
> I think you have some very valid points here. There are definitely
> some bugs that we will have to register. Let me go through it inline.
> 
> On Tue, Jun 7, 2011 at 6:30 PM, Masatomo Kobayashi <MSTM@jp.ibm.com> wrote:
> > Let me mention four additional points about this topic mainly based on my
> > experience in studying textual descriptions.
> > The first one might have small impacts on the HTML5 spec while the others
> > are just suggestions for a11y APIs and the implementation of browsers and
> > ATs.
> >
> > First, I found that the cue processing algorithm in the current working
> > draft seems not to be able to handle zero-duration cues.
> > A cue whose start time is equal to the end time never gets into "current
> > cues", which is defined as the list of cues "whose start times are less
> > than or equal to the current playback position and whose end times are
> > greater than the current playback position".
> > Given that typical extended descriptions pause the playback for the whole
> > time a description is presented, as shown in the WGBH's sample Janina
> > already mentioned, which will need a zero-duration cue, the algorithm
> > should take care of this case.
> > A simple (but possibly problematic) modification might be adding something
> > like "whose start times are less than or equal to the current playback
> > position and whose end times are greater than the *previous* playback
> > position, if the time was reached through the usual monotonic increase of
> > the current playback position during normal playback".
> 
> I think you are right: we need to allow cues of zero duration. I
> wouldn't think, though, that negative duration would be allowed. So, I
> think we just need to ask for that little change.
> 
> 
> > Second, I am concerned that the use of pause() or pauseOnExit might cause
> > an unwanted result when the user tries to actively pause the playback
> > (e.g., clicking on the "pause" button) while the playback is temporarily
> > paused by the AT, which I think is a common situation.
> > In this case the intended result would be pausing both video and
> > description, but the actual result would be resuming the video without
> > pausing the description.
> 
> I don't actually think that would be the case. I envisage it to work
> like this: if a video is paused, it will pause the video but not clear
> the pauseOnExit flag. Also, a video pause() does not influence the AT
> - it would need to be paused or skipped separately. When the video
> is resumed (i.e. play() is called), then AT will also resume reading
> out all the active cues. That means that the currently active cue will
> be sent to AT again to allow users to catch up on where they are at.
> 
> In particular: the enabled state of the description track would not be
> influenced by a video pause().
> 
> 
> > Setting playbackRate = 0 or using a special state like "stalled" would
> > have a better behavior.
> 
> A zero playbackRate just changes the playback speed of the video, but
> will not call any of the callbacks that need to be called
> when a video pauses. Creating an artificial network stall is also
> not a good idea, because it causes stalled events etc. to be fired. If
> it is a proper user pause(), then it has to be dealt with as such.
> As mentioned: that has no effect on whether a text track is still
> enabled or not.
> 
> 
> > Third, even if the user has an option to decide whether the playback is
> > paused or not when the TTS is still speaking at the end time, the author
> > would also need to specify for each cue whether it is recommended to be
> > paused or not, as described in our user requirements document.
> > I wonder if the pause-on-exit flag might be exploited for this purpose.
> 
> This is an authoring need. The pauseOnExit flag is currently only an IDL
> attribute and cannot be authored other than by script. The way to deal
> with this requirement is to allow the user to skip cues that are still
> being read out. If they really want to interrupt the screen reader and
> continue a paused video, then this should be user controlled -
> probably by a shortcut key to skip the current cue and unpause the
> video.
> 
> 
> > Fourth, given that some modern TTS engines support SSML, I expect that
> > with minimal effort and risk we can allow the author to improve the
> > quality of narrations as mentioned in the user requirements document,
> > simply by allowing SSML in the cue text and exposing it to ATs via the
> > a11y API.
> > For ATs not supporting SSML, the a11y API might need to expose both plain
> > text and SSML.
> 
> Since this all goes through HTML, we have to find a way to map SSML
> into HTML. I don't have a solution for this yet, but I think this is
> also a problem that can still be solved once we have support for text
> descriptions in browsers. It would be hard to get first
> implementations of text descriptions if we overloaded the features at
> this stage. That doesn't stop us from talking about it, but I don't
> think this is very urgent yet.
> 
> 
> > Overall I agree that enhancement of a11y APIs could allow native support
> > of extended descriptions and this would be desired, benefiting both users
> > and authors.
> 
> Yes, agreed.
> 
> Cheers,
> Silvia.
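
The zero-duration cue point in the quoted message can be checked with a
small sketch. It only encodes the two conditions quoted from the draft and
from the suggested modification; the Cue interface, the function names and
the sample playback positions are assumptions made for illustration, not
part of any spec or browser implementation.

```typescript
// Hypothetical minimal cue shape: times in seconds.
interface Cue {
  startTime: number;
  endTime: number;
}

// Condition quoted from the draft: start <= position and end > position.
function isCurrentPerDraft(cue: Cue, pos: number): boolean {
  return cue.startTime <= pos && cue.endTime > pos;
}

// Suggested modification: compare the end time with the *previous*
// playback position (only meaningful during normal, monotonic playback).
function isCurrentProposed(cue: Cue, pos: number, prevPos: number): boolean {
  return cue.startTime <= pos && cue.endTime > prevPos;
}

// A zero-duration cue at t = 5 s and three successive playback positions.
const zeroCue: Cue = { startTime: 5, endTime: 5 };
const positions: number[] = [4.75, 5.0, 5.25];

const draftHits: number[] = [];
const proposedHits: number[] = [];
for (let i = 1; i < positions.length; i++) {
  const pos = positions[i];
  const prevPos = positions[i - 1];
  if (isCurrentPerDraft(zeroCue, pos)) {
    draftHits.push(pos);
  }
  if (isCurrentProposed(zeroCue, pos, prevPos)) {
    proposedHits.push(pos);
  }
}
// draftHits stays empty: the cue is never current under the draft wording.
// proposedHits contains 5.0: the cue is current for exactly one step.
```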
Received on Wednesday, 15 June 2011 08:58:57 UTC