
Re: [media] how to support extended text descriptions

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Wed, 8 Jun 2011 16:46:10 +1000
Message-ID: <BANLkTin9T-Pwx-9rqt4x7HRH6fh2pgSE=A@mail.gmail.com>
To: Sean Hayes <Sean.Hayes@microsoft.com>
Cc: "public-html-a11y@w3.org" <public-html-a11y@w3.org>
On Wed, Jun 8, 2011 at 9:30 AM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
> "Therefore, you cannot rely on the video making progress at the same time as the TTS engine."
> Possibly not; however, in the absence of trick play (which I think would have to cancel any descriptions), one can probably assume the video won't go *faster* than expected. Therefore, if you set an internal handler for the assumed end time, no real harm is done by issuing a pause, even if the video hasn't reached that point yet because it stalled.

The TTS engine might also go slower than expected (because it, too, may
be starved of CPU), so the timing problem would remain: the video and
the speech can still drift apart.

> "I do not know how to inform the browser or JavaScript when the screen reader has finished reading text in a cross-browser compatible way."
> My point is: do we need to?

I think we do, since the TTS engine and the video player are two
processes that run asynchronously, and therefore synchronisation is
required.
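
As a rough illustration of that synchronisation, here is a minimal TypeScript sketch. `Player`, `DescriptionSync` and the `onSpeechEnd` callback are hypothetical names, not browser APIs; in particular, the sketch assumes some way for the TTS side to report that it has finished, which is exactly the cross-browser piece that is missing today. The video is paused when playback reaches a description cue's end time (the assumed end time from the quoted proposal) while speech is still running, and it is resumed only once speech has ended.

```typescript
// Hypothetical player interface; a real implementation would wrap a media element.
interface Player {
  pause(): void;
  play(): void;
}

// A text description cue on the media timeline (times in seconds).
interface DescriptionCue {
  startTime: number;
  endTime: number;
  text: string;
}

// Coordinates two asynchronous processes: video playback and a TTS engine.
class DescriptionSync {
  private readonly player: Player;
  private readonly speak: (text: string) => void;
  private speaking = false;        // TTS is still voicing the current cue
  private pausedForSpeech = false; // we paused the video to wait for TTS

  constructor(player: Player, speak: (text: string) => void) {
    this.player = player;
    this.speak = speak;
  }

  // Call when playback reaches cue.startTime: hand the text to the TTS engine.
  onCueEnter(cue: DescriptionCue): void {
    this.speaking = true;
    this.speak(cue.text);
  }

  // Call when playback reaches cue.endTime (the assumed end of the speech).
  onCueExit(): void {
    if (this.speaking) {
      // Speech overran its slot: extend the timeline by pausing the video.
      this.player.pause();
      this.pausedForSpeech = true;
    }
  }

  // Call when the TTS engine reports that it has finished (hypothetical signal).
  onSpeechEnd(): void {
    this.speaking = false;
    if (this.pausedForSpeech) {
      this.pausedForSpeech = false;
      this.player.play();
    }
  }
}
```

The resume step is what a purely time-based handler cannot provide: without a completion signal, the player has no way of knowing when to call `play()`, however conservative the assumed end time is.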

> "Descriptions delivered as audio do not come in the TextTrack. They come in the multitrack API."
> That's arguing we shouldn't change the design because the design is wrong. To the end user, they are both descriptions and serve the same purpose; the user doesn't care what markup tag caused them to come into existence.

You're assuming that the current design is wrong. Let's analyse that
before making such an assumption.

When we deal with text descriptions, they have to be voiced somehow.
This requires a TTS somewhere in the pipeline.
When we deal with audio descriptions, they come directly from the
video element and are thus a native part of the browser and not handed
through to a TTS.
I find it hard to see that it is possible to expose these two
fundamentally different types of content to the user in the same way.

In particular: audio descriptions will go in sync with the video and
there is no need to pause the video to display them, while text
descriptions create the need for extensions of the timeline and the
pausing behaviour.
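
A worked sketch of what extending the timeline means for text descriptions, assuming each description carries an estimated speaking time in seconds (the field `speechSeconds` is hypothetical; a real TTS engine's speed is not known in advance). Whenever speech overruns the cue's slot, playback has to pause for the difference, whereas an audio description, being part of the media itself, never requires a pause.

```typescript
// A text description cue plus an *estimated* speaking time (assumption: the
// estimate is available; real TTS speed varies with load and voice settings).
interface TimedDescription {
  startTime: number;     // cue start on the media timeline (s)
  endTime: number;       // cue end on the media timeline (s)
  speechSeconds: number; // estimated time to voice the text (s)
}

// Pause needed at each cue's end: how far speech overruns the cue's slot.
function pauseDurations(cues: readonly TimedDescription[]): number[] {
  return cues.map((c) => Math.max(0, c.speechSeconds - (c.endTime - c.startTime)));
}

// Wall-clock playback time once the timeline is extended by those pauses.
function extendedDuration(mediaSeconds: number, cues: readonly TimedDescription[]): number {
  return mediaSeconds + pauseDurations(cues).reduce((sum, p) => sum + p, 0);
}
```

For example, a 60-second video with one 2-second cue needing 3.5 seconds of speech and one 5-second cue needing 4 seconds gives pauses of 1.5 and 0 seconds, so it plays for 61.5 seconds of wall-clock time.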

I think they are inherently different and trying to fool the user into
thinking that they are identical will just lead to problems.

> "So you want them displayed as well as the captions? Always or only when they are also read out? What screen real estate are you expecting to use? Can you provide an example as a use case?"
> They would be presented as both captions and descriptions, so they are displayed when the user selects them in the caption menu and for their allotted duration. I'm expecting the author to determine the screen real estate exactly as they do for other captions. I demoed an example at the f2f if you recall. I'll check tomorrow whether it's still online.

Does selecting them in the captions menu automatically mean they have
to be shown on the screen? We have to be careful about the
consequences: we would be introducing two new states, making it four
states that an audio description track can be in: off, on and voiced,
on and visible, and on and visible and voiced. A single menu entry
would then no longer suffice to select an audio description track.
This single change creates heaps of new complexity.
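
The four states can be written out as a small TypeScript model; the type and function names are illustrative only and not taken from any specification. It makes the point concrete: a single on/off menu entry can reach only two of the four states.

```typescript
// Illustrative model of the four states; names are hypothetical, not from a spec.
type DescriptionTrackState = "off" | "voiced" | "visible" | "visibleAndVoiced";

const ALL_STATES: readonly DescriptionTrackState[] = [
  "off", "voiced", "visible", "visibleAndVoiced",
];

// Whether the description text should be handed to the TTS / screen reader.
function isVoiced(s: DescriptionTrackState): boolean {
  return s === "voiced" || s === "visibleAndVoiced";
}

// Whether the description text should be rendered on screen like a caption.
function isVisible(s: DescriptionTrackState): boolean {
  return s === "visible" || s === "visibleAndVoiced";
}

// A single on/off menu entry can only map onto two of the four states.
function fromSingleMenuEntry(selected: boolean): DescriptionTrackState {
  return selected ? "voiced" : "off";
}
```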

If an author really wants to display the text descriptions as text,
right now they would use some JavaScript to do so. Is that not
sufficient? Should we not wait and see how large the need for such a
feature is, rather than jumping to conclusions on a feature that
doesn't exist anywhere else yet?
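
For reference, a sketch of that JavaScript approach, written in TypeScript against minimal structural interfaces so it can be read independently of any browser. `TrackLike` mirrors the parts of a `TextTrack` it uses (`mode`, `activeCues`, the `cuechange` event), and `CueLike` assumes cues carry a `text` field, as WebVTT cues do; with real DOM typings a cast to `VTTCue` would be needed. In current browsers, setting the track to `hidden` keeps cue events firing without the browser rendering anything, so the page can copy the active cue text into an element of the author's choosing.

```typescript
// Minimal structural stand-ins for the DOM types used (assumption: a real
// TextTrack holding WebVTT cues provides these members).
interface CueLike { text: string; }
interface TrackLike {
  mode: string;
  activeCues: ArrayLike<CueLike> | null;
  addEventListener(type: "cuechange", listener: () => void): void;
}
interface TextSink { textContent: string | null; }

// Show the currently active description cues as text in `sink`.
function displayDescriptions(track: TrackLike, sink: TextSink): void {
  track.mode = "hidden"; // keep cue events firing, but let the page do the rendering
  track.addEventListener("cuechange", () => {
    const cues = track.activeCues;
    const lines: string[] = [];
    if (cues) {
      for (let i = 0; i < cues.length; i++) lines.push(cues[i].text);
    }
    sink.textContent = lines.join("\n");
  });
}
```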

> "Screen readers provide the interface to the Braille devices."
> Screen readers are certainly the primary providers of text to a Braille device, but it's basically an output port; other processes, like the media subsystem, could potentially use it too. I don't think it's a given that we'd assume descriptions (which as you say aren't generally on the screen, and aren't in the DOM), should actually be read by a screen reader.

They are in the shadow DOM, and there is a JavaScript API for them.
They are more a part of the page than other external content, such as
image, audio or video data.

> I am still not 100% on board with the idea that text track descriptions should be relying on the presence of a screen reader, since a SR is going to be doing a lot of other things related to navigation on the page. I'm not sure SR designers have even considered this use case.

Probably not yet. I am starting discussions on the IA2 mailing list to
see what people are thinking about it, since that is where the most
impact would be felt.

The issue is that SR and video playback have to interact
constructively. You can't just have them as completely separate
modules. The screen reader has control over an audio description track
of the video element - why should it not have control over a text
description track, too? Also, right now screen readers are the only
TTS engines available to Web pages, so if we don't make use of them
for text descriptions, we can't do anything with text descriptions.
What alternative do we have?

Received on Wednesday, 8 June 2011 06:46:58 UTC
