Re: [media] how to support extended text descriptions from Janina Sajka on 2011-06-05 (public-html-a11y@w3.org from June 2011)

From: Janina Sajka <janina@rednote.net>
Date: Sat, 4 Jun 2011 20:29:44 -0400
To: public-html-a11y@w3.org
Message-ID: <20110605002944.GF6041@sonata.rednote.net>
Hi, Silvia:

Silvia Pfeiffer writes:
> ... What I tried to describe was a means for how we can solve the real use
> case that you are talking about and that we included into the
> accessibility requirements.
> 
> In fact, the rest of the email analyzes how we can provide extended
> descriptions. We already have the markup in HTML through the text
> track and the "descriptions" type. We now need to find a way in which
> it can work in practice rather than just theoretically hoping it will
> "just work".
> 
> I believe I've tracked the issue down to being one of the
> communication between the screen reader and the Web browser. I wanted
> validation of this thought process.
> 
> Please, let's not get side-tracked by a throw-away comment, but stay
> focused on the real issue here: how are we technically going to solve
> the extension problem.
> 
Sure.

I would suggest relying on the screen reader is asking for unnecessary
complications. They're not designed for interacting with anything that
moves through time.

I think there's a simpler way. Please bear with me for a moment and set
the screen reader aside. The problem is to get the texted description
voiced during such time as is available in between the spoken dialog
which is in the media resource. Expressed this way, there's actually no
functional difference between extended and "ordinary" texted
descriptions. In other words, by definition we know that extended
descriptions will require pausing the audio in order to allow time for
remaining descriptive text to be voiced. However, if the TTS rate is set
slow enough, this could well also be the functional result of trying to
squeeze a descriptive string in between segments of recorded audio.
Fortunately, I would propose both can be handled the same way.

What we need is a module that can do two things:

1.)	Track time, including how much time is available in between
segments of spoken dialog.

2.)	Intelligently pipe text to a TTS engine. The "intelligence"
relates to calculating how much time will be required to voice the text
that precedes the onset of the next spoken dialog.

Thus, if the time required is longer than that available in the media
resource, pause the primary resource long enough for voicing to
complete.
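
A minimal sketch of the calculation described above, offered only as an
illustration. The names Gap, speaking_time, and pause_needed are
hypothetical, and estimating voicing time from a word count at a fixed
words-per-minute rate is an assumption made for the example; a real TTS
engine would need to report or measure its own timing.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    """A stretch of the media timeline with no spoken dialog (seconds)."""
    start: float
    end: float


def speaking_time(text: str, words_per_minute: float) -> float:
    """Estimate the seconds needed to voice `text` at a fixed TTS rate.

    Assumption: voicing time scales with the word count. This is only a
    rough stand-in for whatever timing a real TTS engine provides.
    """
    words = len(text.split())
    return words * 60.0 / words_per_minute


def pause_needed(text: str, gap: Gap, words_per_minute: float = 180.0) -> float:
    """Seconds the primary resource must be paused so voicing can finish.

    Returns 0.0 when the description fits inside the gap. The TTS rate
    is taken as given and is never changed to make the text fit.
    """
    available = gap.end - gap.start
    required = speaking_time(text, words_per_minute)
    return max(0.0, required - available)
```

Under these assumptions an "ordinary" description is simply one where
pause_needed returns 0.0, and an extended description is one where it
returns a positive value, so both follow the same code path.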

No screen reader does anything remotely like this. Certainly, some might
want to add the capability, but the capability could just as readily
come from an app that does only this task.

Note that it would be inappropriate to vary the rate of TTS speech in
order to try and "squeeze" text into an available segment in between
spoken dialog.

Note also that the screen reader can be expected to have access to all
text on screen at any point. If the user is satisfied to rely on the
screen reader alone, pausing the media and using it to read what's on
screen is always an option. At times, I would expect users would pause
the app I've described in order to check the spelling of some term, for
instance. This is fully in keeping with what a screen reader can do
today. It's common to set a screen reader NOT to auto read text, yet
it's still able to voice as the user interactively "reads" across the
screen word by word, or char by char.

Thus, the only remaining behavior to consider is whether the current
position is reset by user-initiated screen reading activity. May I
suggest that typical screen reader functionality is again available to
help us answer that. It's the user's choice. In some cases the user will
want to resume from where playback was stopped, regardless of what the
user may have "read" with the screen reader. In other cases, the user
may choose to indicate "start from here," which is an option most modern
screen readers support. As we should expect to navigate the media
structure using the screen reader, this would be in keeping with
expected functionality, so the plugin app I proposed above needs to be
capable of receiving a new start point (earlier or later in the
timeline).
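
As a rough sketch of what "receiving a new start point" could mean for
such an app, the example below finds which description to voice next
after the position changes. The function name next_description_index
and the representation of descriptions as a sorted list of start times
in seconds are assumptions made for illustration only.

```python
from bisect import bisect_left
from typing import Sequence


def next_description_index(cue_starts: Sequence[float], new_start: float) -> int:
    """Return the index of the first description at or after new_start.

    cue_starts must be sorted in ascending order (seconds on the media
    timeline). The new start point may be earlier or later than the
    current position. Returns len(cue_starts) when no description
    remains after new_start.
    """
    return bisect_left(cue_starts, new_start)
```

If the user instead chooses to resume from where playback was stopped,
the app would keep its previous index and ignore the screen reader's
reading position.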

Janina


> Regards,
> Silvia.
> 
> On Sun, Jun 5, 2011 at 8:18 AM, Janina Sajka <janina@rednote.net> wrote:
> > This is incorrect. If the use case language you quote below is
> > currently in the spec docs, it dilutes and obfuscates the consensus
> > User Requirements we created some months back when we created and
> > circulated our media accessibility user requirements document.
> >
> > There is no accessibility use case relating to driving (a car, a train,
> > or any other vehicle). Inasmuch as numerous governmental entities have
> > begun criminalizing the simple acts of talking and texting on cell
> > phones, the far more complex activity of interacting with web content
> > seems to me only to beg the acceleration of legal interdicts.
> >
> > In any case, we don't need a made-up mainstream use case in order to legitimate
> > our very real a11y use case.  Nor is our nonspeculative use case for
> > extended textual descriptions specific to people who are blind. It is
> > also applicable to people with low vision or people with any of a range
> > of learning or cognitive disabilities,  as we explain in our User
> > Requirements document:
> >
> > http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Extended_video_descriptions
> >
> > Recall also that we have a demonstration video, courtesy of NCAM, of an
> > actual use of extended description in an MIT physics lecture.
> >
> > http://media.w3.org/2010/08/
> >
> > Furthermore, I assert our a11y use case does require support for markup,
> > for many of the same reasons we need markup in support of poster. Note
> > that the greater use of extended descriptions, whether recorded audio or
> > textual, is likely to be educational, so that multi-lingual vocabulary
> > is highly likely, as is subject-specific technical vocabulary. This
> > alone is sufficient reason for ml support, imho.
> >
> > These are the requirements we need to satisfy. If that also leads to the
> > enablement of a generalized use case, well and good. But, not every a11y
> > solution has generalized application. Braille and sign language are
> > unlikely ever to attain general uptake, for instance. Yet both braille
> > and sign language are critically important to the people who need them.
> >
> > The a11y use case is a real use case, in other words, requiring real
> > solutions in HTML 5, regardless of whether it ever leads to a
> > generalized application (or not).  May I suggest we focus on solving
> > real needs before we indulge in fantasies about screen readers in
> > vehicular instrument panels?
> >
> > Janina
> >
> > Silvia Pfeiffer writes:
> >> Hi all,
> >>
> >> I'm aware that we are currently focused on trying to sort out
> >> hierarchical navigation, but I also want to bring the extended text
> >> descriptions into the mix to make sure we understand how that can
> >> work.
> >>
> >> Ian and I have discussed recently how we would address this need and
> >> whether there would be a requirement for extra markup.
> >>
> >> I believe the below discussion is a good summary of how we envisage it
> >> to work. Please provide feedback and ask questions if anything is
> >> unclear.
> >>
> >> With the below described approach (which we discussed at the WHATWG),
> >> there is no need for extra markup, but there is a requirement on the
> >> accessibility API between a screen reader and the browser. On that API
> >> we would need the possibility for the screen reader to influence how
> >> the player works.
> >>
> >> I suggest we need to talk to developers of browser accessibility APIs
> >> and also to screen reader developers to see what they say about it and
> >> whether this is technically realistic. I would encourage you to try
> >> and get such information.
> >>
> >> Looking forward to your comments.
> >>
> >> Cheers,
> >> Silvia.
> >>
> >>
> >> On Tue, 24 May 2011, Silvia Pfeiffer wrote:
> >> >
> >> > Ian and I had a brief conversation recently where I mentioned a problem
> >> > with extended text descriptions with screen readers (and worse still
> >> > with braille devices) and the suggestion was that the "paused for user
> >> > interaction" state of a media element may be the solution. I would like
> >> > to pick this up and discuss in detail how that would work to confirm my
> >> > sketchy understanding.
> >> >
> >> > *The use case:*
> >> >
> >> > In the specification for media elements we have a <track> kind of
> >> > "descriptions", which are:
> >> > "Textual descriptions of the video component of the media resource,
> >> > intended for audio synthesis when the visual component is unavailable
> >> > (e.g. because the user is interacting with the application without a
> >> > screen while driving, or because the user is blind). Synthesized as a
> >> > separate audio track."
> >> >
> >> > I'm for now assuming that the synthesis will be done through a screen
> >> > reader and not through the browser itself, thus making the
> >> > descriptions available to users as synthesized audio or as braille if
> >> > the screen reader is set up for a braille device.
> >> >
> >> > The textual descriptions are provided as chunks of text with a start
> >> > and an end time (so-called "cues"). The cues are processed during video
> >> > playback as the video's playback time starts to fall within the time
> >> > frame of the cue. Thus, it is expected that the cues are consumed
> >> > during the cue's time frame and are not present any more when the end
> >> > time of the cue is reached, so they don't conflict with the video's
> >> > normal audio.
> >> >
> >> > However, on many occasions, it is not possible to consume the cue text
> >> > in the given time frame. In particular not in the following
> >> > situations:
> >> >
> >> > 1. The screen reader takes longer to read out the cue text than the
> >> > cue's time frame provides for. This is particularly the case with long
> >> > cue text, but also when the screen reader's reading rate is slower
> >> > than what the author of the cue text expected.
> >> >
> >> > 2. The braille device is used for reading. Since reading braille is
> >> > much slower than listening to read-out text, the cue time frame will
> >> > invariably be too short.
> >> >
> >> > 3. The user seeked right into the middle of a cue and thus the time
> >> > frame that is available for reading out the cue text is shorter than
> >> > the cue author calculated with.
> >> >
> >> > Correct me if I'm wrong, but it seems that what we need is a way for
> >> > the screen reader to pause the video element from continuing to play
> >> > while the screen reader is still busy delivering the cue text. (In
> >> > a11y talk: what is required is a means to deal with "extended
> >> > descriptions", which extend the timeline of the video.) Once it's
> >> > finished presenting, it can resume the video element's playback.
> >> >
> >> > IIUC, a video is "paused for user interaction" basically when the UA has
> >> > decided to pause the video without the user asking to pause it (i.e. the
> >> > paused attribute is false) and the pausing happened not for network
> >> > buffering reasons, but for other reasons. IIUC one concrete situation
> >> > where this state is used is when the UA has reached the end of the
> >> > resource and is waiting for more data to come (e.g. on a live stream).
> >>
> >> Ian's comment:
> >> That latter state is not "paused for user interaction", it's just stalled
> >> due to lack of data. The rest is accurate though.
> >>
> >>
> >> > To use "paused for user interaction" for extending descriptions, we need
> >> > to introduce a means for the screen reader to tell the UA to pause the
> >> > video when it reaches the end of the cue and it's still busy delivering
> >> > a cue's text. Then, as it finishes, it will un-pause the video to let it
> >> > continue playing.
> >> >
> >> > To me it sounds like a feasible solution.
> >> >
> >> > The screen reader could even provide a user setting and a short-cut so a
> >> > user can decide that they don't want this pausing to happen or that they
> >> > want to move on from the current cue.
> >> >
> >> > Another advantage of this approach is that e.g. a deaf-blind user could
> >> > hook up their braille device such that it will deliver the extended
> >> > descriptions and also deliver captions through braille with such
> >> > extension pausing happening. (Not sure that such a user would even want
> >> > to play the video, but it would be possible.)
> >> >
> >> > Now, I think there is one problem though (at least as far as I can
> >> > tell). Right now, IIUC, screen readers are only passive listeners on the
> >> > UA. They don't influence the behaviour of the UA. The accessibility API
> >> > is basically only a one-way street from the UA to the AT. I wonder if
> >> > that is a major inhibitor of using this approach or whether it's easy
> >> > for UAs to overcome this limitation? (Or if such a limitation even
> >> > exists - I don't know enough about how AT work...).
> >> >
> >> > Is that an issue? Are there other issues that I have overlooked?
> >>
> >> Ian's comment:
> >> That seems to be entirely an implementation issue.
> >
> > --
> >
> > Janina Sajka,   Phone:  +1.443.300.2200
> >                sip:janina@asterisk.rednote.net
> >
> > Chair, Open Accessibility       janina@a11y.org
> > Linux Foundation                http://a11y.org
> >
> > Chair, Protocols & Formats
> > Web Accessibility Initiative    http://www.w3.org/wai/pf
> > World Wide Web Consortium (W3C)
> >
> >

-- 

Janina Sajka,	Phone:	+1.443.300.2200
		sip:janina@asterisk.rednote.net

Chair, Open Accessibility	janina@a11y.org	
Linux Foundation		http://a11y.org

Chair, Protocols & Formats
Web Accessibility Initiative	http://www.w3.org/wai/pf
World Wide Web Consortium (W3C)
Received on Sunday, 5 June 2011 00:30:10 UTC