Re: Media--Additional Requirement for Sec. 2.6 Captioning?

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Sat, 9 Oct 2010 06:19:15 +1100
Message-ID: <AANLkTikJ8SApvLMbj4S1781wyQ+ScTTYhf7s-gTeLf35@mail.gmail.com>
To: HTML Accessibility Task Force <public-html-a11y@w3.org>
On Fri, Oct 8, 2010 at 12:35 PM, Janina Sajka <janina@rednote.net> wrote:

> Conversations with the consumers of captions who were in attendance at
> the Open Subtitles Conference in New York City last week have exposed a
> potential additional requirement clause for CC-23, and have raised concerns
> regarding some of our explanatory text in Sec. 2.6 Captioning:
> http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements#Captioning
> 1.)     The additional requirement -----
> There was great concern about keeping captions synchronized with spoken
> dialog in primary media resources. Some participants even proposed
> schemes relying on tracking db variations in the primary media resource
> audio in order to resync captions.
> The several participants were in strong agreement that a realtime
> control to nudge captions forward (and backward) during media play would
> be very helpful. Their experience is that captions are frequently
> noticeably out of sync, and that this is distracting. They referred to a
> media player which supports this today, but I've forgotten which--I'll
> have to grep the conference notes for that datum.

I think it's MPlayer that lets you speed up or slow down caption display to
get rid of the skew.

> PROPOSED ADDITION: Add a clause to CC-23 so that CC-23 would now read:
> Ascertain that captions are displayed in sync with the media resource.
> Provide a realtime control to allow users to adjust the caption forward and
> back against the primary resource.

In my early experiments I had these features in my itext spec, see
https://wiki.mozilla.org/Accessibility/HTML5_captions_v2 with the

attribute float delay;
attribute unsigned long stretch;

The "delay" would allow specification of an initial offset
(positive/negative) of the external captions in relation to the main
resource, and the "stretch" would allow the caption timing to be stretched
by a constant factor (positive/negative) to fix up skew.
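
As a rough illustration of what these two attributes would do, here is a
minimal sketch. It assumes the adjusted time is computed as
t * stretch + delay, which is only one plausible reading of the description
above and is not spelled out in the itext proposal. The Cue shape and the
retime function are hypothetical names used for illustration.

```typescript
// Hypothetical cue shape; not taken from the itext proposal.
interface Cue {
  start: number; // seconds from the start of the media resource
  end: number;   // seconds from the start of the media resource
  text: string;
}

// Assumed semantics: adjustedTime = originalTime * stretch + delay.
// "delay" shifts every cue by a fixed offset (may be negative);
// "stretch" scales the caption timeline to compensate for drift.
function retime(cues: Cue[], delay: number, stretch: number): Cue[] {
  return cues.map((cue) => ({
    ...cue,
    start: cue.start * stretch + delay,
    end: cue.end * stretch + delay,
  }));
}

const cues: Cue[] = [
  { start: 10, end: 12, text: "Hello." },
  { start: 100, end: 104, text: "Goodbye." },
];

// Captions that run half a second early and drift by 1% over time.
const fixed = retime(cues, 0.5, 1.01);
console.log(fixed);
```

With a stretch of 1 and a delay of 0 the cues come back unchanged, so the
two values can be tuned independently.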

In discussions, also with input from Ian, it was determined that these are
artefacts of production of the caption file in relation to the video
resource and that they should be fixed *before* publishing the caption file
on a Web server, thus they would never need to be fixed on the Web page.

One has to understand that the use case for captions on the Web is
different from the offline use case. On the Web, the person who publishes
the captions publishes them in relation to a particular video
resource. Therefore, they can fix up the skew. Offline, the caption files
are published as a collection, and people who download a movie go looking
for a caption file for that movie, then synchronise it through their player.
This is not what would happen on the Web. Therefore, it is probably indeed
unnecessary to have these parameters and this requirement.

I would therefore recommend at this stage not to introduce these
requirements. They may only add to the complexity of the solutions that we
need to introduce and I don't see there being a need. If it later turns out
that a lot of people have to fix up the timing of the media resource
manually with JavaScript (which can always be used to fix this), then we can
introduce it later. I would actually prefer that the publishers of caption
files use a convenience tool instead, so that this burden is not put on
the Web browser.
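
To make the idea of a publisher-side convenience tool concrete, here is a
minimal sketch of how such a tool might derive a delay and a stretch from
two known sync points before the caption file is published. It assumes the
skew is linear (videoTime = captionTime * stretch + delay); the SyncPoint
shape and the estimateCorrection function are hypothetical and do not
describe any existing tool.

```typescript
// A sync point pairs a time in the caption file with the time at which
// the same line is actually spoken in the video (both in seconds).
interface SyncPoint {
  captionTime: number;
  videoTime: number;
}

// Solve videoTime = captionTime * stretch + delay from two sync points.
function estimateCorrection(
  a: SyncPoint,
  b: SyncPoint
): { delay: number; stretch: number } {
  if (a.captionTime === b.captionTime) {
    throw new Error("sync points must have different caption times");
  }
  const stretch =
    (b.videoTime - a.videoTime) / (b.captionTime - a.captionTime);
  const delay = a.videoTime - a.captionTime * stretch;
  return { delay, stretch };
}

// Example: one line near the start and one near the end of the video.
const correction = estimateCorrection(
  { captionTime: 10, videoTime: 12 },
  { captionTime: 110, videoTime: 114 }
);
console.log(correction);
```

Picking the two sync points far apart keeps small measurement errors from
distorting the estimated stretch.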

> 2.)     Explanatory Text Questions
>        a.)     Open vs. Closed Captions
>        The relevance of open and closed captioning in an HTML 5 context
> proved confusing. Indeed, I don't believe we have a mandate to somehow
> specify either or both. Rather, either can and should be achieved by the
> consumer using their chosen client browser. If this is correct, the
> third and fourth sentences in our first paragraph are confusing people.
> They confused people in NYC last week.
> SUGGESTION: Rewrite sentences 3 & 4 of our first paragraph in Sec. 2.6
> to read:
> Historically, captions have been either closed or open. Closed captions
> have been transmitted as data along with the video but were not visible
> until the user elected to turn them on, usually by invoking an on-screen
> control or menu selection. Open captions have always been visible; they
> had been merged with the video track and could not be turned off.

Happy for this to be changed, though I don't quite understand how changing
the tense of these sentences makes a difference.

>        b.)     Is our 3rd paragraph correct?
> Our third paragraph currently reads:
>  "The timing of caption text can coincide with the mouth movement of the
> speaker (where visible), but this is not
>   strictly necessary. For timing purposes, captions may sometimes
> precede or extend slightly after the audio they
>   represent. Captioning should also use adequate means to distinguish
> between speakers as turn-taking occurs during
>   conversation; this is commonly done by positioning the text near the
> speaker, although in some countries color is used
>   to indicate a change in speaker."
> Do we stand by our assertion that synchronizing captions to spoken
> dialog is not strictly necessary?

That's not what this says. It says that captions often are *slightly* off,
which is necessary to make them more readable. That doesn't mean they are
not synchronized.

> Given #1 above, do we still believe
> this? Is the wider experience of caption users different from what we
> heard in NYC? Were our NYC consumers unrepresentative of caption users
> as a whole?

No, I didn't hear any difference between what was discussed in NYC and what
we wrote. What did you hear that differed from what we wrote?

> Also, our NYC consumers strongly opposed colorizing caption text to
> identify speakers, and strongly opposed positioning caption text next to
> the person speaking. Again, did we draw nonrepresentative consumers at
> the NYC event? Or is our text incorrect?

It was suggested to instead label the text with the person's name. I think
that's a good option to add to our list in the above paragraph. I wouldn't
remove the existing options though.

We need to be aware that we had a very one-sided sample of users at the
NYC event: a US user representative, hard-of-hearing, and of rather advanced
age. Other countries have different representations for the text with
placement offsets. In particular, the Australian Captioning Centre, with
which I used to have a project, had the requirement from users that
placement underneath the speaker really helped, and I know that in Germany
colors were used frequently. Also, younger people may have different
preferences. The preference may have as much to do with what you are used to
as with what is really more usable. It might be worth putting this
particular question about preferences to other hard-of-hearing or deaf
people to get a better sample of input.

Received on Friday, 8 October 2010 19:20:09 UTC
