
RE: [media] Addressing "Captioning" feedback on requirements document

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Mon, 28 Jun 2010 11:18:22 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <8DEFC0D8B72E054E97DC307774FE4B911A58273B@DB3EX14MBXC313.europe.corp.microsoft.com>
My comments.

1. Agree (although I would rather say "video without audio but with a text track is possible", since captions are a replacement for audio; this is a recurring theme that we need to address once in a glossary).

2. CC-5 is for positioning of regions of text, e.g. to disambiguate multiple speakers or to avoid obscuring some information in the underlying media. Therefore the minimum requirement is a bounding box (with an optional background) into which text is flowed, and that probably needs to be pixel aligned. The absolute position of text within the region is less critical, although it is important to be able to avoid bad word breaks, keep adequate white space around letters, and so on.

CC-2: "erasures" means periods when there is no overlay information (no text and no text background).
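
As a rough illustration of that definition only (not something taken from the requirements document), the erasure periods can be thought of as the gaps left between caption cues on the media timeline. The sketch below assumes cues are given as hypothetical (start, end) pairs in seconds and that the media duration is known; the function name and data shape are made up for the example.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # hypothetical (start, end) pair, in seconds


def erasures(cues: List[Interval], duration: float) -> List[Interval]:
    """Return the periods in [0, duration] during which no cue is on screen."""
    gaps: List[Interval] = []
    cursor = 0.0  # end time of the latest cue seen so far
    for start, end in sorted(cues):
        if start > cursor:
            # Nothing is displayed between the previous cue and this one.
            gaps.append((cursor, min(start, duration)))
        cursor = max(cursor, end)  # overlapping cues extend the covered span
        if cursor >= duration:
            break
    if cursor < duration:
        # Trailing period after the last cue.
        gaps.append((cursor, duration))
    return gaps


if __name__ == "__main__":
    print(erasures([(1.0, 3.0), (2.5, 4.0), (6.0, 8.0)], 10.0))
```

Overlapping cues are merged here, so a period counts as an erasure only when no text and no background from any cue would be shown.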

3. Agree.

4. CC-14. Paint-on can be used to change text within an existing caption which is pop-on. Some examples would be a good idea, though.

5. CC-17. Multiple files might be used in the case where complete alternative captions for hearing and subtitles for language need to be used simultaneously (common in Europe and Asia). It would be possible to include these in a single file, but that makes the maintenance of those resources much harder. In some cases a few foreign words form part of the original soundtrack, and thus need to be in the same caption resource.

6. CC-20. Italics may be sufficient for a human, but it is important to be able to mark up languages so that the text can be rendered correctly, since the same Unicode code points can be shared between languages and rendered differently in different contexts. This is mainly an I18n issue. It is also important for audio rendering, to get pronunciation correct.

7. CC-26. Agree.

8. Agree, but it would be good to have a note somewhere explaining the differences between strict captioning and more general text overlays.

9. Agree.

11. I think simultaneous presentation is implied by CC-17, and it is necessary (see above).

12. Correct. One could think of them in the same way as foreign language translations.

19. See note above. We should include a glossary.

20. Make the role of the timebase generic to the media (indeed, in MPEG the timebase is not strictly part of the audio or the video but a separate entity). Include the distinction between captions and other forms of text in the glossary.

-----Original Message-----
From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com] 
Sent: Wednesday, June 23, 2010 2:53 PM
To: Sean Hayes
Cc: HTML Accessibility Task Force
Subject: [media] Addressing "Captioning" feedback on requirements document

Hi Sean,

I'm going to start our work item by going through the feedback that was provided on the captioning section. I'm making suggestions for changes to make in the wiki. Thus, the notes below are in preparation for your input.

Feedback was provided at:
for the spec at:

(CC-1) The reference time doesn't have to be the audio track, let's just say it's the media resource. (Video without audio but with captions is possible.)

Suggestion: accept proposal
Change: s/audio track/media resource/

(CC-5) Is this a requirement for pixel-perfect positioning, or relative positioning? Must it be possible to give a bounding box for the text, or is it enough to say where it starts?

Suggestion: ?? don't understand what erasures means actually ??

(CC-13) Sounds very peculiar and not something that is possible with most formats I have seen. Please give a rationale for why this is important (or remove the requirement if it isn't).

Suggestion: Reformulate to "Enable rendering of text with a thicker outline or a drop shadow to allow for better contrast with the background."

(CC-14) How can a single caption mix "display styles", e.g. both be "paint-on" and "pop-up"? (I don't know exactly what any of the quoted words mean.)

Suggestion: explain the display styles & how several can be active at once

(CC-17) If there are separate caption files, is it expected that these should be displayed together and not overlap? This sounds rather difficult to implement, why not simply have a single caption file?

Suggestion: Parallel display should indeed be possible. Add a sentence explaining.

(CC-20) Is supporting italics enough to differentiate between languages, or should it be possible to mark up the actual language. If yes, why?

Reply: Foreign languages should be exposed specially, possibly in italics or bold etc. but not to mark up the actual language.

(CC-26) Sounds like a bonus, not an essential requirement.

Reply: Internationalisation of captions is a core requirement, otherwise we remove a large number of users from access to the content.

The first paragraph of the introductory text says: "Captions are always written in the same language as the main audio track."
Whereas the Requirements section says: "Formats for captions, subtitles or foreign-language subtitles must:"

Suggestion: remove the first sentence from the introduction.

Obviously the Requirements section knows about subtitles in languages other than the language of the main audio track ... Please bring the subject of subtitling in other languages into the introductory text.

Suggestion: agree - introduce a sentence about subtitles in other languages in introduction

(CC-1) Not all media files have audio.

see 1. above

(CC-17) Does this mean that captions are rendered from more than one external caption file simultaneously?

Reply: rendering is not mentioned in CC-17, just representation. This could be in a menu from which the captions are selected.

(CC-25) What must a UA do differently for edited vs full verbatim captions?

Reply: there is a need to potentially have two different caption tracks available: edited and full verbatim captions. That's what this refers to.

(CC-1) The timebase master should be the media resource, instead of the audio track?

see 1. above

(CC-1) What if said media has no audio track?

see 1. above

(CC-17) How would multiple caption files coexist for the same media?
Based on user preferences?

Reply: the point is that they can be made available on the server, marked up on the HTML page, and downloaded as required for the media resource.

The user should also have final control over rendering styles like color and fonts.

Reply: agree - maybe add another requirement

There should be a way to find and switch caption languages on the fly.
How is that done if the caption document is found at a different URI?

Reply: the URIs all need to be made available to the HTML document
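
To illustrate the idea (this is only a sketch, not a proposal for how HTML should expose it): if every caption resource is listed with its language, switching on the fly reduces to a lookup against the user's language preferences. The mapping, the example.com URIs, and the function name below are all hypothetical, and the language keys are assumed to be lowercase tags.

```python
from typing import Dict, List, Optional


def pick_caption_uri(tracks: Dict[str, str], preferred: List[str]) -> Optional[str]:
    """Return the URI of the first caption track matching the preferences.

    `tracks` maps lowercase language tags to caption resource URIs.
    `preferred` lists the user's languages, most preferred first.
    """
    for lang in preferred:
        lang = lang.lower()
        if lang in tracks:  # exact match, e.g. "de-at"
            return tracks[lang]
        primary = lang.split("-")[0]  # fall back to the primary subtag, e.g. "de"
        if primary in tracks:
            return tracks[primary]
    return None  # no listed caption resource matches


if __name__ == "__main__":
    # Hypothetical caption resources announced alongside the media resource.
    available = {
        "en": "http://example.com/captions.en",
        "de": "http://example.com/captions.de",
    }
    print(pick_caption_uri(available, ["de-AT", "en"]))
```

Only the selected URI would then need to be fetched, which matches the earlier point about downloading caption resources as required.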

(CC-5) Do you really want to say ALL parts of the screen? Placing captions to the far right of the screen may not be the best thing to allow.

Reply: any position should be possible. There may be situations where it is required.

We should be careful not to conflate the terminology between captions and subtitles as some people get upset about that.

Reply: agreed - however, there are also HoH people of different languages, so captions need to be internationalisable, too

(CC-1) A media file may have a separate time encoding which is used for both video and audio. However, captions are defined as a text representation of audio, so captions with video only don't make sense.

Reply: There can be videos with multiple audio tracks, so identifying one as the main time-keeper is dodgy. Also, a captioned video does not have to retain the audio track to be useful. It is better to simply refer to the time-keeping of the complete media resource, see also 1.

CC-9 to CC-12 should be clear that the effects must be mixable within one caption.

Suggestion: accept - clarify that the styling mentioned in CC-9 to CC-12 should be applicable to a single caption cue.

CC-13 is to allow the user to see as much of the underlying video as possible where captions are infrequent. Where captions are frequent, it is preferable to keep the caption background so that it minimises distraction.

Suggestion: add to text proposed in 3.

CC-17 is really a requirement on subtitles (foreign language).

Reply: true, but it also applies to captions for people who do not speak the main caption language (see also 15).


When we come to an agreement as to what is to be done about each of the inputs, I can go ahead and make the changes in the wiki.

So much for today on captions - let's do Extended Captioning, Keyboard Access to Interactive Controls, and Requirements on the use of the viewport tomorrow?

Received on Monday, 28 June 2010 11:19:03 GMT
