RE: a11y TF CfC on resolution to support "Media Text Associations" change proposal for HTML issue 9 from Sean Hayes on 2010-04-01 (public-html-a11y@w3.org from April 2010)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Thu, 1 Apr 2010 21:03:12 +0000
To: "public-html-a11y@w3.org" <public-html-a11y@w3.org>
Message-ID: <8DEFC0D8B72E054E97DC307774FE4B911A45552D@DB3EX14MBXC301.europe.corp.microsoft.c>
I have the following issues and additions:

[#1] 
I don't agree with the following section:

<quote>
This element provides a <div>-like area on top of the video above the controls or for the audio element above the audio controls, into which the text of the external resources is rendered. 

Depending on the role, the default styling of the <div>-like area will be different: 

caption, subtitle, lyrics, karaoke:
 color: white;
 background-color: #333333;
 opacity:0.8;
 text-align: center;
 bottom: 0;
 position:absolute;
textual audio descriptions:
 visibility: hidden; (unless this makes screen readers not read them out)
 aria-live: assertive;
 position: absolute;
 z-index: -100; (or more - shouldn't be visible)

</quote>
Rationale:
Any styling of caption content should match that of the video content, and not the player chrome defined by HTML. Thus such styling should be internal to the caption format (as in TTML). Initially I thought it would be appropriate to pass style down into the caption format, but I now think this would be problematic from a practical standpoint, as well as inappropriate.

Proposal: delete this section and anything that refers to CSS affecting the text track.

[#2]
The spec should state that the text in a track must be available to assistive technology when visible (whether through ARIA or some intrinsic means).

[#3]
A mechanism to associate a transcript is needed to satisfy WCAG. This proposal deals only with timed text, but transcripts are not timed and may include descriptive text as well as caption text (in order to be of use to users who are deaf-blind). In addition, a transcript would be of use in situations where the media is not fetched or played (as an alternative to the whole thing rather than an additional media track), so it may need to be in a track group with the video source. Where the web page is authored by the owner of the content, HTML is already capable of defining such a static transcript, so we may not need much additional markup, but we still need a means to label such markup so that assistive technology can follow the association.

Proposal: define a @transcriptFor attribute that <div> and <iframe> can use to point to a media element, or define a new <transcript> element that can reference a <video> or <audio> element, and whose source can be inline HTML or referenced using a src attribute.

[#4]
<audio> and text overlays are problematic if the <audio> element has no intrinsic width or height. Text overlays for audio are not strictly required by WCAG, but probably should not be prohibited.

Proposal: add width and height attributes to <audio> (defaulting to zero).

[#5] If we are going to use SRT, then we should define a standard style sheet for it. This is probably best done by defining a mapping into TTML.

Proposal: I can define such a translation.
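
As a rough illustration of what such a translation could look like, here is a minimal sketch in Python. It is not the mapping offered above, only one plausible shape for it: each SRT cue becomes a TTML <p> with begin/end attributes, and line breaks inside a cue become <br/>. The function name srt_to_ttml and the choice to emit a single <div> are assumptions made for the example, and the default styling that a standard style sheet would supply is left out entirely.

```python
import re
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"  # TTML 1.0 namespace
ET.register_namespace("", TTML_NS)     # serialise with a default xmlns

# SRT timing line, e.g. "00:00:01,000 --> 00:00:04,500"
CUE_TIMES = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}),(\d{3})"
)


def q(name):
    """Qualify a local element name with the TTML namespace."""
    return f"{{{TTML_NS}}}{name}"


def srt_to_ttml(srt_text):
    """Hypothetical helper: map SRT cues onto TTML <p> elements."""
    tt = ET.Element(q("tt"))
    body = ET.SubElement(tt, q("body"))
    div = ET.SubElement(body, q("div"))
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # need counter, timing line, and at least one text line
        m = CUE_TIMES.match(lines[1].strip())
        if not m:
            continue  # skip cues whose timing line does not parse
        # SRT uses a comma before milliseconds; TTML clock time uses a dot.
        p = ET.SubElement(
            div,
            q("p"),
            begin=f"{m.group(1)}.{m.group(2)}",
            end=f"{m.group(3)}.{m.group(4)}",
        )
        p.text = lines[2]
        for extra in lines[3:]:
            br = ET.SubElement(p, q("br"))
            br.tail = extra
    return ET.tostring(tt, encoding="unicode")


sample = "1\n00:00:01,000 --> 00:00:04,500\nHello\nworld\n"
print(srt_to_ttml(sample))
```

A standard style for SRT could then be expressed once, as TTML styling attached to the generated document, rather than through CSS on the host page.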

[#6] It should be made clear that the <track> element is slaved to the timeline of the currently active <source> element and that no synchronisation between track elements is implied.

Proposal: change

"The text is displayed as the parent audio or video element goes through its time interval, i.e. the parent's currentTime has reached the start time of the interval but has not yet reached the end time of the interval (a semi-open interval: [start,end) )."

to

"The text track is synchronised to the parent's active <source> audio or video media; tracks are not synchronised to each other. The text displayed is defined by the semantics of the referenced text format."

[#6]
Paused behaviour. The HTML spec says: "When a video element is paused and the current playback position is the first frame of video, the element represents either the frame of video corresponding to the current playback position or the poster frame, at the discretion of the user agent."

When paused at the first frame, the timeline will be at 0, so any text media defined to be displayed at that time should be displayed, unless the poster frame is being shown instead.
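
The rule can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: cues are modelled as simple start/end/text records, they are treated as active over the semi-open interval [start, end) from the wording quoted earlier in this message, and the names Cue, active_cues and text_to_display are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds on the parent media timeline (inclusive)
    end: float    # seconds on the parent media timeline (exclusive)
    text: str


def active_cues(cues, current_time):
    """Return the cues whose semi-open interval [start, end) contains current_time."""
    return [c for c in cues if c.start <= current_time < c.end]


def text_to_display(cues, current_time, showing_poster):
    """Hypothetical helper: text is suppressed while the poster frame is shown."""
    if showing_poster:
        return []
    return [c.text for c in active_cues(cues, current_time)]


cues = [Cue(0.0, 2.0, "Title card"), Cue(2.0, 4.0, "First line")]

# Paused at time 0 with the first video frame showing: the cue is displayed.
print(text_to_display(cues, 0.0, showing_poster=False))  # ['Title card']

# Paused at time 0 with the poster frame showing: nothing is displayed.
print(text_to_display(cues, 0.0, showing_poster=True))   # []
```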

[#7]
We need to specify the TTML default rendering area. This can be adapted from the language used for <video>: the default should be the active video pixels (not including any bars added by the UA). However, we may need to allow authorial control, e.g. for cases where the text is to be rendered outside of the video rectangle. Some consideration will be needed for the <audio> case.
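
To make "active video pixels" concrete, here is a small Python sketch of how the default area might be computed. It assumes the UA scales the video to fit the element box while preserving aspect ratio and centres it, so that any letterbox or pillarbox bars fall outside the returned rectangle. The function name active_video_rect is made up for the example, and the empty rectangle returned for a zero-sized box is just one possible answer for the <audio> case.

```python
def active_video_rect(box_w, box_h, video_w, video_h):
    """Return (x, y, width, height) of the scaled video inside the element box.

    Assumes aspect-ratio-preserving scaling with the video centred in the box;
    bars added by the UA are excluded from the result.
    """
    if min(box_w, box_h, video_w, video_h) <= 0:
        # No usable area, e.g. an element with no width or height.
        return (0, 0, 0, 0)
    scale = min(box_w / video_w, box_h / video_h)
    w = video_w * scale
    h = video_h * scale
    return ((box_w - w) / 2, (box_h - h) / 2, w, h)


# A 16:9 video in a 640x480 (4:3) element is letterboxed top and bottom.
print(active_video_rect(640, 480, 1280, 720))  # (0.0, 60.0, 640.0, 360.0)
```

Authorial control would then amount to overriding this rectangle rather than changing how it is derived.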

Sean.
Received on Thursday, 1 April 2010 21:03:54 UTC