Media TextAssociations
A SMIL-Based Declarative Syntax for Associating Synchronized Text to Media Elements
Summary
This is a proposal to extend the HTML5 declarative markup for media elements with markup to reference both embedded and external time-synchronous text resources. This proposal provides an extensible mechanism to control the activation of timed text content within a document. The nature of the text content and its use within the document is beyond the scope of this proposal. The proposal is based on an adaptation of existing W3C technology used in SMIL, and integrated into Daisy Talking Books.
The text associations are defined in such a manner that the markup can be extended to support additional media types or other selection criteria in a future version of HTML. Only the framework for such extensions is considered for HTML5.
Status
This version was created after discussions at the W3C HTML5 Accessibility Task Force face-to-face meeting in Birmingham, England.
Related Bugs
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5758
Rationale
WCAG 2.0 recommends a large number of alternative representations for audio-visual content for accessibility purposes. Amongst them is synchronized text, which is text that transcribes/describes what is being said or is happening in the audio-visual resource. Examples of synchronized text content are captions (as alternative for the audio track) and textual audio descriptions (as alternative for the video track, read out by a screen reader or transferred to a braille device).
Currently, HTML5 has no declarative means of associating synchronized text with a media element. Although more general solutions for composing and synchronizing alternative content beyond text would be useful extensions to meet broader accessibility needs, the solution presented in this proposal focuses only on text associations. The mechanism proposed could also be used for other content selection issues, but these are not directly considered.
Related Proposals
This proposal tries to bring all these proposals together.
Proposal
The Markup
The textstream element:
interface HTMLTextstreamElement : HTMLElement {
  attribute DOMString src;
  attribute DOMString name;
  attribute DOMString role;
  attribute DOMString type;
  attribute DOMString media;
  attribute DOMString language;
  attribute DOMString systemTest;
};
The <textstream> element allows authors to specify a text resource for media elements. The resource referenced by the textstream element can be used to define a set of captions or subtitles related to the media resource from within which the textstream is referenced. The external text resource is expected to consist of a sequence of timed text. The text encoding may also provide additional layout, styling, and animation information.
The text is displayed as the parent audio or video element progresses through its time interval. The parent audio or video element provides the master timeline for scheduling text information, and it also serves as the synchronization master when temporally composing the textstream data with the associated media. Any time manipulations in the parent media -- including start/stop/pause behavior and timeline navigation -- are reflected in the active point of the captions.
When used as a direct or indirect child of an audio or video element, the content associated with the textstream is rendered into a spatial region placed over the video, above the controls (or, for the audio element, above the audio controls). Any scaling operations on the video content are expected to apply to the text areas as well.
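As a non-normative illustration, the following TypeScript sketch shows one way a script could keep such timed text aligned with the parent element's timeline and render it into an overlay region; the Cue structure, the parseTimedText() parser, the overlay styling, and the attachTextstream() wiring are assumptions and not part of this proposal.

// Hypothetical cue model: each cue has a start time, an end time and a text payload.
interface Cue { start: number; end: number; text: string; }

// Hypothetical parser for the external timed-text resource (the format is out of scope here).
declare function parseTimedText(url: string): Promise<Cue[]>;

// Render whichever cue is active at the parent media element's current time.
function attachTextOverlay(video: HTMLVideoElement, cues: Cue[]): void {
  const overlay = document.createElement("div");
  overlay.className = "textstream-overlay";    // assumed to be styled to sit above the controls
  video.parentElement?.appendChild(overlay);

  const update = () => {
    const t = video.currentTime;               // the parent media provides the master timeline
    const active = cues.find(c => c.start <= t && t < c.end);
    overlay.textContent = active ? active.text : "";
  };

  video.addEventListener("timeupdate", update);
  video.addEventListener("seeked", update);
  video.addEventListener("emptied", () => { overlay.textContent = ""; });
}

// Assumed wiring: fetch and parse the timed text, then attach it to the video.
async function attachTextstream(video: HTMLVideoElement, src: string): Promise<void> {
  attachTextOverlay(video, await parseTimedText(src));
}

Because the overlay is driven only by the parent element's current time and playback events, start/stop/pause behavior and timeline navigation on the media are reflected in the displayed text without additional bookkeeping.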
Attribute descriptions
The @src attribute gives the address of the text resource to associate. The attribute, if present, must contain a valid URI. The @src attribute may reference an external text object or an embedded text track within an audio/video object. If this attribute is missing, an error should be generated.
The @name attribute allows the author to provide a short, descriptive name for the textstream, which can be used as an identifier and to represent the text stream in a menu. Since UA selection control may be implicit or explicit, this attribute is recommended but not required.
The @role attribute provides a description of the content that a textstream offers to the media resource. It may be used to associate styling information with the content of the textstream. It is assumed that the styling preferences for use with the @role attribute are defined by an appropriate stylesheet mechanism, either outside the element definition or within the textstream data. The available roles are beyond the scope of this proposal. This attribute is optional.
The @type attribute describes the text resource in terms of its MIME type, optionally with a charset parameter. The UA definition may specify support for specific formats. The @type attribute may be used as a basis for content selection. The attribute is optional.
The @media attribute provides a valid media query. The @media attribute may be used as a basis for content selection. A media query that evaluates to "false" means the textstream cannot be enabled because it is not appropriate for the user's environment. This attribute is optional.
The @language attribute, if present, gives the natural language of the linked resource. The value must be a valid RFC 3066 language code. [RFC3066] This code may be used as the basis for selecting a caption for display.
One or more @systemTest attributes may be used to specify the conditions under which the associated element will be enabled for presentation. The UA must resolve the current state of the user preferences to determine if conditional activation is to take place. Textstream elements without a @systemTest attribute are always candidates for inclusion. This attribute is optional.
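By way of illustration only, the following TypeScript sketch combines the attributes above into a single "selectable" test; the userWantsCaptions flag and the supportedTypes list are hypothetical stand-ins for UA-internal state, and only the systemCaptions test attribute (introduced in the Examples section) is considered.

// Hypothetical UA-internal state, not defined by this proposal.
const userWantsCaptions = true;
const supportedTypes = ["text/srt", "text/smilText"];

// Evaluate the selection-related attributes of a <textstream> element.
function isSelectable(ts: Element): boolean {
  const src = ts.getAttribute("src");
  if (!src) return false;                                   // a missing @src is an error

  const type = ts.getAttribute("type");                     // MIME type, optional charset parameter
  if (type && !supportedTypes.includes(type.split(";")[0].trim())) return false;

  const media = ts.getAttribute("media");                   // media query; false means not appropriate
  if (media && !window.matchMedia(media).matches) return false;

  const language = ts.getAttribute("language");             // compared against the user's languages
  if (language && !navigator.languages.some(l => l.toLowerCase().startsWith(language.toLowerCase())))
    return false;

  // System test attribute: the declared value is compared with the user preference.
  const sysCaptions = ts.getAttribute("systemCaptions");
  if (sysCaptions !== null && (sysCaptions === "true") !== userWantsCaptions) return false;

  return true;
}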
The switch element:
interface HTMLSwitch : HTMLElement {
  attribute DOMString systemTest;
};
The <switch> element may optionally be used to group several <textstream> elements together. The alternatives within the switch are processed in lexical order. For each child of the switch, the value of one or more systemTest attributes determines whether the associated element evaluates to enabled. Only elements on which all systemTest attributes evaluate to TRUE, or elements without a systemTest attribute, are enabled. It is possible that none of the elements within a switch are enabled.
The <switch> element may be nested. The <switch> element may be a child of an <audio> or <video> element, or a <source> element.
The <switch> may also be supported outside of HTML5 media objects, but this is beyond the scope of this proposal.
Attribute descriptions
One or more @systemTest attributes may be placed directly on the <switch> element. If present, these attributes determine whether or not the switch element itself is evaluated. This attribute is optional.
Recommended user interface
- The values of system test attributes and other selection criteria (such as the @language attribute) are typically defined as global UA preferences. It is recommended that UAs add an icon to the controls bar of the video or audio element to indicate the existence of associated text and to provide the possibility to override the selection criteria for the associated media object. The UA may display the available text associations through a menu in which the resources are listed; a sketch of how such a menu could be populated from the markup follows.
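As an illustration only, the following TypeScript sketch populates a selection menu from the @name attributes of the available <textstream> elements; the use of a <select> element and the fallback labels are hypothetical choices, and how a UA actually surfaces this menu is left to the implementation.

// Build a caption-selection menu from the textstream elements associated with a video.
function buildTextstreamMenu(video: HTMLVideoElement): HTMLSelectElement {
  const menu = document.createElement("select");
  menu.appendChild(new Option("No captions", ""));          // allow the user to disable text

  for (const ts of Array.from(video.querySelectorAll("textstream"))) {
    // @name is the recommended short, descriptive label; fall back to @src if it is absent.
    const label = ts.getAttribute("name") ?? ts.getAttribute("src") ?? "Untitled text stream";
    menu.appendChild(new Option(label, ts.getAttribute("src") ?? ""));
  }
  return menu;
}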
Resource selection algorithm
The following algorithm determines whether the content associated with a particular <textstream> element is activated:
- If the textstream element is placed outside of a direct parent switch element, any systemTest attributes are evaluated; if these evaluate to TRUE, the resource is considered 'selectable'. The UA may then perform an additional evaluation to determine if the selectable candidate is of a type and media encoding that are compatible with the capabilities of the current UA instance. If all of these evaluate to TRUE, the content associated with the element is displayed under the temporal and spatial constraints of the enclosing media element.
- If the textstream element is placed as a direct child of a switch element, then only the first of the direct children that meets all of the requirements stated for textstream element activation is rendered. Once a valid candidate is found, further processing of the switch children stops. Any systemTest attribute(s) placed on the switch element are evaluated before the processing of the switch children begins; if present, the switch is only processed if all such attributes evaluate to TRUE. (A sketch of this traversal follows this list.)
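The following TypeScript sketch is one possible reading of this algorithm, not a normative implementation. isSelectable() is the hypothetical attribute predicate sketched earlier, and systemTestsPass() stands in for the UA's resolution of system test attributes against user preferences; neither name is defined by this proposal.

// Hypothetical helpers; see the attribute-evaluation sketch above.
declare function isSelectable(el: Element): boolean;
declare function systemTestsPass(el: Element): boolean;

function selectTextstreams(media: HTMLMediaElement): Element[] {
  const enabled: Element[] = [];
  for (const child of Array.from(media.children)) {
    if (child.localName === "textstream") {
      // Standalone textstream: enabled whenever all of its tests evaluate to TRUE.
      if (isSelectable(child)) enabled.push(child);
    } else if (child.localName === "switch" && systemTestsPass(child)) {
      // Within a switch, only the first enabled child is used; processing then stops.
      for (const option of Array.from(child.children)) {
        if (option.localName === "textstream" && isSelectable(option)) {
          enabled.push(option);
          break;
        }
      }
    }
  }
  return enabled;
}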
The resolution of test settings -- whether systemTest attributes or the effective values of the @language attribute -- may occur statically or dynamically. A static evaluation is performed at the point at which the enclosing video/audio element is activated. A dynamic evaluation occurs during the active duration of the audio/video element, either as a consequence of manipulating the media object controls or via script manipulation. If dynamic activation is supported, the associated text content should be seeked to correspond to the temporal moment of the associated media object at the time of selection.
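A minimal sketch of such a dynamic re-evaluation is shown below, assuming a hypothetical preference-change notification and a hypothetical loadAndAttach() helper that fetches and renders the newly enabled text; the event name is illustrative only.

// Re-run selection when preferences change and align the text with the current playback position.
declare function selectTextstreams(media: HTMLMediaElement): Element[];
declare function loadAndAttach(media: HTMLMediaElement, textstreams: Element[], startAt: number): void;

function enableDynamicEvaluation(media: HTMLMediaElement): void {
  document.addEventListener("captionpreferencechange", () => {   // illustrative event name
    const enabled = selectTextstreams(media);
    // Seek the text content to the temporal moment of the media at the time of selection.
    loadAndAttach(media, enabled, media.currentTime);
  });
}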
File Formats
The text formats to be supported for use with the <textstream> element are beyond the scope of this proposal. The selection functionality is independent of any encoding format.
Examples
The basic atom of selection control for text associations is the system test attribute. For the purposes of constructing the examples in this section, we assume that the following system test attribute is supported:
- systemCaptions: Logical value. Evaluates to TRUE if the UA has determined that the user wishes to have captions displayed.
In addition, the existing @language, @type, and @media attributes are used as effective system test predicates. Note that the specification of the full set of available test variables is orthogonal to this proposal.
Example 1: Simple conditional inclusion of a default captions file.
...
<video src="movie.mp4" controls ... >
  <textstream src="movie.srt" systemCaptions="true" ... />
</video>
...
In this example, which will probably cover the majority of initial use of captions, a captions file is identified as a parallel component to the video object. Note that the system test attribute is a strict boolean: the value TRUE is compared with the current user preference setting to determine if captions are requested. In principle, this setting could be dynamically evaluated.
Example 2: Simple conditional inclusion of two captions files.
...
<video src="movie.mp4" controls ... >
  <textstream src="movie-en.srt" systemCaptions="true" language="en" ... />
  <textstream src="movie-nl.srt" systemCaptions="true" language="nl" ... />
</video>
...
In this example, one of four situations will occur: if systemCaptions evaluates to false, no captions will be shown. If systemCaptions evaluates to true (that is, the UA determines that captions are desired) and the language preference includes both English and Dutch, both will be shown. If it includes neither English nor Dutch, no captions will be rendered. If only one of these languages is desired, only that language will be shown. The set of choices is not limited.
Example 3: Inclusion of one of a set of mutually exclusive caption files.
...
<video src="movie.mp4" controls ... >
  <switch systemCaptions="true">
    <textstream src="movie-en.srt" language="en" ... />
    <textstream src="movie-nl.srt" language="nl" ... />
  </switch>
</video>
...
In this example, the entire switch statement is only included if captions have been requested. (This is identical to duplicating the systemCaptions test variable on each of the textstream elements.) Within the switch, either English or Dutch will be selected, but not both. If the UA determines that neither of these languages is desired, no captions are presented.
Example 4: Inclusion of one of a set of mutually exclusive caption files, with a default.
...
<video src="movie.mp4" controls ... >
  <switch systemCaptions="true">
    <textstream src="movie-en.srt" language="en" ... />
    <textstream src="movie-fr.srt" language="fr" ... />
    <textstream src="movie-nl.srt" ... />
  </switch>
</video>
...
This example is similar to #3, with the exception that if captions ARE requested, Dutch captions will be provided unless English or French have been explicitly preferred.
Example 5: Conditional inclusion of one of several caption formats using @type processing, with a default.
...
<video src="movie.mp4" controls ... >
  <switch systemCaptions="true">
    <textstream src="movie.smilText" type="text/smilText" ... />
    <textstream src="movie.dfxp" type="text/dfxp" ... />
    <textstream src="movie.srt" ... />
  </switch>
</video>
...
This example is similar to #4, with the exception that if captions ARE requested, captions encoded in smilText will be preferred to those other formats. If neither smilText nor DFXP captions are available, SRT captions will be used.
Example 6: Conditional inclusion of an embedded text file.
...
<video controls ... >
  <source type="..." src="movie.mp4" >
    <textstream src="movie[track-2].mp4" systemCaptions="true" ... />
  </source>
  <source type="..." src="movie.xyz" />
</video>
...
This example shows how an embedded captions track could be conditionally activated. The track syntax is simplified for clarity. It is assumed that the encoding movie.xyz does not contain captions.
Example 7: Mixing embedded and external captions.
...
<video controls ... >
  <source type="..." src="movie.mp4" >
    <switch systemCaptions="true">
      <textstream src="movie[track3].mp4" language="en" ... />
      <textstream src="movie[track4].mp4" language="fr" ... />
      <textstream src="movie[track2].mp4" ... />
    </switch>
  </source>
  <source type="..." src="movie.xyz" >
    <switch systemCaptions="true">
      <textstream src="movie-en.srt" language="en" ... />
      <textstream src="movie-fr.srt" language="fr" ... />
      <textstream src="movie-nl.srt" ... />
    </switch>
  </source>
</video>
...
This example is similar to #6, with the exception that one of several embedded captions tracks is activated if the mp4 encoding is used, while one of several external files is selected if encoding xyz is used.
In general, the pre-existence of the source element poses some syntactic and semantic problems for general content control processing. The evaluation algorithm for processing the children of the <video> and <audio> elements would be considerably cleaner if the following notation were adopted:
...
<video src="movie.mp4" controls ... >
  <switch>
    <source type="..." src="movie.abcd" ... />
    <source type="..." src="movie.bcad" ... />
    <source type="..." src="movie.dcab" ... />
  </switch>
  <switch systemCaptions="true">
    <textstream src="movie-en.srt" language="en" ... />
    <textstream src="movie-fr.srt" language="fr" ... />
    <textstream src="movie-nl.srt" ... />
  </switch>
</video>
...
Here, the switch controlling the source elements would be processed first, followed by processing of the captions. It would also be possible to place source and textstream elements outside of the switches. Embedded captions placed within source elements (either as plain, unswitched statements or via an embedded switch) would need special processing in the captions section (perhaps systemCaptionsEnable="false") to ensure that two sets of captions are not presented.
Impact
Positive Impact
- This proposal provides support for conditionally selecting one or more timed text objects to be presented with video or audio media. The markup is also applicable to other forms of selection, such as selection based on screen size or connection speed. The proposal may also be used outside the direct context of a media object.
- The proposal is based on existing W3C technology. It has also been integrated successfully in other accessibility applications, such as Daisy Talking Books.
- The proposal separates content selection and content formatting issues.
Negative Impact
- The main disadvantage of the proposal is that the current architecture of the media containers in HTML5 precludes a more structured solution to support associated content or content alternatives. (For example, on a low-bandwidth connection it may be useful to substitute slideshows for video, or text summaries for audio. These alternatives require a richer composition language than the constrained environment considered in this proposal.)