Categorization of media a11y requirements from Frank Olivier on 2010-11-04 (public-html-a11y@w3.org from November 2010)

From: Frank Olivier <Frank.Olivier@microsoft.com>
Date: Thu, 4 Nov 2010 16:55:32 +0000
To: "public-html@w3.org" <public-html@w3.org>, "HTML Accessibility Task Force (public-html-a11y@w3.org)" <public-html-a11y@w3.org>
Message-ID: <A1646003DDF2EE4A8BC51B34FBD2AD5307EB164E@TK5EX14MBXC228.redmond.corp.microsoft.>
Re http://www.w3.org/2010/11/04-html-wg-minutes.html 

Members of the HTML WG and media a11y reviewed the requirements at http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements

We sorted the requirements into the following categories:
*	UX: User agent user experience requirement
*	SPECNEW: New requirements for the HTML5 specification
*	SPECCED: Already in the HTML5 specification
*	TRACK: Text track format detail - something that should be specced in the track format, not the HTML spec
*	NO: Not an issue we will address in the W3C


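Of the 119 requirements in the document (a requirement placed in two categories is counted under both, so the counts add up to more than 119):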
SPECNEW	11 items (9% of total)
SPECCED	21 (18%)
TRACK		24 (20%)
NO		4 (3%)
UX		73 (61%)
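
As a rough check on the figures above, the percentages can be recomputed as count / 119, rounded to the nearest whole percent. This is only a sketch: the counts and the total are taken from this message, while the names TOTAL, counts, percentages, shares and overlap are made up for the illustration.

```python
# Category counts and the total of 119 requirements, as stated in the message.
TOTAL = 119
counts = {"SPECNEW": 11, "SPECCED": 21, "TRACK": 24, "NO": 4, "UX": 73}


def percentages(counts, total):
    """Return each category's share of `total`, rounded to a whole percent."""
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = percentages(counts, TOTAL)
for name, pct in shares.items():
    print(f"{name:8}{counts[name]:3} ({pct}%)")

# The per-category counts sum to more than TOTAL; the difference is the
# number of extra category assignments beyond one per requirement.
overlap = sum(counts.values()) - TOTAL
print("extra category assignments:", overlap)
```

The surplus of 14 lines up with the 14 entries in the detailed list that carry two category labels (for example "SPECCED, UX" or "TRACK, UX").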

Detailed list:
NO: (DV-9) Allow the author to use a codec which is optimized for voice only, rather than requiring the same codec as the original soundtrack. [Does not seem like a UA issue]
NO: (KA-3) The author would be able to choose any/all of the controls, skin them and position them. [Needs discussion; kill]
NO: (KA-5) The scripted and native controls must go through the same platform-level accessibility framework (where it exists), so that a user presented with the scripted version is not shut out from some expected behavior.
NO: (PP-1) This necessitates a clear and unambiguous declared format, so that existing authoring tools can be configured to export finished files in the required format.
SPECCED, UX: (CA-2) Support the synchronisation of multitrack audio either within the same file or from separate files - preferably both.
SPECCED: (API-1) The existence of alternative-content tracks for a media resource must be exposed to the user agent.
SPECCED: (API-2) Since authors will need access to the alternative content tracks, the structure needs to be exposed to authors as well, which requires a dynamic interface.
SPECCED: (API-3) Accessibility APIs need to gain access to alternative content tracks no matter whether those content tracks come from within a resource or are combined through markup on the page.
SPECCED: (CA-1) Support clear audio as a separate, alternative audio track from other audio-based alternative media resources.
SPECCED: (CC-22) Support captions that are provided inside media resources as tracks, or in external files.
SPECCED: (CC-26) Support multiple tracks of foreign-language subtitles in different languages.
SPECCED: (CC-27) Support live-captioning functionality. [Addressed via API]
SPECCED: (CN-4) Support third-party provided structural navigation markup. 
SPECCED: (CNS-4) Producers and authors may optionally provide additional access options to identified structures, such as direct access to any node in a table of contents. [May be done with cue range]
SPECCED: (DAC-4) Synchronized alternatives for time-based media (e.g., captions, descriptions, sign language) can be rendered at the same time as their associated audio tracks and visual tracks (UAAG 2.0 3.1.3).
SPECCED: (DV-3) Support multiple description tracks (e.g., discrete tracks containing different levels of detail).
SPECCED: (DV-4) Support recordings of real human speech as a track of the media resource, or as an external file.
SPECCED: (KA-3) All functionality available to native controls must also be available to scripted controls.
SPECCED: (PP-1) Support existing production practice for alternative content resources, in particular allow for the association of separate alternative content resources to media resources. Browsers cannot support all forms of time-stamp formats out there, just as they cannot support all forms of image formats (etc.). 
SPECCED: (PP-4) Typically, alternative content resources are created by different entities to the ones that create the media content. They may even be in different countries and not be allowed to re-publish the other one's content. It is important to be able to host these resources separately, associate them together through the Web page author, and eventually play them back synchronously to the user.
SPECCED: (SL-4) Support multiple sign-language tracks in several sign languages. 
SPECCED: (T-1) Support the provisioning of a full text transcript for the media asset in a separate but linked resource, where the linkage is programmatically accessible to AT.
SPECNEW, SPECCED: (SL-1) Support sign-language video either as a track as part of a media resource or as an external file.
SPECNEW, SPECCED: (SL-2) Support the synchronized playback of the sign-language video with the media resource. 
SPECNEW, TRACK: (CC-5) Support positioning in all parts of the screen, either inside the media viewport or possibly in a determined space next to the media viewport. This is particularly important when multiple captions are on screen at the same time and relate to different speakers, or when in-picture text is avoided.
SPECNEW, TRACK: (CN-1) Provide a means to structure media resources so that users can navigate them by semantic content structure. 
SPECNEW, UX: (CC-25) Support edited and verbatim captions, if available. 
SPECNEW, UX: (DV-8) Allow the author to provide fade and pan controls to be accurately synchronized with the original soundtrack.
SPECNEW: (CN-3) Support both global navigation by the larger structural elements of a media work, and also the most localized atomic structures of that work, even though authors may not have marked up all levels of navigational granularity.
SPECNEW: (CN-6) Support direct access to any structural element, possibly through URIs. [Media fragment-like issue]
SPECNEW: (CNS-1) All identified structures, including ancillary content as defined in "Content Navigation" above, must be accessible with the use of "next" and "previous," as refined by the granularity control. [May be handled with cue ranges]
SPECNEW: (DAC-2) The user has a global option to specify which types of alternative content to render by default and, in cases where the alternative content has different dimensions than the original content, how the layout/reflow of the document should be handled. (UAAG 2.0 3.1.2). [Probably minimal spec text required: Media queries would work nicely here; also UX issue (user sets media query to match)]
SPECNEW: (DAC-5) Non-synchronized alternatives (e.g., short text alternatives, long descriptions) can be rendered as replacements for the original rendered content (UAAG 2.0 3.1.3).
TRACK, UX: (CC-16) Use conventions that include inserting left-to-right and right-to-left segments within a vertical run (e.g. Tate-chu-yoko in Japanese), when rendered as text in a top-to-bottom oriented language.
TRACK, UX: (CC-19) Present the full range of typographical glyphs, layout and punctuation marks normally associated with the natural language's print-writing system.
TRACK, UX: (CC-21) Permit the distinction between different speakers. 
TRACK, UX: (ECC-2) Support hyperlinks and other activation mechanisms for supplementary data for (sections of) caption text.
TRACK, UX: (ECC-3) Support text cues that may be longer than the time available until the next text cue and thus provide overlapping text cues.
TRACK, UX: (ECC-4) It needs to be possible to define timed text cues that are allowed to overlap with each other in time and be present on screen at the same time 
TRACK: (CC-10) Render a background in a range of colors, supporting a full range of opacities.
TRACK: (CC-11) Render text in a range of colors. 
TRACK: (CC-14) Allow the use of mixed display styles (e.g., mixing paint-on captions with pop-on captions) within a single caption cue or in the caption stream as a whole.
TRACK: (CC-17.1) Represent content of different natural languages. In some cases a few foreign words form part of the original soundtrack, and thus need to be in the same caption resource.
TRACK: (CC-18) Represent content of at least those specific natural languages that may be represented with [Unicode 3.2], including common typographical conventions of that language (e.g., through the use of furigana and other forms of ruby text).
TRACK: (CC-2) Allow the author to specify erasures, i.e., times when no text is displayed on the screen (no text cues are active).
TRACK: (CC-20) Permit in-line mark-up for foreign words or phrases. 
TRACK: (CC-3) Allow the author to assign timestamps so that one caption/subtitle follows another, with no perceivable gap in between.
TRACK: (CC-4) Be available in a text encoding. 
TRACK: (CC-8) Allow the author to specify line breaks. 
TRACK: (CC-9) Permit a range of font faces and sizes. 
TRACK: (CN-2) The navigation track should provide for hierarchical structures with titles for the sections.
TRACK: (DV-14) Support metadata, such as copyright information, usage rights, language, etc.
TRACK: (ECC-1) Support metadata markup for (sections of) timed text cues. 
TRACK: (PP-2) Support the association of authoring and rights metadata with alternative content resources, including copyright and usage information. [Move to ATAG?]
TRACK: (PP-3) Support the simple replacement of alternative content resources even after publishing. 
UX, SPECCED: (MD-5) If the user can modify the state or value of a piece of content through the user interface (e.g., by checking a box or editing a text area), the same degree of write access is available programmatically (UAAG 2.0 2.1.5).
UX: (CA-3) Support separate volume control of the different audio tracks. 
UX: (CC-1) Render text in a time-synchronized manner, using the media resource as the timebase master.
UX: (CC-12) Enable rendering of text with a thicker outline or a drop shadow to allow for better contrast with the background.
UX: (CC-13) Where a background is used, it is preferable to keep the caption background visible even in times where no text is displayed, such that it minimises distraction. However, where captions are infrequent the background should be allowed to disappear to enable the user to see as much of the underlying video as possible.
UX: (CC-15) Support positioning such that the lowest line of captions appears at least 1/12 of the total screen height above the bottom of the screen, when rendered as text in a right-to-left or left-to-right language
UX: (CC-17.2) Also allow for separate caption files for different languages and on-the-fly switching between them. This is also a requirement for subtitles.
UX: (CC-23) Ascertain that captions are displayed in sync with the media resource. 
UX: (CC-24) Support user activation/deactivation of caption tracks.
UX: (CC-6) Support the display of multiple regions of text simultaneously. 
UX: (CC-7) Display multiple rows of text when rendered as text in a right-to-left or left-to-right language.
UX: (CN-10) Support that in bilingual texts both the original and translated texts can appear on screen, with both the original and translated text highlighted, line by line, in sync with the audio narration.
UX: (CN-5) Keep all content representations in sync, so that moving to any particular structural element in media content also moves to the corresponding point in all provided alternate media representations (captions, described video, transcripts, etc) associated with that work.
UX: (CN-7) Support pausing primary content traversal to provide access to such ancillary content in line.
UX: (CN-8) Support skipping of ancillary content in order to not interrupt content flow.
UX: (CN-9) Support access to each ancillary content item, including with "next" and "previous" controls, apart from accessing the primary content of the title.
UX: (CNS-2) Users must be able to discover, skip, play-in-line, or directly access ancillary content structures.
UX: (CNS-3) Users need to be able to access the granularity control using any input mode, e.g. keyboard, speech, pointer, etc.
UX: (DAC-1) The user has the ability to have indicators rendered along with rendered elements that have alternative content (e.g., visual icons rendered in proximity of content which has short text alternatives, long descriptions, or captions). In cases where the alternative content has different dimensions than the original content, the user has the option to specify how the layout/reflow of the document should be handled. (UAAG 2.0 3.1.1).
UX: (DAC-3) The user can browse the alternatives and switch between them. 
UX: (DAC-6) Provide the user with the global option to configure a cascade of types of alternatives to render by default, in case a preferred alternative content type is unavailable.
UX: (DAC-7) During time-based media playback, the user can determine which tracks are available and select or deselect tracks. These selections may override global default settings for captions, descriptions, etc. (UAAG 2.0 4.9.8)
UX: (DAC-8) Provide the user with the option to load time-based media content such that the first frame is displayed (if video), but the content is not played until explicit user request. 
UX: (DV-1) Provide an indication that descriptions are available, and are active/non-active.
UX: (DV-10) Allow the user to select from among different languages of descriptions, if available, even if they are different from the language of the main soundtrack.
UX: (DV-11) Support the simultaneous playback of both the described and non-described audio tracks so that one may be directed at separate outputs (e.g., a speaker and headphones).
UX: (DV-12) Provide a means to prevent descriptions from carrying over from one program or channel when the user switches to a different program or channel.
UX: (DV-13) Allow the user to relocate the description track within the audio field, with the user setting overriding the author setting. The setting should be re-adjustable as the media plays.
UX: (DV-2) Render descriptions in a time-synchronized manner, using the media resource as the timebase master. 
UX: (DV-6) Allow the user to independently adjust the volumes of the audio description and original soundtracks, with the user's settings overriding the author's.
UX: (DV-7) Permit smooth changes in volume rather than stepped changes. The degree and speed of volume change should be under provider control. 
UX: (ECC-5) Allow users to define the reading speed and thus define how long each text cue requires, and whether media playback needs to pause sometimes to let them catch up on their reading.
UX: (EVD-1) Support detailed user control as specified in (TVD-4) for extended video descriptions.
UX: (EVD-2) Support automatically pausing the video and main audio tracks in order to play a lengthy description.
UX: (EVD-3) Support resuming playback of video and main audio tracks when the description is finished.
UX: (KA-1) Support operation of all functionality via the keyboard on systems where a keyboard is (or can be) present (Needs better text), and where a unique focus object is employed. This does not forbid and should not discourage providing mouse input or other input methods in addition to keyboard operation. (UAAG 2.0 4.1.1)
UX: (KA-2) Support a rich set of native controls for media operation, including but not limited to play, pause, stop, jump to beginning, jump to end, scale player size 
UX: (KA-4) It must always be possible to enable native controls regardless of the author preference to guarantee that such functionality is available 
UX: (MD-2) Ensure accessibility of all user-interface components including the user interface, rendered content, and alternative content; make available the name, role, state, value, and description via a platform-accessibility architecture. (UAAG 2.0 2.1.2)
UX: (MD-3) If a feature is not supported by the accessibility architecture(s), provide an equivalent feature that does support the accessibility architecture(s). Document the equivalent feature in the conformance claim. (UAAG 2.0 2.1.3)
UX: (MD-4) If the user agent implements one or more DOMs, they must be made programmatically available to assistive technologies. (UAAG 2.0 2.1.4) This assumes the video element will write to the DOM.
UX: (MD-6) If any of the following properties are supported by the accessibility-platform architecture, make the properties available to the accessibility-platform architecture 
UX: (MD-7) Ensure that programmatic exchanges between APIs proceed at a rate such that users do not perceive a delay. (UAAG 2.0 2.1.7).
UX: (SL-3) Support the display of sign-language video either as picture-in-picture or alpha-blended overlay, as parallel video, or as the main video with the original video as picture-in-picture or alpha-blended overlay. 
UX: (SL-5) Support the interactive activation/deactivation of a sign-language track by the user.
UX: (T-2) Support the provisioning of both scrolling and static display of a full text transcript with the media resource, e.g. in an area next to the video or underneath the video, which is also AT accessible.
UX: (TSM-1) The user can adjust the playback rate of the time-based media tracks to between 50% and 250% of real time.
UX: (TSM-2) Speech whose playback rate has been adjusted by the user maintains pitch in order to limit degradation of the speech quality.
UX: (TSM-3) All provided alternative media tracks remain synchronized across this required range of playback rates.
UX: (TSM-4) The user agent provides a function that resets the playback rate to normal (100%).
UX: (TSM-5) The user can stop, pause, and resume rendered audio and animation content (including video and animated images) that last three or more seconds at their default playback rate.
UX: (TVD-1) Support presentation of text video descriptions through a screen reader or braille device, with playback speed control and voice control and synchronisation points with the video.
UX: (TVD-2) TVDs need to be provided in a format that contains the following information: (A) start time, text per description cue (the duration is determined dynamically, though an end time could provide a cut point)
UX: (TVD-3) Where possible, provide a text or separate audio track privately to those that need it in a mixed-viewing situation, e.g., through headphones.
UX: (TVD-4) Where possible, provide options for authors and users to deal with the overflow case: continue reading, stop reading, and pause the video. (One solution from a user's point of view may be to pause the video and finish reading the TVD, for example.) User preference should override authored option.
UX: (TVD-5) Support the control over speech-synthesis playback speed, volume and voice, and provide synchronisation points with the video.
UX: (VP-1) It must be possible to deal with three different cases for the relation between the viewport size, the position of media and of alternative content: 
UX: (VP-2) The user can change the following characteristics of visually rendered text content, overriding those specified by the author or user-agent defaults 
UX: (VP-3) Provide the user with the ability to adjust the size of the time-based media up to the full height or width of the containing viewport, with the ability to preserve aspect ratio and to adjust the size of the playback viewport to avoid cropping, within the scaling limitations imposed by the media itself. 
UX: (VP-4) Provide the user with the ability to control the contrast and brightness of the content within the playback viewport. 
UX: (VP-5) Captions and subtitles traditionally occupy the lower third of the video, where controls are also usually rendered.
UX: [In that this is a user agent issue] (MD-1) Support a platform-accessibility architecture relevant to the operating environment. (UAAG 2.0 2.1.1)
UX: (CA-4) Support pre-emphasis filters, pitch-shifting, and other audio-processing algorithms.
UX: (DV-5) Allow the author to independently adjust the volumes of the audio description and original soundtracks. [Actually a requirement on the media format]
Received on Thursday, 4 November 2010 16:56:10 UTC