- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Thu, 04 Sep 2008 13:22:37 +0200
- To: Dave Singer <singer@apple.com>
- Cc: public-html@w3.org, W3C WAI-XTECH <wai-xtech@w3.org>, www-style@w3.org
Dave Singer wrote:
> 2.2 Associated with the media
>
> 2.2.1 Introduction
>
> There are also needs to associate data with the media, rather than embed it
> within the media. The Web Content Accessibility Guidelines, for
> example, request that it be possible to associate a text transcript with timed
> media. Sometimes even, for very short media elements, alternative text may be
> enough (e.g. "a dog barks").
>
> Finally, we need to consider what should happen if source selection fails: none
> of the media file sources are considered suitable for this user-agent and user.
> What is the fallback in this case?

It should pick the closest match available, even if not all conditions
were met.

> The first two following are taken from the current state of IMG tagging in HTML5
>
> 2.2.2 alt
>
> It's probably much more rarely useful than on images, but as noted above, there
> may be some small media files which are semantically significant which can be
> described with a short text string (e.g. "a dog barks"), which could be placed
> in an alt attribute.

OK, for that use case, it seems reasonable to be able to provide a short
description in some way. I'm not necessarily agreeing that it should be
the alt attribute, that's just one possible solution to consider.

I think we need to find and document examples of the kind of videos for
which such a short alternative text would be appropriate. However, it
needs to be clear that it is to be an alternative for the video, not, as
Leif tried to suggest earlier in this thread, an alternative for just
the poster frame.

> 2.2.3 longdesc
>
> The longdesc attribute, when used, takes a URI as value, and links to a 'long
> description'. It is probably the attribute to use to link to such things as a
> transcript (though a transcript is more of a fulltext alternative than a
> description).

The longdesc attribute is not included for the img element.
It has been clearly demonstrated in past discussions that it is a
complete failure in practice and pursuing it as a solution for video is,
IMO, a waste of time.

Plus, I have already explained why any sort of long description, whether
it be a transcript, full text alternative, or whatever else, is useful
to more people than just those with accessibility needs. Any links to a
long description should be done using ordinary, visible links from
within the surrounding content.

> 2.2.4 fallback content (video not supported vs. no source is suitable)
>
> As noted above, the proposal that we add to the criteria to select a source
> element further highlights the open question about today's specification: the
> fallback content within media elements is designed for browsers not implementing
> audio/video. It is probably inappropriate to overload that use with the case
> when the browser does implement media elements, but no source is appropriate.

I think the right approach here is for the browser to allow the user to
either save or launch the video in an external media player.

> 3. In-media Selecting/Configuring
>
> 3.1 Introduction
>
> We propose considering the accessibility needs as a set of independent 'axes',
> for which the user can express a clear need, and for which a media element can
> express a clear ability to support, inability to support, or lack of awareness.
>
> The user preferences are two-state: 'I need accessibility X', 'I have no
> specific need for accessibility X'. For un unstated preference 'no specific
> need' is assumed.
>
> The tagging is however tri-state in some sense yes/no/dont-know. The media
> needs to be able to be tagged: 'I can or do meet a need for accessibility X'; 'I
> cannot meet a need for accessibility X'; 'I do not know about accessibility X'.
> For an unstated tag, 'I do not know' is assumed.
>
> Clearly we can now define when a media source matches user needs.
> A source *fails* to match if and only if either of the following are true;
> otherwise, the source matches:
>
> 1. The user indicates a need for an axis, and the source is tagged as
>    explicitly /not/ meeting that need;
> 2. The user does /not/ indicate a need, and the file is tagged as being
>    explicitly targetted to that need.

I disagree with #2 being considered a failure. A video may contain
features intended for accessibility, such as captions, but if they are
closed captions, then they don't need to be turned on. If they are open
captions, then it's not too much of a problem. However, at least for me,
a video with open captions should be given a lower priority than one
without. Obviously, other people will have different priorities.

> We believe that the source tagging should be done as Media Queries

I don't think we should be jumping to solutions just yet. Media queries
are one possibility. Another is to provide a different attribute or
several attributes to indicate each axis, and there may be others to
consider as well.

In fact, I don't think media queries are appropriate for this at all,
since they're designed for indicating features describing the target
device, not user preferences.

> 3.2 Method of selection
>
> We suggest that we add a media query, usable on the audio and video elements,
> which is parameterized by a list of axes and an indication of whether the media
> does, or can, meet the need expressed by that axis. The name of the query is
> TBD; here we use 'accessibility'. An example might be:
>
> |accessibility(captions:yes, audio-description:no, epilepsy-avoidance:dont-know)|

That doesn't seem to fit the syntax of media queries, where each feature
is supposed to be given within parentheses, e.g.

  <source ... media="screen and (min-height:240px) and (min-width:320px)">

Also, instead of providing boolean values for each property, we should
be able to indicate other information about them.
Captions, if available, may be open or closed, and only available in
particular languages. Subtitles, if available, may be open or closed and
be available in one or more languages. It's even possible to have open
subtitles in one language, yet have alternative closed subtitles shown
over the top if turned on. Audio descriptions may not be available in
all of the languages that the video is available in.

For example, take a look at the features of the 101 Dalmatians DVD in
Australia.

http://www.ezydvd.com.au/item.zml/797843

It has English and Dutch audio languages, but only has Audio Description
available in English (listed as "English - AD"). It also has English,
Dutch and Hindi subtitles, but only English captions (listed under
subtitles as "English - HI", where "HI" means Hearing Impaired).

As another example, English-language TV programmes are broadcast in
Norway with open Norwegian subtitles. But it is also possible to turn on
closed subtitles (using teletext) for some other European languages,
which are then rendered over the top. (I'm not sure which languages they
are.) Personally, I think the open subtitles are annoying, especially
since most people here seem to speak English anyway, but it's what they
do.

> Note that the second matching rule above means that sources can be ordered in
> the usual intuitive way from most specific to most general but that it also
> means a source might need to be repeated. For example, if the only available
> source has open captions (burned in), it could be in a single <source> element
> without mentioning captions, but it is better in two <source> elements, the
> first of which explicitly says that captions are supported, and the second is
> general and un-tagged. This indicates to the user needing captions that their
> need is consciously being met.

I think we should avoid repetition of source elements pointing to the
same media, and instead provide ways of accurately describing what each
has available.
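
To make the difference between the two positions on rule #2 concrete,
here is a minimal sketch in Python. It is only an illustration under
stated assumptions: the names Tag, source_matches and rank_source are
hypothetical, each axis is reduced to the proposal's yes/no/dont-know
tag (ignoring open vs. closed and language), and counting one penalty
point per axis is just one possible reading of "lower priority".

```python
from enum import Enum


class Tag(Enum):
    """Tri-state tag from the quoted proposal (hypothetical encoding)."""
    YES = "yes"
    NO = "no"
    DONT_KNOW = "dont-know"


def source_matches(user_needs, source_tags):
    """Quoted proposal: a source fails if rule 1 or rule 2 applies.

    user_needs:  set of axis names the user needs (unstated = no need).
    source_tags: dict mapping axis name to Tag (unstated = DONT_KNOW).
    """
    for axis in set(user_needs) | set(source_tags):
        needed = axis in user_needs
        tag = source_tags.get(axis, Tag.DONT_KNOW)
        if needed and tag is Tag.NO:
            return False  # rule 1: need explicitly not met
        if not needed and tag is Tag.YES:
            return False  # rule 2: targeted at a need the user lacks
    return True


def rank_source(user_needs, source_tags):
    """Alternative reading: rule 2 lowers priority instead of failing.

    Returns None if rule 1 excludes the source, otherwise a penalty
    count where lower is better (an assumption made for illustration).
    """
    penalty = 0
    for axis in set(user_needs) | set(source_tags):
        needed = axis in user_needs
        tag = source_tags.get(axis, Tag.DONT_KNOW)
        if needed and tag is Tag.NO:
            return None
        if not needed and tag is Tag.YES:
            penalty += 1
    return penalty
```

With these definitions, a source tagged captions=yes is rejected by
source_matches for a user with no stated needs, whereas rank_source
keeps it as a candidate with a penalty of 1, so an untagged source would
simply be preferred over it.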
> 3.4 Axes
>
> We think that the set of axes should be based on a documented set, but that
> adding a new axis should be easier than producing a new revision of the
> specification. IANA registration may be a way to go.
>
> Some of the more obvious axes include:
>
> 1. Captions
> 2. Subtitles
> 3. Audio description of video
> 4. Sign language
>
> Notes:
>
> 1. The USA and Canada differentiate between captions (a replacement for
>    hearing the audio) and subtitles (a replacement for audio content that
>    is unintelligible, usually because it's in a foreign language). Other
>    locales do not make this distinction; nomenclature will need careful
>    choice if confusion is to be avoided.

This is true in Australia too. According to Joe Clark, it's only the
British that get the terminology wrong.

http://joeclark.org/access/resources/understanding.html#Language

> 2. Subtitles (in the USA and Canada sense) are not strictly an accessibility
>    issue, but can probably be handled here.

Henri Sivonen wrote in a separate mail:
> I would caution against treating subtitles (in the US/Canada sense) an
> instance of the same selection mechanism engineering problem as captions (in
> the US/Canada sense) just because they are the same engineering problem as far
> as encoding timed text goes.
>
> Not hearing audio is (for practical modeling purposes) a single dimension: One
> can hear, one can't hear well, one is deaf. I don't know if "can't hear well"
> maps simply to "captions on"

Sometimes, turning on same-language subtitles as opposed to captions is
useful for people who can't hear well. For example, my dad has trouble
hearing the higher frequencies and has difficulty understanding some
speech because of that. (e.g. He can't hear the difference between a
hard C (as in cat) and T sound very well) So he'll often turn on the
English subtitles on a DVD so he can read them, but he doesn't need the
extra information that the English captions provide for people who can't
hear at all.
I'll even do the same myself sometimes when I need to keep the volume
down low.

You make a reasonable case against using them for automatic selection
purposes. However, consider the case where subtitles are provided in one
language, but captions are not. A hearing impaired person is better off
knowing the subtitles are available and having them turned on than not
knowing. Therefore, it might be better to declare the availability of
subtitles anyway.

> I would guess that content providers would opt for alternative files in
> this case, because additional audio tracks show up on the bandwidth bill
> if served even when not needed.
> ...
> Language skills are multidimensional: A person whose language skills
> cover a non-English native language and English already has four
> dimensions: skill level in both reading and listening in both languages.
> This makes automatic selection mechanism hard to engineer.

Agreed. But this argues against linking to multiple videos using
<source>, each with a different audio language. There are 2 options for
dealing with this situation:

1. Include all alternative languages within the same video file, which
   increases file size and adds to the bandwidth bill. This allows
   manual audio selection after the video has downloaded.
2. Use individual videos, but provide manual language selection prior
   to loading the video. This could also be based on the choice the
   user made when they accessed the website, if the site itself is
   available in multiple languages too.

Dave Singer wrote:
> 3. Sign language has a number of variants, not easily identified; not only
>    does American sign language differ from British, but the dialects that
>    form around schools that use sign language also diverge significantly.
>    This problem of identifying what sign language is present or desired is
>    exacerbated by ISO 639-2, which has only one code for sign-language
>    ('sgn').
>    The user preference for which kind of sign language is needed may
>    need storing, as well as their need for sign language in general. We're
>    hoping that the user's general language preferences can be used, for a
>    first pass.

I've not seen many programmes use sign language. The one show that I
know of that did some of the time was a children's early-morning cartoon
show in Australia called Cheez TV, which sometimes had a sign language
interpreter in the bottom right of the screen interpreting what the
presenters were saying in the breaks between the cartoons. However, I
believe they must have used closed captions at other times, because they
didn't always have the interpreter.

We also need to consider whether or not sign language would be used for
video on the web, and whether or not it's worth finding a solution to
declare its availability. Also, I'm not sure how it would be
implemented from a technical POV. Can it be implemented as a separate
video stream using Picture-in-Picture to overlay the normal video
stream, or would it need to be a complete alternative video stream?
This might depend on the container format used.

We would need to find and document some real world cases of online video
using sign language, so we can investigate how it has been done, if at
all. In fact, we really need to find evidence of all forms of
accessibility features, so we can work out what is and isn't used on the
web, and what we should prioritise and optimise for. For example,
whether we should optimise for serving a single video file with multiple
streams, or individual video files, each with a specific set of streams.

The requirements for the chosen solution include the following:

1.
Provide ways to indicate:
   * Language of open captions
   * Languages of available closed captions
   * Languages of available audio descriptions
   * Languages of available non-descriptive audio streams

   If it is also deemed appropriate to declare subtitles, then:
   * Language of open subtitles
   * Languages of available closed subtitles

   Any or all of those could also be either none or unknown.

2. An easy to use and understand syntax that is not too verbose.
3. Have reasonable default values.
4. Possibly be extensible to allow for other axes to be defined and
   expressed in the future.
5. Avoid unnecessary repetition.
6. Support multiple tracks per video file, or multiple videos, each
   with a specific set of streams.

This could be done with attributes. For example:

  <video ... captions="open:en; closed:fr,de" subtitles="closed:nl"
             audiolang="en,fr,de" audiodesc="en">

Or perhaps a single accessibility attribute:

  <video access="(captions=open:en;closed:fr,de) and
                 (subtitles=closed:nl) and (audiolang=en,fr,de) and
                 (audiodesc=en)">

The syntax of both of those might be a little complex though, and I
would prefer to simplify them if possible.

One issue is that while this does correctly distinguish between captions
and subtitles, educating authors to use them correctly rather than
interchangeably may be a problem, especially given that the term
subtitles is incorrectly used for both in the UK.

Another problem to consider with automatic selection mechanisms is that,
AIUI, common video container formats don't provide a way to
programmatically distinguish between subtitle tracks and caption tracks,
since both are just text tracks. I think they just provide the ability
to declare the language of the track, and some also provide the ability
to include human readable descriptions.

Text tracks can also be used for other information besides subtitles and
captions. For example, I've seen DVDs provide commentary using a text
track without an accompanying audio track.
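
As a rough illustration of how the attribute values in the first example
above might be read, here is a minimal Python sketch. It assumes the
value syntax exactly as written in that example ("mode:lang,lang"
groups separated by semicolons, or a plain comma-separated language
list). Neither the syntax nor the function names parse_track_attr and
parse_lang_list are settled or defined anywhere; they are hypothetical
and exist only for this illustration.

```python
def parse_lang_list(value):
    """Split a comma-separated language list such as 'en,fr,de'."""
    return [lang.strip() for lang in value.split(",") if lang.strip()]


def parse_track_attr(value):
    """Parse a value such as 'open:en; closed:fr,de'.

    Returns a dict mapping each mode (e.g. 'open', 'closed') to the
    list of languages declared for it. An empty value yields {}.
    """
    result = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing or doubled semicolon
        mode, sep, langs = part.partition(":")
        if not sep:
            raise ValueError(f"expected 'mode:languages', got {part!r}")
        result[mode.strip()] = parse_lang_list(langs)
    return result
```

Treating a missing attribute as "unknown" and an empty value as "none"
would be one way to cover the none/unknown cases listed in the
requirements, but that mapping is likewise an assumption.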
Note that I didn't use the lang or xml:lang attributes to express the
language of the audio streams because they're limited to declaring a
single language. However, in the absence of an explicit audio language
declaration, assuming it's the same as the element's language is a
reasonable default.

-- 
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/
Received on Thursday, 4 September 2008 11:23:29 UTC