- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Thu, 04 Sep 2008 13:22:37 +0200
- To: Dave Singer <singer@apple.com>
- Cc: public-html@w3.org, W3C WAI-XTECH <wai-xtech@w3.org>, www-style@w3.org
Dave Singer wrote:
> 2.2 Associated with the media
>
> 2.2.1 Introduction
>
> There is also a need to associate data with the media, rather than embed it
> within the media. The Web Content Accessibility Guidelines, for
> example, request that it be possible to associate a text transcript with timed
> media. Sometimes even, for very short media elements, alternative text may be
> enough (e.g. "a dog barks").
>
> Finally, we need to consider what should happen if source selection fails: none
> of the media file sources are considered suitable for this user-agent and user.
> What is the fallback in this case?
It should pick the closest match available, even if not all conditions
are met.
> The first two following are taken from the current state of IMG tagging in HTML5
>
> 2.2.2 alt
>
> It's probably much more rarely useful than on images, but as noted above, there
> may be some small media files which are semantically significant which can be
> described with a short text string (e.g. "a dog barks"), which could be placed
> in an alt attribute.
OK, for that use case, it seems reasonable to be able to provide a short
description in some way. I'm not necessarily agreeing that it should be
the alt attribute; that's just one possible solution to consider. I
think we need to find and document examples of the kind of videos for
which such a short alternative text would be appropriate.
However, it needs to be clear that it is to be an alternative for the
video, not, as Leif tried to suggest earlier in this thread, an
alternative for just the poster frame.
> 2.2.3 longdesc
>
> The longdesc attribute, when used, takes a URI as value, and links to a 'long
> description'. It is probably the attribute to use to link to such things as a
> transcript (though a transcript is more of a fulltext alternative than a
> description).
The longdesc attribute is not included for the img element. It has been
clearly demonstrated in past discussions that it is a complete failure
in practice and pursuing it as a solution for video is, IMO, a waste of
time. Plus, I have already explained why any sort of long description,
whether it be a transcript, full text alternative, or whatever else, is
useful to more people than just those with accessibility needs. Any
links to a long description should be done using ordinary, visible links
from within the surrounding content.
> 2.2.4 fallback content (video not supported vs. no source is suitable)
>
> As noted above, the proposal that we add to the criteria to select a source
> element further highlights the open question about today's specification: the
> fallback content within media elements is designed for browsers not implementing
> audio/video. It is probably inappropriate to overload that use with the case
> when the browser does implement media elements, but no source is appropriate.
I think the right approach here is for the browser to allow the user to
either save or launch the video in an external media player.
> 3. In-media Selecting/Configuring
>
> 3.1 Introduction
>
> We propose considering the accessibility needs as a set of independent 'axes',
> for which the user can express a clear need, and for which a media element can
> express a clear ability to support, inability to support, or lack of awareness.
>
> The user preferences are two-state: 'I need accessibility X', 'I have no
> specific need for accessibility X'. For an unstated preference, 'no specific
> need' is assumed.
>
> The tagging is however tri-state in some sense yes/no/dont-know. The media
> needs to be able to be tagged: 'I can or do meet a need for accessibility X'; 'I
> cannot meet a need for accessibility X'; 'I do not know about accessibility X'.
> For an unstated tag, 'I do not know' is assumed.
>
> Clearly we can now define when a media source matches user needs. A source
> *fails* to match if and only if either of the following are true; otherwise, the
> source matches:
>
> 1. The user indicates a need for an axis, and the source is tagged as
> explicitly /not/ meeting that need;
> 2. The user does /not/ indicate a need, and the file is tagged as being
> explicitly targeted to that need.
I disagree with #2 being considered a failure. A video may contain
features intended for accessibility, such as captions, but if they are
closed captions, then they don't need to be turned on. If they are open
captions, then it's not too much of a problem. However, at least for me, a
video with open captions should be given a lower priority than one
without. Obviously, other people will have different priorities.
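To make the quoted matching rules concrete, here is a rough sketch in
TypeScript (the Tag type, the map-based representation, and the function
name are my own illustrative assumptions, not part of the proposal):

```typescript
// Tri-state tagging from the proposal: yes / no / don't know.
type Tag = "yes" | "no" | "dont-know";

// User preferences are two-state; an absent axis means
// "no specific need", per the proposal's stated default.
type UserNeeds = Map<string, boolean>;

// Source tagging; an absent axis means "I do not know".
type SourceTags = Map<string, Tag>;

// A source *fails* to match iff rule 1 or rule 2 below applies.
function sourceMatches(needs: UserNeeds, tags: SourceTags): boolean {
  const axes = new Set([...needs.keys(), ...tags.keys()]);
  for (const axis of axes) {
    const needed = needs.get(axis) ?? false;
    const tag = tags.get(axis) ?? "dont-know";
    // 1. The user needs the axis, and the source explicitly cannot meet it.
    if (needed && tag === "no") return false;
    // 2. The user has no need, and the source is explicitly targeted at it.
    if (!needed && tag === "yes") return false;
  }
  return true;
}
```

Note that under these rules an entirely un-tagged ("don't know") source
matches everyone, and it is rule 2, the one I disagree with, that
penalises a source for declaring something like open captions.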
> We believe that the source tagging should be done as Media Queries
I don't think we should be jumping to solutions just yet. Media queries
are one possibility. Another is to provide a different attribute or
several attributes to indicate each axis, and there may be others to
consider as well. In fact, I don't think media queries are appropriate
for this at all, since they are designed to describe features of the
target device, not user preferences.
> 3.2 Method of selection
>
> We suggest that we add a media query, usable on the audio and video elements,
> which is parameterized by a list of axes and an indication of whether the media
> does, or can, meet the need expressed by that axis. The name of the query is
> TBD; here we use 'accessibility'. An example might be:
>
> |accessibility(captions:yes, audio-description:no, epilepsy-avoidance:dont-know)|
That doesn't seem to fit the syntax of media queries, where each feature
is supposed to be given within parentheses. e.g.
<source ... media="screen and (min-height:240px) and (min-width:320px)">
Also, instead of providing boolean values for each property, we should
be able to indicate other information about them.
Captions, if available, may be open or closed, and only available in
particular languages. Subtitles, if available, may be open or closed
and be available in one or more languages. It's even possible to have
open subtitles in one language, yet have alternative closed subtitles
shown over the top if turned on. Audio descriptions may not be
available in all of the languages that the video is available in.
For example, take a look at the features of the 101 Dalmatians DVD in
Australia.
http://www.ezydvd.com.au/item.zml/797843
It has English and Dutch audio languages, but only has Audio Description
available in English (listed as "English - AD"). It also has English,
Dutch and Hindi subtitles, but only English captions (listed under
subtitles as "English - HI", where "HI" means Hearing Impaired).
Another example, English-language TV programmes are broadcast in Norway
with open Norwegian subtitles. But it is also possible to turn on
closed subtitles (using teletext) for some other European languages
which are then rendered over the top. (I'm not sure which languages they
are). Personally, I think the open subtitles are annoying, especially
since most people here seem to speak English anyway, but it's what they do.
> Note that the second matching rule above means that sources can be ordered in
> the usual intuitive way from most specific to most general but that it also
> means a source might need to be repeated. For example, if the only available
> source has open captions (burned in), it could be in a single <source> element
> without mentioning captions, but it is better in two <source> elements, the
> first of which explicitly says that captions are supported, and the second is
> general and un-tagged. This indicates to the user needing captions that their
> need is consciously being met.
I think we should avoid repetition of source elements pointing to the
same media, and instead provide ways of accurately describing what each
has available.
> 3.4 Axes
>
> We think that the set of axes should be based on a documented set, but that
> adding a new axis should be easier than producing a new revision of the
> specification. IANA registration may be a way to go.
>
> Some of the more obvious axes include:
>
> 1. Captions
> 2. Subtitles
> 3. Audio description of video
> 4. Sign language
>
> Notes:
>
> 1. The USA and Canada differentiate between captions (a replacement for
> hearing the audio) and subtitles (a replacement for audio content that
> is unintelligible, usually because it's in a foreign language). Other
> locales do not make this distinction; nomenclature will need careful
> choice if confusion is to be avoided.
This is true in Australia too. According to Joe Clark, it's only the
British that get the terminology wrong.
http://joeclark.org/access/resources/understanding.html#Language
> 2. Subtitles (in the USA and Canada sense) are not strictly an accessibility
> issue, but can probably be handled here.
Henri Sivonen wrote in a separate mail:
> I would caution against treating subtitles (in the US/Canada sense) as an
> instance of the same selection mechanism engineering problem as captions (in
> the US/Canada sense) just because they are the same engineering problem as far
> as encoding timed text goes.
>
> Not hearing audio is (for practical modeling purposes) a single dimension: One
> can hear, one can't hear well, one is deaf. I don't know if "can't hear well"
> maps simply to "captions on"
Sometimes, turning on same-language subtitles as opposed to captions is
useful for people who can't hear well. For example, my dad has trouble
hearing the higher frequencies and has difficulty understanding some
speech because of that (e.g. he can't hear the difference between a
hard C sound, as in cat, and a T sound very well). So he'll often turn
on the English subtitles on a DVD so he can read them, but he doesn't need the
extra information that the English captions provide for people who can't
hear at all. I'll even do the same myself some times when I need to
keep the volume down low.
You make a reasonable case against using them for automatic selection
purposes. However, consider the case where subtitles are provided in
one language, but captions are not. A hearing impaired person is better
off knowing the subtitles are available and having them turned on than
not knowing. Therefore, it might be better to declare the availability
of subtitles anyway.
> I would guess that content providers would opt for alternative files in
> this case, because additional audio tracks show up on the bandwidth bill
> if served even when not needed.
> ...
> Language skills are multidimensional: A person whose language skills
> cover a non-English native language and English already has four
> dimensions: skill level in both reading and listening in both languages.
> This makes automatic selection mechanism hard to engineer.
Agreed. But this argues against linking to multiple videos using
<source>, each with a different audio language. There are two options for
dealing with this situation:
1. Include all alternative languages within the same video file, which
increases file size and adds to the bandwidth bill. This allows
manual audio selection after the video has downloaded.
2. Use individual videos, but provide manual language selection
prior to loading the video. This could also be based on the choice
the user made when they accessed the website, if the site itself is
available in multiple languages too.
Dave Singer wrote:
> 3. Sign language has a number of variants, not easily identified; not only
> does American sign language differ from British, but the dialects that
> form around schools that use sign language also diverge significantly.
> This problem of identifying what sign language is present or desired is
> exacerbated by ISO 639-2, which has only one code for sign-language
> ('sgn'). The user preference for which kind of sign language is needed may
> need storing, as well as their need for sign language in general. We're
> hoping that the user's general language preferences can be used, for a
> first pass.
I've not seen many programmes use sign language. The one show that I
know of that did, some of the time, was a children's early-morning
cartoon show in Australia called Cheez TV, which sometimes had a sign
language interpreter in the bottom right of the screen interpreting what
the presenters were saying in the breaks between the cartoons. They
must have used closed captions at other times, though, because they
didn't always have the interpreter.
We also need to consider whether or not sign language would be used for
video on the web, and whether or not it's worth finding a solution to
declare their availability. Also, I'm not sure how they would be
implemented from a technical POV. Can they be implemented as a separate
video stream using Picture-in-Picture to overlay the normal video
stream, or would it need to be a complete alternative video stream?
This might depend on the container format used.
We would need to find and document some real world cases of online video
using sign language, so we can investigate how it has been done, if at
all. In fact, we really need to find evidence of all forms of
accessibility features, so we can work out what is and isn't used on the
web, and what we should prioritise and optimise for.
For example, whether we should optimise for serving a single video file
with multiple streams, or individual video files, each with a specific
set of streams.
The requirements for the chosen solution include the following:
1. Provide ways to indicate:
* Language of open captions
* Languages of available closed captions
* Languages of available audio descriptions
* Languages of available non-descriptive audio streams
If it is also deemed appropriate to declare subtitles, then:
* Language of open subtitles
* Languages of available closed subtitles
Any or all of those could also be either none or unknown.
2. An easy to use and understand syntax that is not too verbose.
3. Have reasonable default values.
4. Possibly be extensible to allow for other axes to be defined and
expressed in the future.
5. Avoid unnecessary repetition.
6. Support multiple tracks per video file, or multiple videos, each with
a specific set of streams.
This could be done with attributes. For example:
<video ... captions="open:en; closed:fr,de"
       subtitles="closed:nl"
       audiolang="en,fr,de"
       audiodesc="en">
Or perhaps a single accessibility attribute:
<video access="(captions=open:en;closed:fr,de)
and (subtitles=closed:nl)
and (audiolang=en,fr,de)
and (audiodesc=en)">
The syntax of both of those might be a little complex though, and I
would prefer to simplify them if possible. One issue is that while this
does correctly distinguish between captions and subtitles, educating
authors to use them correctly rather than interchangeably may be a
problem, especially given that they incorrectly use the term subtitles
for both in the UK.
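As a rough illustration of how such a syntax might be processed, a value
like captions="open:en; closed:fr,de" could be parsed in a few lines of
script (the function name and the exact grammar are my own assumptions,
not a proposal):

```typescript
// Parse a hypothetical track attribute value such as
// "open:en; closed:fr,de" into a map from availability
// ("open" / "closed") to a list of language tags.
function parseTrackAttr(value: string): Map<string, string[]> {
  const result = new Map<string, string[]>();
  for (const part of value.split(";")) {
    const [availability, langs] = part.split(":");
    if (!availability || !langs) continue; // skip malformed parts
    result.set(
      availability.trim(),
      langs.split(",").map((l) => l.trim())
    );
  }
  return result;
}
```

For example, parseTrackAttr("open:en; closed:fr,de") yields open -> [en]
and closed -> [fr, de].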
Another problem to consider with automatic selection mechanisms is
that, AIUI, common video container formats don't provide a way to
programmatically distinguish between subtitle tracks and caption tracks,
since both are just text tracks. I think they just provide the ability
to declare the language of the track, and some also provide the ability
to include human readable descriptions. Text tracks can also be used
for other information besides subtitles and captions. For example, I've
seen DVDs provide commentary using a text track without an accompanying
audio track.
Note that I didn't use the lang or xml:lang attributes to express the
language of the audio streams because they are limited to declaring a
single language. However, in the absence of an explicit audio language
declaration, assuming it's the same as the element's language is a
reasonable default.
--
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/
Received on Thursday, 4 September 2008 11:23:28 UTC