- From: Dave Singer <singer@apple.com>
- Date: Fri, 9 Nov 2007 12:15:24 -0500
- To: aurélien levy <aurelien.levy@free.fr>, public-html@w3.org
At 23:21 +0200 2/07/07, aurélien levy wrote:

> Does anybody have ideas about this issue:
>
> - Currently I see no way to have synchronized captions and audio
> description on the video element (except by directly embedding the
> captions or audio description in the video itself). Or are the media +
> source elements here to achieve things like that? Why didn't you take
> the SMIL audio and text elements?
>
> Aurélien

I apologize for not replying to this earlier, but since it's currently wiki-ed and deserves a response, here we go. Explicit provision of support for audio and video opens up a historic opportunity to elevate the level of accessibility of multimedia, and we should take advantage of it.

The current hope, not explicitly stated anywhere (and it should be), is roughly as follows. Note that I and others are working in this area; I've spent some excellent time at the TP (Technical Plenary) with some accessibility people and hope to improve the status.

Overall, I am acutely aware that good accessibility results not only from writing specs that appear to enable it, but from the three-legged stool of implement-author-use. That is, for accessibility to actually happen:

a) the tool vendors have to implement the support, for both authoring and presentation;
b) the authors have to use that support and provide accessible content;
c) the users have to be able to use the system and achieve the accessibility they need.

It's amazingly easy to design schemes which fail on one of these three. Those are not good designs.

So, to the specification design. First, I think it's good if the design is well layered. We need good support and a good framework at the markup level (HTML, CSS, etc.), and I think that framework should be technology-neutral, and also (as much as possible) neutral on the kinds of accessibility and the ways they can be achieved. Then, at the media level, it should be possible to respond to the accessibility needs and actually provide the accessibility.

With all that in mind, where do we stand?

First, the audio and video elements have a set of sources, and the source selection can be affected by media queries. We envision one or more media queries that allow users to express a permanent binary preference along various 'axes' of accessibility: I explicitly want captions; I explicitly must avoid stroboscopic effects in video; I need audio description of video; I need high-contrast audio (audio with minimal background noise or music); and so on. Thus we can select a source that is suitable for a need (a rough sketch of what this might look like follows below). The design clearly hopes that the axes of accessibility can be extended (new axes added) without a full revision of the HTML specification.

Second, we envision that the selected source should then be configured with those same preferences. For example, some container formats support streams that can be optionally enabled; a caption track might be distributed disabled but enabled on preference. So the user's preferences must be used not only to select a source but also to configure it.

Third, various attributes of the video and audio elements allow for control of other aspects of the presentation. For example, I learned this week that some users like to have multimedia presented slower than normal. I want to look into the control of playback rate and check that such users can arrange things so that 'normal' is, say, 80% of what is generally considered normal (a second sketch follows below). Discussions this week have suggested that contrast might fall into the same situation.

So, with that as a framework, do we also need to look at the media level?
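Before going to the media level, here is a rough, purely illustrative sketch of the kind of markup-level selection described above. The accessibility media features used here ('captions', 'audio-description') are invented names for the sake of the example; nothing like them is defined anywhere yet:

  <video controls>
    <!-- hypothetical media features; the names are placeholders only -->
    <source src="movie-captioned.ogg" media="(captions)">
    <source src="movie-described.ogg" media="(audio-description)">
    <!-- default for users who have expressed no such preference -->
    <source src="movie.ogg">
  </video>

The idea is simply that the user agent walks the source list and takes the first source whose media query matches the preferences the user has declared.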
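For the rate question, something along these lines might become possible from script, assuming a playback-rate attribute on the media element along the lines the draft is sketching (again, illustrative only):

  <script>
    // illustrative only: present the media at 80% of its authored rate
    var v = document.getElementsByTagName('video')[0];
    v.playbackRate = 0.8;
  </script>

Whether this should be a script matter at all, or a user preference applied by the user agent like the media queries above, is exactly the kind of thing I want to look into.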
If we embed a SMIL file in the HTML, it can carry the captioning as a separate stream; I (we) need to look into whether system-select can be used there to enable captions etc. based on the user preferences (the 'configuration') described above.

Another interesting area is 'post-captioning': databases of captions that are developed in various languages independently of the content producer. I'm not sure how easy it is to provide for the user to pick those up and link them in, if needed; it may be an authoring question.

Note that source selection in the audio/video tags, and system-select in SMIL, already cover language-based selection. There is a small issue here for sign language, by the way. ISO 639-2 (three-letter codes for languages) has a single code for sign language in general ('sgn'); the IANA registry <http://www.iana.org/assignments/language-tags> (under RFC 3066, now marked obsolete, but I assume RFC 4646 has a similar provision) has geographic variants such as 'sgn-US'. But apparently there are dialect effects caused by the fact that sign language is almost entirely real-time and local (unlike, say, English, which is both written and broadcast), so dialects develop, particularly around schools...

There's clearly much work to be done here still. What are the appropriate binary axes (want/don't-want)? What are the aspects of multimedia that need variable control (rate and contrast are identified above)? Can they be 'styled' or otherwise controlled appropriately? Even if the framework is right, are there any formats that fit into it? What other lurking issues are there, like the sign language one?

-- 
David Singer
Apple/QuickTime
Received on Friday, 9 November 2007 17:15:47 UTC