- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Fri, 20 Aug 2010 08:57:41 +1000
- To: Eric Carlson <eric.carlson@apple.com>
- Cc: HTML Accessibility Task Force <public-html-a11y@w3.org>
- Message-ID: <AANLkTinKZxPbMazOEmHunBmgmoeCCzX57doVg_xUGo7L@mail.gmail.com>
On Fri, Aug 20, 2010 at 3:18 AM, Eric Carlson <eric.carlson@apple.com> wrote:

> On Aug 18, 2010, at 4:40 PM, Silvia Pfeiffer wrote:
>
> Hi media a11y folks, hi Eric,
>
> In today's call we had a brief discussion on the WHATWG specification of rendering of time-synchronized text with audio and video resources, see http://www.whatwg.org/specs/web-apps/current-work/complete/rendering.html#timed-tracks-0.
>
> The point I was making was that I am disappointed that for <audio> elements there is no rendering. I was suggesting that for both <audio> and <video> elements, rendering of time-synchronized text should depend on the @controls attribute of the <audio> and <video> elements. The reason behind this is that I expect a menu to be made available to the user through the @controls that allows the user to activate/deactivate text tracks from the list of available text tracks. Because that list is made available through the @controls, I would also expect that the rendering of the text cues depends on this @controls attribute being available.
>
> However, the specification says that <audio> elements don't render any time-synchronized text, but only <video> elements do.
>
> We didn't get very far in the discussion - in particular Eric had some important points to make. Thus, I'd like to take up this discussion here again.
>
> My point is that semantically, the *only* difference between <video> and <audio> elements is that the former renders visual media while the latter does not. There is absolutely no requirement that a file in a <video> element must have visual media, e.g. it is perfectly legal to use an mp3 file in a <video> element. For example, the following:
>
> <video src="song.mp3" id="video" controls> </video>
>
> creates a 300x150 element that has only audio data, so it doesn't draw anything (300x150 is the default size of a <video> element).
>
> There is also no requirement that an <audio> element must not support files with visual media; it just doesn't render visual data. The following example puts an HD movie trailer in an <audio> element. The 'controls' attribute tells the UA to show the default controls, so it does take up space on the page, but it does not render the visual track:
>
> <audio src="HD_trailer_1024x768.mp4" id="audio" controls> </audio>
>
> I believe that time-synchronized text, whether it comes from a track in the media file or is loaded from an external file, is *visual media* - it has a visual representation - so I don't believe it makes sense for an <audio> element to render it.
>
> Silvia's proposal is to render text cues when an <audio> element has the 'controls' attribute. This might work for text-only cues, but if we allow an <audio> element to render time-synchronized text, we of course have to allow it to render burned-in captions, sign language tracks, etc. Those are both video tracks, so her proposal is actually to make an <audio> element behave like a <video> element when it has a 'controls' attribute. This change would break the previous example because the video track would be rendered, requiring the page author to edit the movie to remove the video track.

I wasn't actually going to allow rendering of video in <audio> elements - just rendering of text tracks.
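To make concrete what I mean, this is roughly the kind of markup I have in mind - a sketch only, where the <track> element and its attributes follow my reading of the current WHATWG draft and may still change, and the file names are made up:

  <audio src="podcast.ogg" controls>
    <track kind="captions" src="podcast-captions" srclang="en" label="English">
  </audio>

When the user turns that captions track on through the @controls menu, I would expect the UA to render the cue text (e.g. next to or below the controls), but not to start rendering any video tracks that may also be in the resource.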
It might be instructive to consider what types of tracks we expect to be added onto what traditionally is regarded as an audio resource and a video resource, and whether we still regard an audio resource with an in-band sign language track as an audio resource or whether it has moved into the video resource bag. I'd almost say the latter, even if the resource doesn't have a "main" video track.

> I believe that the person creating the web page should use a <video> element if a file has visual media, whether it comes from a video track or from time-synchronized text.

Let me ask a few questions - some of which I don't think have been explicitly addressed yet.

Right now, the controls on audio and video are the same. In http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#user-interface it says that we will have a display for alternative accessibility tracks in the controls: to "change the display of closed captions or embedded sign-language tracks, select different audio tracks or turn on audio descriptions" - which to me indicates the introduction of a menu that lets the user activate/deactivate tracks.

Assume we have an audio file that has externally associated text tracks and also in-band text tracks. Are we planning to have a different controls display when such a file is used in <audio> than when it is used in <video>? I.e. will the available text tracks be exposed in a menu in the @controls of an <audio> element? If so, what should happen when a user activates a track? Unless the Web page author is using the JavaScript API to render the activated tracks (see the rough sketch in the P.S. below), the user will not see any reaction to their activation of tracks. If instead we decide not to include the available accessibility tracks in the @controls of an <audio> element, we've actually just made the element inaccessible.

When we move beyond text tracks, the boundaries indeed become blurred. Adding a video track with sign language to an audio file does raise the question of whether this is still an audio resource or whether it has really just turned into a video resource. I would say that your logic applies to this case. But I am hesitant to accept that logic in the text track case. I actually think that the current WHATWG spec doesn't yet address what to do about rendering of audio and video accessibility tracks.

As for videos with burnt-in captions - we just have to regard those captions as part of the video pixels, and thus if we use such a video in an <audio> element, they will indeed disappear. But that also looks like it was the intention of the Web page author. If you want to handle such captions explicitly, you have to either do OCR to extract them or re-type the captions into a caption file.

I understand where the logic of regarding <audio> as the non-visual medium and <video> as the visual medium is coming from. I can accept that where sign language tracks are concerned. I do wonder, however, what would be so bad about allowing text tracks to be rendered with audio. The main use cases I see here are foreign-language users, deaf and hearing people sharing the experience, learning-impaired users, chapter navigation, and music lyrics/karaoke. None of these use cases asks for sign language, but all of them are a basic accessibility need on what we traditionally regard as audio resources.

Cheers,
Silvia
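PS: By "the JavaScript API to render the activated tracks" I mean something along these lines - a rough sketch only; the script API names (textTracks, mode, cuechange, activeCues) follow my reading of the current draft and may well change, and the file names are made up:

  <audio id="audio" src="podcast.ogg" controls>
    <track kind="captions" src="podcast-captions" srclang="en" label="English">
  </audio>
  <div id="captionArea" aria-live="polite"></div>
  <script>
    var audio = document.getElementById("audio");
    var captionArea = document.getElementById("captionArea");
    var track = audio.textTracks[0];
    // "hidden": load the cues and fire cue events, but leave rendering to the page
    track.mode = "hidden";
    track.addEventListener("cuechange", function () {
      // show the text of the currently active cue, or clear the area between cues
      captionArea.textContent =
          track.activeCues.length ? track.activeCues[0].text : "";
    }, false);
  </script>

Without something like this on the page, a user who activates a text track from the @controls of an <audio> element would see no reaction at all.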
Received on Thursday, 19 August 2010 22:58:33 UTC