- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Fri, 25 Feb 2011 11:57:30 +1100
- To: David Singer <singer@apple.com>
- Cc: Bob Lund <B.Lund@cablelabs.com>, "public-html@w3.org" <public-html@w3.org>
On Fri, Feb 25, 2011 at 10:24 AM, David Singer <singer@apple.com> wrote:
>
> On Feb 24, 2011, at 8:57 , Bob Lund wrote:
>
>> [Bob Lund] I agree with your observation that timed text, audio and
>> video all lay out a presentation along a timeline. In the context of
>> HTML5, though, “Timed Text Tracks” expose “cues” with a start and end
>> time, and data – either text or metadata. The proposed multi-track
>> media APIs expose the presence of additional tracks along with the
>> ability to denote whether the track is showing. There is no “cue” and
>> there is no access to the data in the track.
>
> True, access to data in the track would be useful for any media type,
> not just text. (Audio processing, for example, or extracting an image
> from video and painting it onto a Canvas). So rather than treating
> text tracks as special, I'd prefer to see all tracks treated
> powerfully enough to meet the text and other needs.

That's not really possible.

The main feature of text tracks is that their data are sparse chunks along the timeline, with relatively little data in each chunk. It is therefore possible to parse all of this data into a cue list, keep it in memory, and make it available to JS as a TextTrackCueList, as well as fire an event on the track when cues change, and on the activated and deactivated cues themselves. We need this kind of flexibility on the text cues to allow people to build their own interfaces around them.
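To illustrate the push model, here is a minimal sketch against the TextTrack API (my assumptions: a single <video> element, that its first text track has already loaded its cues, and that cue-level onenter/onexit handlers behave as currently drafted):

  var video = document.querySelector('video');
  var track = video.textTracks[0];   // assumes at least one text track

  // The browser pushes changes to us: activeCues is kept up to date
  // and cuechange fires on the track whenever the active set changes.
  track.oncuechange = function () {
    for (var i = 0; i < track.activeCues.length; i++) {
      var cue = track.activeCues[i];
      console.log('active: ' + cue.id + ' [' +
                  cue.startTime + ', ' + cue.endTime + ']');
    }
  };

  // Events also fire on the individual cues as they (de)activate.
  var first = track.cues[0];
  first.onenter = function () { /* cue just became active */ };
  first.onexit  = function () { /* cue just became inactive */ };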
Introducing an API of this kind for audio and video tracks is, however, not really possible. If we take the concept of a "cue" to mean a "group of samples" for audio and video, then we would have to hold a large amount of data in memory for an AudioCueList or a VideoCueList. And since the individual cues would typically relate to a very small amount of time (for video I would think a single frame, for audio maybe 40ms), events would fire constantly both on the track and on the cues as we play, and we would have to constantly update the CueList, making it not useful to the JS programmer anyway and probably exploding the browser.
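To put rough numbers on that (25fps video and a two-hour movie are my own figures; the 40ms audio cue is from above):

  video: 25 cues/s x 7,200 s = 180,000 cues to hold in a VideoCueList
  audio: 25 cues/s x 7,200 s = 180,000 cues again
  and, during playback, an enter, an exit and a cuechange event every 40ms.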
So, basically, the audio and video API for data can only really be a polling API, while for text it is, and totally should be, a push API.
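For contrast, the polling model is what we already do today for video data, e.g. the Canvas case David mentioned (the 40ms interval is an arbitrary choice of mine):

  var video  = document.querySelector('video');
  var canvas = document.querySelector('canvas');
  var ctx    = canvas.getContext('2d');

  // Polling: the script has to come and get the data at its own rate;
  // nothing notifies it when a new frame is available.
  setInterval(function () {
    if (!video.paused && !video.ended) {
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    }
  }, 40);   // roughly every frame at 25fps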
Even with in-band streams: as soon as the browser finds a text cue, it needs to add it to the TextTrackCueList, so that both the browser and the JS developer get sufficiently early notice and can do something with it. The discontinuous and sparse nature of text tracks makes them a very different beast from audio and video tracks.

If we did want to deal with audio and video tracks as well as text tracks through the TextTrack API, I would think we can only make it such that this part of the TextTrack API is disabled for audio and video tracks:

  readonly attribute TextTrackCueList cues;
  readonly attribute TextTrackCueList activeCues;
  attribute Function oncuechange;
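From script, that "disabled" part might then look something like this (purely hypothetical, including the unified track list; none of this is specced):

  var track = video.tracks[i];        // hypothetical unified track list
  if (track.cues !== null) {
    track.oncuechange = updateMyUI;   // text track: push API available
  } else {
    // audio/video track: cues, activeCues and oncuechange are disabled,
    // so the data can only be polled as above
  }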
I don't know what the downsides of such an approach would be, but it certainly feels a bit clunky.

Cheers,
Silvia.
Received on Friday, 25 February 2011 00:58:22 UTC