Re: Tech Discussions on the Multitrack Media (issue-152) from David Singer on 2011-02-24 (public-html@w3.org from February 2011)

From: David Singer <singer@apple.com>
Date: Wed, 23 Feb 2011 16:22:39 -0800
To: Bob Lund <B.Lund@CableLabs.com>
Cc: "public-html@w3.org" <public-html@w3.org>
Message-Id: <B0E5F572-4E7B-425F-BA7A-C0FB71A60B84@apple.com>

On Feb 17, 2011, at 10:06 , Bob Lund wrote:

> I have several comments on the proposal alternatives on the wiki (which have been very informative). As a first poster let me introduce myself - I represent CableLabs, where we’ve been analyzing commercial video service provider requirements and how HTML5, and timed text tracks and multimedia tracks, can be used to meet those requirements.
> 
>  
> 
> Overloading the existing track element representing Timed Text Tracks for media tracks would mix two fundamentally different models. Timed Text Tracks have cues with substantially different semantics than continuous media tracks. Side condition 8 notes this. I think it’s a good idea to keep Timed Text Tracks separate from continuous audio and video tracks. This would seem to rule out 1), 2) and 7).
> 

I don't understand your point here. Timed text, audio, and video all lay out a timed presentation along a timeline. What is 'fundamentally different' about text, other than the way it is encoded (and the way its timing is expressed)? In QuickTime and 3GPP, we have used 'text formatted' tracks for years with some success, so my curiosity is piqued.

I think it's great to keep these concepts separate for any track in a timed presentation:
* the general media type (video, audio, text, metadata, and so on)
* the encoding type (WebM video, AAC audio, EBCDIC text, and so on)
* the function or need that the track is meeting (primary audio, captions, sign-language video, flicker-reduced video, and so on)

Comparing captions and sign language, I note:
* they both need a visual display area
* one is typically encoded as text, the other as video (though burned-in captions, or graphic captions in Japanese, are closer to video)
* they are both optional timed sequences of semantic information that is auxiliary to the main program
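
As a rough illustration only, the separation of media type, encoding, and function can be sketched as a small data structure. Everything named here (Track, MediaType, ROLES_NEEDING_DISPLAY, tracks_needing_display) is hypothetical and comes from no HTML5 proposal or media format; the point is just that selecting by function does not need to look at the media type or the encoding.

```python
from dataclasses import dataclass
from enum import Enum


class MediaType(Enum):
    """General media type of a track (hypothetical enumeration)."""
    VIDEO = "video"
    AUDIO = "audio"
    TEXT = "text"
    METADATA = "metadata"


@dataclass(frozen=True)
class Track:
    media_type: MediaType  # general media type
    encoding: str          # encoding type, e.g. "AAC audio"
    role: str              # function or need the track is meeting


# Assumption for this sketch: these roles need a visual display area.
ROLES_NEEDING_DISPLAY = {"captions", "sign-language video"}


def tracks_needing_display(tracks):
    """Select tracks by function alone, ignoring media type and encoding."""
    return [t for t in tracks if t.role in ROLES_NEEDING_DISPLAY]


tracks = [
    Track(MediaType.AUDIO, "AAC audio", "primary audio"),
    Track(MediaType.TEXT, "timed text", "captions"),
    Track(MediaType.VIDEO, "WebM video", "sign-language video"),
]

selected = tracks_needing_display(tracks)
```

Here the captions track and the sign-language track are both selected, even though one is text and the other is video.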

David Singer
Multimedia and Software Standards, Apple Inc.

Received on Thursday, 24 February 2011 00:23:41 UTC