Re: Proposal for initial CG focus

On 11/7/13 10:11 AM, "Cyril Concolato"
<cyril.concolato@telecom-paristech.fr> wrote:

>Hi Bob,
>
>Thanks too for starting the discussion. In general, I'm really in line
>with what your draft proposes, except for some details. I added some
>answers to your comments below.
>
>I also have some points of organization:
>- do we have a bug tracker with the CG process?

I don't know.

>- Can we put the spec on some versioning server (GitHub, W3C, ...) so
>that people can propose modifications?

I seconded Giuseppe's proposal for starting with a Wiki page.

>- Will you be at TPAC next week? Can we have a (short) informal meeting
>on the topic, maybe during the unconference on Wednesday?

Unfortunately not.

>
>Le 31/10/2013 21:10, Bob Lund a écrit :
>> Hi,
>>
>> I'd like to start the discussion about the requirements and scope for
>>the
>> CG work.
>>
>> Currently, AFAIK there is no specification that fully defines how a UA
>> should expose in-band tracks to Web applications.  Unspecified details
>>are:
>>
>> * Which in-band tracks of a media resource the UA will expose. By
>> expose I mean make available as VideoTrack, AudioTrack or TextTrack
>> objects.
>In my view, the browser should expose an audio/video track that it knows
>it can natively decode as an AudioTrack/VideoTrack, and all others as a
>TextTrack, possibly with DataCue (base64-encoded, for instance). I do
>think there are cases where it makes sense to decode video/audio track
>content in JavaScript (e.g. decoding a depth video track). It should be
>up to the JS application developer to decide, based on the performance
>they get. We need to be careful, though, to expose the data efficiently
>(e.g. not load it until it's requested, to control memory consumption,
>and maybe offer a mechanism to filter the cue change events if the frame
>rate is too high).
>
>> * How metadata in the media resource about the in-band tracks is made
>> available to Web applications. If the UA recognizes the in-band track,
>> some of the metadata associated with the track will be made available by
>> the "kind", "language" and "inbandMetadataTrackDispatchType"
>>attributes.
>> This works fine when the UA fully recognizes the track and when the
>> metadata maps completely to the predefined valid attribute values. But,
>> this needn't be the case. For example, the UA may recognize an MPEG-2 TS
>> audio track but not recognize metadata designating it as a Descriptive
>> Video track.
>I understand, and I think I agree. You have to distinguish whether:
>- the UA can natively decode the track;
>- the UA can map the descriptive data associated with the track to HTML5
>descriptive data.
>
>In the MP4 file format case, there is no easy mapping of audio/video
>track info to an HTML5 kind. At some point MPEG was considering adding
>HTML5 kind types, but this has not progressed yet. The same goes for
>MPEG-2, I think.
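[To make the two distinctions above concrete, here is an illustrative
UA-side mapping from container-level track roles to an HTML5 track "kind".
The table is an assumption for the example, not a normative mapping from MPEG
or HTML; the role strings are invented.]

```javascript
// Hypothetical mapping table from container-declared track roles to the
// HTML5 "kind" values a UA could assign.
const KIND_MAP = {
  "audio/description": "descriptions", // descriptive video audio track
  "audio/main":        "main",
  "text/subtitles":    "subtitles",
};

// Fall back to "" (unknown kind) when no mapping exists, mirroring what
// HTML does for unrecognized in-band kinds.
function htmlKindFor(containerRole) {
  return KIND_MAP[containerRole] || "";
}
```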
>
>>
>> For more deterministic handling of inband tracks, the UA could be
>>required
>> to:
>>
>> * Expose all in-band tracks, in some form.
>Yes
>> * Make media resource metadata associated with the inband tracks
>> available to Web applications. [1] does this through a TextTrack.
>> * Provide data so that the Web application can correlate the metadata
>> with the appropriate Track object.
>I'm not sure I follow you here. I guess this has to do with the notion
>of "Track Description TextTrack" in [2].
>
>This idea is interesting and I see where it comes from. An MPEG-2 TS
>program can have a time-varying number of streams in a program and
>stream changes are signaled with a PMT. So indeed, you could consider a
>PMTTextTrack to signal those changes. If I understand correctly, the
>"Track Description TextTrack" is a generalization of that PMTTextTrack.
>For MP4, this would correspond to a 'moov'-TextTrack, but in MP4 the
>moov is static for the whole duration of the file and the number of
>tracks does not change. I can't think at the moment of file-wide
>time-varying metadata information that is not represented in the MP4
>file format as a track. So that "Track Description TextTrack" would have
>only 1 cue.
>
>My initial idea (I'm not sure it covers every case in MPEG-2) was rather
>to expose the time-varying information related to a track in the track
>itself. For instance, for MPEG-2 TS, if a PMT updates the descriptor
>associated with an elementary stream for which a TextTrack has already
>been created, I would create a new cue for that new information. So the
>cue data for every stream would carry two types of information:
>signaling and/or real data.
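[A sketch of that per-track signaling idea: when a PMT update changes an
elementary stream's descriptor, append a new cue carrying the updated
signaling to that stream's metadata track. Plain objects stand in for
TextTrack/DataCue here; the "signaling" vs "data" tag is the email's own
distinction, everything else is illustrative.]

```javascript
// On a PMT update, record the new descriptor as a signaling cue on the
// elementary stream's (simulated) TextTrack.
function onPmtUpdate(track, pts, descriptorBytes) {
  const cue = {
    startTime: pts,
    endTime: pts,        // provisional; refined when the next update arrives
    type: "signaling",   // as opposed to "data" cues carrying stream payload
    data: descriptorBytes,
  };
  track.cues.push(cue);
  return cue;
}

// Usage: two successive PMT versions for one elementary stream.
const esTrack = { cues: [] };
onPmtUpdate(esTrack, 0.0, "descriptor-v1");
onPmtUpdate(esTrack, 12.5, "descriptor-v2");
```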
>
>>
>> [1] is a specification written by CableLabs about how a UA would meet
>> these requirements in the case of MPEG-2 transport stream media
>>resources.
>> This spec was written before some of the recent additions to HTML5, e.g.
>> inBandMetadataTrackDispatchType. The users of [1] would like to see a W3C spec
>> that addresses the same problem, takes into account the current HTML5.1
>> work and addresses some details that [1] missed, e.g. more precise
>> definition of what constitutes a metadata TextTrack, how in-band data
>>gets
>> segmented into TextTrackCues, how Cue start and end times are derived
>>from
>> the in-band track.
>>
>> [2] is an informal draft describing how the technique in [1] can be
>> applied to WebM, Ogg and MPEG4 media resources. This CG could address
>> these media container formats as well.
>>
>> So, at a minimum, I propose:
>>
>> * That the CG start by discussing the requirements outlined above, with
>> the goal of creating a spec at least for MPEG-2 TS (recognizing that
>> some of this may already be covered by HTML5.1).
>I agree.
>> * WebM and MP4 are widely supported in browsers so it might make sense
>>to
>> cover these formats.
>As you know, I started working on the mapping of MP4 tracks onto HTML
>Tracks in [3]. My idea was to use a WebVTT-based syntax as an
>intermediate solution, until the CG spec is out, to experiment. I think
>it's very close to what you have proposed.
>>
>> If there is sufficient interest, we could take on other inband track
>> formats as well.
>Indeed, we could think of RTP-based streams in a future version, but
>there is enough on the plate for now.
>>
>> Comments?
>Some additional comments on [2]:
>- To identify the type of the track data, I don't think we should rely
>on the label. For instance, in [3], I use the MP4 track handler as the
>value for the label. I think inBandMetadataTrackDispatchType could be
>used, but this raises the question of what syntax to use. I'm not sure
>MIME is appropriate, or maybe with 'codecs' and 'profile' parameters. In
>the MP4 work [3], I've used a mix of the MIME type, when the actual cue
>payload conforms to that MIME type (e.g. when an MP4 track contains SVG
>data), or the MPEG-4 "stream type" and "object type indication". We
>could also expose the MPEG-2 TS "stream type", maybe in the "codecs"
>parameter of the MPEG-2 MIME type.
>- I agree that the original track ID should be exposed; we could
>recommend setting Track.id to the Media Fragment identifier representing
>that track in the original file. In [3], I've used 'trackId', but that's
>not generic enough.
>- I think there is a problem with setting endTime to Infinity, as the
>WebIDL type for that attribute is not "unrestricted double". We could
>recommend setting the endTime to the startTime of the next cue, but that
>introduces latency...
>- I think the algorithm to expose MP4 tracks as "text" TextTrack or
>"base64-text" TextTrack can be improved to account for the new MPEG-4
>Part 30 tracks for subtitles (e.g. WebVTT, TTML and others). I can fix
>that if you want.
>- I think we need to discuss DASH/HLS and others separately, taking MSE
>into consideration. I'm not sure we need a mapping for those.
>- I also think we need to be consistent with what happens with
>MediaStreams, although I've not followed that closely...
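[On the track-ID point above, a small sketch of deriving a generic Track.id
from a Media Fragment identifier. The "track=" dimension comes from the W3C
Media Fragments URI spec; the track names here are examples.]

```javascript
// Build the Media Fragment identifier ("track=<name>") that designates a
// given in-band track in the original resource, suitable as a Track.id.
function mediaFragmentTrackId(trackName) {
  return "track=" + encodeURIComponent(trackName);
}

// Usage: a UA exposing the container's second video track.
const id = mediaFragmentTrackId("video1");
```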
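[And a sketch of the endTime workaround discussed above: since the cue
endTime attribute is a plain WebIDL "double" (so Infinity is not allowed),
close each cue by setting its endTime to the startTime of the next cue once
that cue arrives. Plain objects stand in for cues; the provisional-endTime
choice is an assumption for the example.]

```javascript
// Append a cue, closing the previous open cue now that we know when the
// next one begins. Until then, startTime doubles as a provisional endTime.
function appendCue(cues, startTime, data) {
  if (cues.length > 0) {
    cues[cues.length - 1].endTime = startTime;
  }
  cues.push({ startTime, endTime: startTime, data });
}

// Usage: the second cue retroactively closes the first.
const cueList = [];
appendCue(cueList, 0, "a");
appendCue(cueList, 5, "b");
// cueList[0].endTime is now 5; cueList[1] stays provisional until the next cue.
```

This is exactly where the latency mentioned above comes from: a cue's true
end time is only known one cue later.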
>
>>
>> [1] 
>>http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf
>> [2] http://html5.cablelabs.com/tracks/media-container-mapping.html
>[3] 
>http://concolato.wp.mines-telecom.fr/2013/10/24/using-webvtt-to-carry-medi
>a-streams/ 
>
>
>Regards,
>Cyril
>
>
>-- 
>Cyril Concolato
>Maître de Conférences/Associate Professor
>Groupe Multimedia/Multimedia Group
>Telecom ParisTech
>46 rue Barrault
>75 013 Paris, France
>http://concolato.wp.mines-telecom.fr/
>
>

Received on Thursday, 7 November 2013 22:02:58 UTC