Re: Proposal for initial CG focus

Maybe we should start by creating a wiki table with the formats we are
considering (MPEG2-TS, MP4, etc.) and the various audio, video and text
track attributes, showing how each is supposed to be set by the UA?

This will make it easy to see if we are missing something. After the table
is complete, it should be easy to write the equivalent spec text.
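
To make the idea concrete, here is a rough sketch of how the table could be
kept in machine-readable form, so that empty cells are easy to spot. All
names here (MappingRow, emptyTable, setSource, missingCells) are made up for
illustration, and the attribute list is just the four attributes shared by
AudioTrack, VideoTrack and TextTrack. The one filled-in cell reflects what
Cyril reports doing in his experiment [3]; it is not a proposed mapping.

```typescript
// Hypothetical sketch of the wiki table; none of these names come from a spec.
type TrackType = "audio" | "video" | "text";
type TrackAttribute = "id" | "kind" | "label" | "language";

// One table cell: how the UA sets one attribute for one track type in one format.
interface MappingRow {
  format: string;
  trackType: TrackType;
  attribute: TrackAttribute;
  source: string | null; // where the value comes from; null means "not decided yet"
}

const FORMATS: string[] = ["MPEG2-TS", "MP4"];
const TRACK_TYPES: TrackType[] = ["audio", "video", "text"];
const ATTRIBUTES: TrackAttribute[] = ["id", "kind", "label", "language"];

// Build the full table with every cell present but empty.
function emptyTable(): MappingRow[] {
  const rows: MappingRow[] = [];
  for (const format of FORMATS) {
    for (const trackType of TRACK_TYPES) {
      for (const attribute of ATTRIBUTES) {
        rows.push({ format, trackType, attribute, source: null });
      }
    }
  }
  return rows;
}

// Fill in one cell; returns false if the cell does not exist in the table.
function setSource(
  rows: MappingRow[],
  format: string,
  trackType: TrackType,
  attribute: TrackAttribute,
  source: string
): boolean {
  const row = rows.find(
    (r) => r.format === format && r.trackType === trackType && r.attribute === attribute
  );
  if (row === undefined) {
    return false;
  }
  row.source = source;
  return true;
}

// List the cells that still have no mapping, i.e. what we are missing.
function missingCells(rows: MappingRow[]): MappingRow[] {
  return rows.filter((r) => r.source === null);
}

// Example: record the one mapping mentioned in this thread (from [3]).
const table = emptyTable();
setSource(table, "MP4", "text", "label", "MP4 track handler (as used in [3])");
console.log(`${missingCells(table).length} of ${table.length} cells still empty`);
```

Whether the table lives on the wiki or in a file like this matters less than
having every format/track/attribute combination listed explicitly.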

Note that HTML5 itself already provides some (informative) "hints":
http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#dom-audiotrack-kind

/g


On Thu, Nov 7, 2013 at 6:11 PM, Cyril Concolato <
cyril.concolato@telecom-paristech.fr> wrote:

> Hi Bob,
>
> Thanks too for starting the discussion. In general, I'm really in line
> with what your draft proposes, except for some details. I added some
> answers to your comments below.
>
> I also have some points of organization:
> - do we have a bug tracker with the CG process?
> - Can we put the spec on some versioning server (GitHub, W3C, ...) so that
> people can propose modifications?
> - Will you be at TPAC next week? Can we have a (short) informal meeting on
> the topic, maybe during the unconference on Wednesday?
>
> On 31/10/2013 21:10, Bob Lund wrote:
>
>> Hi,
>>
>> I'd like to start the discussion about the requirements and scope for the
>> CG work.
>>
>> Currently, AFAIK there is no specification that fully defines how a UA
>> should expose in-band tracks to Web applications.  Unspecified details
>> are:
>>
>> * Which in-band tracks of a media resource the UA will expose. By expose I
>> mean make available as VideoTrack, AudioTrack or TextTrack objects.
>>
> In my view, the browser should expose an audio/video track that it knows
> it can natively decode as an AudioTrack/VideoTrack, and expose all others
> as TextTracks, possibly with DataCue (base64 encoded, for instance). I do
> think there are cases where it makes sense to decode video/audio track
> content in JavaScript (e.g. decoding a depth video track). It should be up
> to the JS application developer to decide, based on the performance they
> get. We need to be careful, though, to expose the data efficiently (e.g.
> not loading it until it's requested, to control memory consumption, and
> maybe offering a mechanism to filter the cue change events if the frame
> rate is too high).
>
>
>> * How metadata in the media resource about the in-band tracks is made
>> available to Web applications. If the UA recognizes the in-band track,
>> some of the metadata associated with the track will be made available by
>> the "kind", "language" and "inBandMetadataTrackDispatchType" attributes.
>> This works fine when the UA fully recognizes the track and when the
>> metadata maps completely to the predefined valid attribute values. But,
>> this needn't be the case. For example, the UA may recognize an MPEG-2 TS
>> audio track but not recognize metadata designating it as a Descriptive
>> Video track.
>>
> I understand. I think I agree. You have to distinguish:
> - whether the UA can natively decode the track;
> - and whether the UA can map the descriptive data associated with the track
> to HTML5 descriptive data.
>
> In the MP4 FF case, there is no easy mapping of audio/video track info to
> an HTML5 kind. At some point MPEG was considering adding HTML5 kind types.
> This has not progressed yet though. Same for MPEG-2, I think.
>
>
>
>> For more deterministic handling of inband tracks, the UA could be required
>> to:
>>
>> * Expose all in-band tracks, in some form.
>>
> Yes
>
>> * Make media resource metadata associated with the inband tracks
>> available to Web applications. [1] does this through a TextTrack.
>> * Provide data so that the Web application can correlate the metadata
>> with the appropriate Track object.
>>
> I'm not sure I follow you here. I guess this has to do with the notion of
> "Track Description TextTrack" in [2].
>
> This idea is interesting and I see where it comes from. An MPEG-2 TS
> program can have a time-varying number of streams, and stream
> changes are signaled with a PMT. So indeed, you could consider a
> PMTTextTrack to signal those changes. If I understand correctly, the "Track
> Description TextTrack" is a generalization of that PMTTextTrack. For MP4,
> this would correspond to a 'moov'-TextTrack, but in MP4 the moov is static
> for the whole duration of the file and the number of tracks does not
> change. I can't think at the moment of file-wide time-varying metadata
> information that is not represented in the MP4 file format as a track. So
> that "Track Description TextTrack" would have only 1 cue.
>
> My initial idea (not sure it covers every case in MPEG-2) was rather to
> expose the time-varying information related to a track in the track itself.
> For instance, for MPEG-2 TS, if a PMT updates the descriptor associated with
> an elementary stream, for which there is already a created TextTrack, I
> would create a new cue for that new information. So the cue data for every
> stream would carry two types of information: signaling and/or real data.
>
>
>
>> [1] is a specification written by CableLabs about how a UA would meet
>> these requirements in the case of MPEG-2 transport stream media resources.
>> This spec was written before some of the recent additions to HTML5, e.g.
>> inBandMetadataTrackDispatchType. The users of [1] would like to see a W3C spec
>> that addresses the same problem, takes into account the current HTML5.1
>> work and addresses some details that [1] missed, e.g. more precise
>> definition of what constitutes a metadata TextTrack, how in-band data gets
>> segmented into TextTrackCues, how Cue start and end times are derived from
>> the in-band track.
>>
>> [2] is an informal draft describing how the technique in [1] can be
>> applied to WebM, Ogg and MPEG4 media resources. This CG could address
>> these media container formats as well.
>>
>> So, at a minimum, I propose:
>>
>> * That the CG start by discussing the requirements outlined above, with
>> the goal of creating a spec at least for MPEG-2 TS (recognizing that some
>> of this may already be covered by HTML5.1).
>>
> I agree.
>
>> * WebM and MP4 are widely supported in browsers so it might make sense to
>> cover these formats.
>>
> As you know, I started working on the mapping of MP4 tracks onto HTML
> Tracks in [3]. My idea was to use a WebVTT-based syntax as an intermediate
> solution, until the CG spec is out, to experiment. I think it's very close
> to what you have proposed.
>
>
>> If there is sufficient interest, we could take on other inband track
>> formats as well.
>>
> Indeed, we could think of RTP-based streams in a future version, but there
> is enough on the plate for now.
>
>>
>> Comments?
>>
> Some additional comments on [2]:
> - To identify the type of the track data, I don't think we should rely on
> the label. For instance, in [3], I use the MP4 track handler as the value
> for the label. I think the inBandMetadataTrackDispatchType could be used.
> But this raises the question of what syntax to use. I'm not sure MIME is
> appropriate, or maybe with 'codecs' and 'profile' parameters. In the MP4
> work [3], I've used a mix: the MIME type when the actual cue payload
> conforms to that MIME type (e.g. when an MP4 track contains SVG data), or
> the MPEG-4 "stream type" and "object type indication". We could also expose
> the MPEG-2 TS "stream type", maybe in the "codecs" parameter of the MPEG-2
> MIME type.
> - I agree that the original track ID should be exposed; we could recommend
> setting the Track.id to the Media Fragment Identifier representing that
> track in the original file. In [3], I've used 'trackId' but that's not
> generic enough.
> - I think there is a problem in setting endTime to Infinity, as the WebIDL
> type for that attribute is not "unrestricted double". We could recommend
> setting the endTime to the startTime of the next cue, but that introduces
> latency...
> - I think the algorithm to expose mp4 tracks as "text" TextTrack or
> "base64-text" TextTrack can be improved to account for the new MPEG-4 Part
> 30 tracks for subtitles (e.g. WebVTT, TTML and others). I can fix that
> if you want.
> - I think we need to discuss DASH/HLS and others separately, taking MSE
> into consideration. I'm not sure we need a mapping of those.
> - I think also that we need to be consistent with what happens with
> MediaStreams, although I've not followed it closely...
>
>
>
>> [1] http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf
>> [2] http://html5.cablelabs.com/tracks/media-container-mapping.html
>>
> [3] http://concolato.wp.mines-telecom.fr/2013/10/24/using-webvtt-to-carry-media-streams/
>
> Regards,
> Cyril
>
>
> --
> Cyril Concolato
> Maître de Conférences/Associate Professor
> Groupe Multimedia/Multimedia Group
> Telecom ParisTech
> 46 rue Barrault
> 75 013 Paris, France
> http://concolato.wp.mines-telecom.fr/
>
>
>

Received on Thursday, 7 November 2013 21:52:30 UTC