- From: Bob Lund <B.Lund@CableLabs.com>
- Date: Thu, 7 Nov 2013 22:02:34 +0000
- To: Cyril Concolato <cyril.concolato@telecom-paristech.fr>, "public-inbandtracks@w3.org" <public-inbandtracks@w3.org>
On 11/7/13 10:11 AM, "Cyril Concolato" <cyril.concolato@telecom-paristech.fr> wrote:

>Hi Bob,
>
>Thanks too for starting the discussion. In general, I'm really in line
>with what your draft proposes, except for some details. I added some
>answers to your comments below.
>
>I also have some points of organization:
>- do we have a bug tracker with the CG process?

I don't know.

>- Can we put the spec on some versioning server (GitHub, W3C, ...) so
>that people can propose modifications?

I seconded Giuseppe's proposal for starting with a Wiki page.

>- Will you be at TPAC next week? Can we have a (short) informal meeting,
>maybe during the unconference Wednesday, on the topic.

Unfortunately not.

>On 31/10/2013 21:10, Bob Lund wrote:
>> Hi,
>>
>> I'd like to start the discussion about the requirements and scope for
>> the CG work.
>>
>> Currently, AFAIK there is no specification that fully defines how a UA
>> should expose in-band tracks to Web applications. Unspecified details
>> are:
>>
>> * Which in-band tracks of a media resource the UA will expose. By
>> expose I mean make available as VideoTrack, AudioTrack or TextTrack
>> objects.
>
>In my view, the browser should expose an audio/video track that it knows
>it can natively decode as an AudioTrack/VideoTrack, and all others it
>should expose as a TextTrack, possibly with DataCue (base64-encoded, for
>instance). I do think there are cases where it makes sense to decode
>video/audio track content in JavaScript (e.g. decoding a depth video
>track). It should be up to the JS application developer to decide, based
>on the performance it gets. We need to be careful, though, to expose the
>data efficiently (e.g. not load it until it's requested, to control
>memory consumption, and maybe offer a mechanism to filter the cue change
>events if the frame rate is too high).
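For concreteness, here is a rough sketch of how a Web application might
consume tracks exposed this way, assuming a UA that surfaces unrecognized
streams as "metadata" TextTracks carrying base64 text cues, per the
suggestion above; handleStreamData is a hypothetical application callback:

    // Minimal sketch: consume in-band streams exposed as "metadata"
    // TextTracks with base64 text cues. A DataCue carrying binary
    // data would make the atob() step unnecessary.
    var video = document.querySelector('video');

    video.addEventListener('loadedmetadata', function () {
      for (var i = 0; i < video.textTracks.length; i++) {
        var track = video.textTracks[i];
        if (track.kind !== 'metadata') continue;
        track.mode = 'hidden'; // receive cue events without rendering
        track.addEventListener('cuechange', function (e) {
          var cues = e.target.activeCues;
          for (var j = 0; j < cues.length; j++) {
            // Decode only when a cue becomes active, to limit memory use.
            handleStreamData(e.target.inBandMetadataTrackDispatchType,
                             atob(cues[j].text));
          }
        });
      }
    });

    // Hypothetical application callback; parsing is format-specific.
    function handleStreamData(dispatchType, data) { /* ... */ }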
>
>> * How metadata in the media resource about the in-band tracks is made
>> available to Web applications. If the UA recognizes the in-band track,
>> some of the metadata associated with the track will be made available
>> by the "kind", "language" and "inBandMetadataTrackDispatchType"
>> attributes. This works fine when the UA fully recognizes the track and
>> when the metadata maps completely to the predefined valid attribute
>> values. But this needn't be the case. For example, the UA may
>> recognize an MPEG-2 TS audio track but not recognize metadata
>> designating it as a Descriptive Video track.
>
>I understand. I think I agree. You have to distinguish:
>- whether the UA can natively decode the track or not;
>- and whether the UA can map the descriptive data associated with the
>track to HTML5 descriptive data.
>
>In the MP4 file format case, there is no easy mapping of audio/video
>track info to an HTML5 kind. At some point MPEG was considering adding
>HTML5 kind types. This has not progressed yet, though. Same for MPEG-2,
>I think.
>
>> For more deterministic handling of in-band tracks, the UA could be
>> required to:
>>
>> * Expose all in-band tracks, in some form.
>
>Yes.
>
>> * Make media resource metadata associated with the in-band tracks
>> available to Web applications. [1] does this through a TextTrack.
>> * Provide data so that the Web application can correlate the metadata
>> with the appropriate Track object.
>
>I'm not sure I follow you here. I guess this has to do with the notion
>of "Track Description TextTrack" in [2].
>
>This idea is interesting and I see where it comes from. An MPEG-2 TS
>program can have a time-varying number of streams, and stream changes
>are signaled with a PMT. So indeed, you could consider a PMTTextTrack
>to signal those changes. If I understand correctly, the "Track
>Description TextTrack" is a generalization of that PMTTextTrack. For
>MP4, this would correspond to a 'moov' TextTrack, but in MP4 the moov
>is static for the whole duration of the file and the number of tracks
>does not change. I can't think at the moment of file-wide, time-varying
>metadata information that is not represented in the MP4 file format as
>a track. So that "Track Description TextTrack" would have only one cue.
>
>My initial idea (not sure it covers every case in MPEG-2) was rather to
>expose the time-varying information related to a track in the track
>itself. For instance, for MPEG-2 TS, if a PMT updates the descriptor
>associated with an elementary stream for which there is already a
>created TextTrack, I would create a new cue for that new information.
>So the cue data for every stream would carry two types of information:
>signaling and/or real data.
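A rough sketch of how such mixed per-track cues might look to script,
assuming a hypothetical JSON envelope with a "type" discriminator on each
cue; the payload shape and the two handlers are illustrative, not from
any spec:

    // Sketch: each elementary stream's TextTrack carries its own
    // signaling cues (e.g. an updated descriptor from a new PMT)
    // alongside data cues. The JSON envelope is hypothetical.
    function watchStream(textTrack) {
      textTrack.mode = 'hidden';
      textTrack.addEventListener('cuechange', function () {
        for (var i = 0; i < textTrack.activeCues.length; i++) {
          var payload = JSON.parse(textTrack.activeCues[i].text);
          if (payload.type === 'signaling') {
            updateStreamInfo(textTrack.id, payload.descriptor);
          } else {
            processData(textTrack.id, payload.data);
          }
        }
      });
    }

    function updateStreamInfo(trackId, descriptor) { /* hypothetical */ }
    function processData(trackId, data) { /* hypothetical */ }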
>
>> [1] is a specification written by CableLabs about how a UA would meet
>> these requirements in the case of MPEG-2 transport stream media
>> resources. This spec was written before some of the recent additions
>> to HTML5, e.g. inBandMetadataTrackDispatchType. The users of [1] would
>> like to see a W3C spec that addresses the same problem, takes into
>> account the current HTML5.1 work and addresses some details that [1]
>> missed, e.g. a more precise definition of what constitutes a metadata
>> TextTrack, how in-band data gets segmented into TextTrackCues, and how
>> cue start and end times are derived from the in-band track.
>>
>> [2] is an informal draft describing how the technique in [1] can be
>> applied to WebM, Ogg and MPEG-4 media resources. This CG could address
>> these media container formats as well.
>>
>> So, at a minimum, I propose:
>>
>> * That the CG start by discussing the requirements outlined above,
>> with the goal of creating a spec at least for MPEG-2 TS (recognizing
>> that some of this may already be covered by HTML5.1).
>
>I agree.
>
>> * WebM and MP4 are widely supported in browsers, so it might make
>> sense to cover these formats.
>
>As you know, I started working on the mapping of MP4 tracks onto HTML
>tracks in [3]. My idea was to use a WebVTT-based syntax as an
>intermediate solution, until the CG spec is out, to experiment. I think
>it's very close to what you have proposed.
>
>> If there is sufficient interest, we could take on other in-band track
>> formats as well.
>
>Indeed, we could think of RTP-based streams in a future version, but
>there is enough on the plate for now.
>
>> Comments?
>
>Some additional comments on [2]:
>- To identify the type of the track data, I don't think we should rely
>on the label. For instance, in [3], I use the MP4 track handler as the
>value for the label. I think inBandMetadataTrackDispatchType could be
>used, but this raises the question of what syntax to use. I'm not sure
>MIME is appropriate, or maybe with 'codecs' and 'profile' parameters.
>In the MP4 work [3], I've used a mix: the MIME type when the actual cue
>payload conforms to that MIME type (e.g. when an MP4 track contains SVG
>data), or the MPEG-4 "stream type" and "object type indication". We
>could also expose the MPEG-2 TS "stream type", maybe in the "codecs"
>parameter of the MPEG-2 MIME type.
>- I agree that the original track Id should be exposed; we could
>recommend setting Track.id to the Media Fragment identifier
>representing that track in the original file. In [3], I've used
>'trackId', but that's not generic enough.
>- I think there is a problem in setting endTime to Infinity, as the
>WebIDL for that attribute is not "unrestricted double". We could
>recommend setting the endTime to the startTime of the next cue, but
>that introduces latency... (see the sketch after these comments)
>- I think the algorithm to expose MP4 tracks as "text" TextTrack or
>"base64-text" TextTrack can be improved to account for the new MPEG-4
>Part 30 tracks for subtitles (e.g. WebVTT, TTML and others). I can fix
>that if you want.
>- I think we need to discuss DASH/HLS and others separately, taking MSE
>into consideration. I'm not sure we need a mapping of those.
>- I also think we need to be consistent with what happens with
>MediaStreams, although I've not followed it closely...
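On the endTime point, a minimal sketch of that next-cue workaround,
assuming the application (or a UA, internally) builds cues itself as in
the WebVTT experiment in [3]; VTTCue and the names here are illustrative:

    // Sketch of the endTime workaround: since endTime cannot be
    // Infinity, each new cue for a stream retroactively closes the
    // previous, still-"open" one.
    var openCue = null;

    function appendSample(video, track, startTime, text) {
      if (openCue) {
        openCue.endTime = startTime; // bound the previous cue, late
      }
      // Use the media duration as a stand-in for "until further notice".
      var cue = new VTTCue(startTime, video.duration, text);
      track.addCue(cue);
      openCue = cue;
    }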
>
>> [1] http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf
>> [2] http://html5.cablelabs.com/tracks/media-container-mapping.html
>[3] http://concolato.wp.mines-telecom.fr/2013/10/24/using-webvtt-to-carry-media-streams/
>
>Regards,
>Cyril
>
>--
>Cyril Concolato
>Maître de Conférences/Associate Professor
>Groupe Multimedia/Multimedia Group
>Telecom ParisTech
>46 rue Barrault
>75 013 Paris, France
>http://concolato.wp.mines-telecom.fr/

Received on Thursday, 7 November 2013 22:02:58 UTC