Re: Proposal for initial CG focus

On 11/7/13 10:11 AM, "Cyril Concolato"
<cyril.concolato@telecom-paristech.fr> wrote:

>Hi Bob,
>
>Thanks too for starting the discussion. In general, I'm really in line
>with what your draft proposes, except for some details. I added some
>answers to your comments below.
>
>I also have some points of organization:
>- do we have a bug tracker with the CG process?
>- Can we put the spec on some versioning server (GitHub, W3C, ...) so
>that people can propose modifications?
>- Will you be at TPAC next week? Can we have a (short) informal meeting,
>maybe during the Wednesday unconference, on this topic?
>
>Le 31/10/2013 21:10, Bob Lund a écrit :
>> Hi,
>>
>> I'd like to start the discussion about the requirements and scope for
>>the
>> CG work.
>>
>> Currently, AFAIK there is no specification that fully defines how a UA
>> should expose in-band tracks to Web applications.  Unspecified details
>>are:
>>
>> * Which in-band tracks of a media resource the UA will expose. By
>> expose I mean make available as VideoTrack, AudioTrack or TextTrack
>> objects.
>In my view, the browser should expose an audio/video track that it
>knows it can natively decode as an AudioTrack/VideoTrack, and all
>others as a TextTrack, possibly with DataCue (base64-encoded, for
>instance). I do think there are cases where it makes sense to decode
>video/audio track content in JavaScript (e.g. decoding a depth video
>track). It should be up to the JS application developer to decide,
>based on the performance they get. We need to be careful, though, to
>expose the data efficiently (e.g. not load it until it's requested, to
>control memory consumption, and maybe offer a mechanism to filter the
>cuechange events if the frame rate is too high).

[1] specifies this.
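
To make the base64/DataCue idea concrete, here is a rough sketch of the
consuming side, including a crude timer-based filter on cuechange
events. It assumes the track is exposed with kind 'metadata' and that
the sample data arrives base64-encoded in the cue text, per [1]'s
general approach; whether the payload should instead live in a
DataCue.data attribute is exactly the sort of detail we need to pin
down:

  var video = document.querySelector('video');
  var lastHandled = 0;
  var MIN_INTERVAL_MS = 250; // app-chosen throttle, purely illustrative

  for (var i = 0; i < video.textTracks.length; i++) {
    var track = video.textTracks[i];
    if (track.kind !== 'metadata') continue;
    track.mode = 'hidden'; // receive cues without the UA rendering them
    track.addEventListener('cuechange', function (e) {
      var now = Date.now();
      if (now - lastHandled < MIN_INTERVAL_MS) return; // drop extras
      lastHandled = now;
      var cues = e.target.activeCues;
      for (var j = 0; j < cues.length; j++) {
        // Assume the UA put the raw sample, base64-encoded, in the cue
        // text; atob() recovers the bytes for JS-side decoding.
        var bytes = atob(cues[j].text);
        // ... decode 'bytes' in JavaScript (e.g. a depth video frame) ...
      }
    });
  }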

>
>> * How metadata in the media resource about the in-band tracks is made
>> available to Web applications. If the UA recognizes the in-band track,
>> some of the metadata associated with the track will be made available by
>> the "kind", "language" and "inbandMetadataTrackDispatchType"
>>attributes.
>> This works fine when the UA fully recognizes the track and when the
>> metadata maps completely to the predefined valid attribute values. But,
>> this needn't be the case. For example, the UA may recognize an MPEG-2 TS
>> audio track but not recognize metadata designating it as a Descriptive
>> Video track.
>I understand. I think I agree. You have to distinguish:
>- whether the UA can natively decode the track;
>- and whether the UA can map the descriptive data associated with the
>track to HTML5 descriptive data.
>
>In the MP4 file format case, there is no easy mapping of audio/video
>track info to an HTML5 kind. At some point MPEG was considering adding
>HTML5 kind types, but this has not progressed yet. Same for MPEG-2, I
>think.

As I replied in 
http://lists.w3.org/Archives/Public/public-inbandtracks/2013Nov/0002.html,
it seems to me that there will always be cases where we can map certain
in-band track types to @kind values but not others. So, it would be
possible to specify when and how the UA sets kind, but we should still
provide JS the raw metadata for the other cases.
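
From the application's side, that combination would look something like
the following sketch. The dispatch-type value 'mpeg2ts/pmt' and the PMT
parsing are hypothetical; they stand in for whatever the spec ends up
defining:

  var video = document.querySelector('video');

  function kindForAudioTrack(track) {
    if (track.kind !== '') {
      return track.kind; // the UA mapped the in-band metadata itself
    }
    // Otherwise fall back to the raw metadata. In the spirit of [1],
    // assume the UA exposes the MPEG-2 TS PMT as a metadata TextTrack.
    for (var i = 0; i < video.textTracks.length; i++) {
      var t = video.textTracks[i];
      if (t.kind === 'metadata' &&
          t.inBandMetadataTrackDispatchType === 'mpeg2ts/pmt') {
        // The app parses the PMT descriptors itself (e.g. an ISO 639
        // descriptor with audio_type 3, visual impaired commentary)
        // and derives its own kind for 'track'.
        return 'descriptions'; // placeholder for the app's own parsing
      }
    }
    return '';
  }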

>
>>
>> For more deterministic handling of in-band tracks, the UA could be
>> required to:
>>
>> * Expose all in-band tracks, in some form.
>Yes
>> * Make media resource metadata associated with the in-band tracks
>> available to Web applications. [1] does this through a TextTrack.
>> * Provide data so that the Web application can correlate that metadata
>> with the appropriate Track object.
>I'm not sure I follow you here. I guess this has to do with the notion
>of "Track Description TextTrack" in [2].
>
>This idea is interesting and I see where it comes from. An MPEG-2 TS
>program can have a time-varying number of streams in a program and
>stream changes are signaled with a PMT. So indeed, you could consider a
>PMTTextTrack to signal those changes. If I understand correctly, the
>"Track Description TextTrack" is a generalization of that PMTTextTrack.
>For MP4, this would correspond to a 'moov'-TextTrack, but in MP4 the
>moov is static for the whole duration of the file and the number of
>tracks does not change. I can't think at the moment of file-wide
>time-varying metadata information that is not represented in the MP4
>file format as a track. So that "Track Description TextTrack" would have
>only 1 cue.
>
>My initial idea (not sure it covers every case in MPEG-2) was rather to
>expose the time-varying information related to a track in the track
>itself. For instance, for MPEG-2 TS, if a PMT updates the descriptor
>associated with an elementary stream for which a TextTrack has already
>been created, I would create a new cue carrying that new information.
>So the cue data for every stream would carry two types of information:
>signaling and/or real data.

I agree your proposal would work, and it could even be applied to the
MPEG-2 TS case. But there was no TextTrack attribute to hold that
track's metadata, which is the reason I proposed exposing the PMT as a
TextTrack.

I like your approach because it seems simpler. Also, the addition of
inBandMetadataTrackDispatchType to HTML5 is essentially the attribute
containing the track metadata, at least for text tracks of @kind ==
metadata. Maybe something similar should exist for other types of text
tracks, and for video and audio tracks, perhaps renamed trackMetadata.
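
To check that I understand the per-track idea, here is how I picture a
polyfill in the spirit of [3] handling a PMT update. The payload shape
and the 'signaling'/'data' tags are illustrative, not from any spec:

  // When a PMT update changes the descriptors of an elementary stream,
  // append a cue to that stream's own TextTrack rather than to a
  // separate PMT track.
  function onPmtUpdate(streamTextTrack, mediaTime, newDescriptors) {
    var cue = new VTTCue(mediaTime, mediaTime + 0.001, JSON.stringify({
      type: 'signaling',          // vs. 'data' for actual sample payloads
      descriptors: newDescriptors // e.g. raw descriptor bytes, base64
    }));
    // endTime here is provisional; see the endTime discussion below.
    streamTextTrack.addCue(cue);
  }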

>
>>
>> [1] is a specification written by CableLabs about how a UA would meet
>> these requirements in the case of MPEG-2 transport stream media
>>resources.
>> This spec was written before some of the recent additions to HTML5, e.g.
>> inBandMetadataTrackDispatchType. The users of [1] would like to see a W3C spec
>> that addresses the same problem, takes into account the current HTML5.1
>> work and addresses some details that [1] missed, e.g. more precise
>> definition of what constitutes a metadata TextTrack, how in-band data
>>gets
>> segmented into TextTrackCues, how Cue start and end times are derived
>>from
>> the in-band track.
>>
>> [2] is an informal draft describing how the technique in [1] can be
>> applied to WebM, Ogg and MPEG4 media resources. This CG could address
>> these media container formats as well.
>>
>> So, at a minimum, I propose:
>>
>> * That the CG start by discussing the requirements outlined above, with
>> the goal of creating a spec at least for MPEG-2 TS (recognizing that
>> some of this may already be covered by HTML5.1).
>I agree.
>> * WebM and MP4 are widely supported in browsers so it might make sense
>>to
>> cover these formats.
>As you know, I started working on the mapping of MP4 tracks onto HTML
>tracks in [3]. My idea was to use a WebVTT-based syntax as an
>intermediate solution to experiment with until the CG spec is out. I
>think it's very close to what you have proposed.

[3] is actually new to me and I will review it.

>>
>> If there is sufficient interest, we could take on other in-band track
>> formats as well.
>Indeed, we could think of RTP-based streams in a future version, but
>there is enough on the plate for now.
>>
>> Comments?
>Some additional comments on [2]:

Let me look at [3] in more detail before responding to your comments below.

>- To identify the type of the track data, I don't think we should rely
>on the label. For instance, in [3], I use the MP4 track handler as the
>value of the label. I think inBandMetadataTrackDispatchType could be
>used, but this raises the question of what syntax to use. I'm not sure
>a MIME type is appropriate, or maybe one with 'codecs' and 'profile'
>parameters. In the MP4 work [3], I've used a mix of the MIME type when
>the actual cue payload conforms to that MIME type (e.g. when an MP4
>track contains SVG data), or the MPEG-4 "stream type" and "object type
>indication". We could also expose the MPEG-2 TS "stream type", maybe in
>the "codecs" parameter of the MPEG-2 MIME type.
>- I agree that the original track ID should be exposed; we could
>recommend setting Track.id to the Media Fragment identifier
>representing that track in the original file. In [3], I've used
>'trackId', but that's not generic enough.
>- I think there is a problem in setting endTime to Infinity, since the
>WebIDL type for that attribute is not "unrestricted double". We could
>recommend setting the endTime to the startTime of the next cue, but
>that introduces latency...
>- I think the algorithm to expose MP4 tracks as "text" TextTrack or
>"base64-text" TextTrack can be improved to account for the new MPEG-4
>Part 30 tracks for subtitles (e.g. WebVTT, TTML and others). I can fix
>that if you want.
>- I think we need to discuss DASH/HLS and others separately, taking MSE
>into consideration. I'm not sure we need a mapping of those.
>- I also think we need to be consistent with what happens with
>MediaStreams, although I've not followed it closely...
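
On your endTime point: until the WebIDL allows "unrestricted double"
(or some other way to express an open-ended cue), one workaround is the
one you mention: add each cue immediately with a provisional endTime
and patch it when the next cue for that track arrives. A minimal sketch
of that bookkeeping, in the polyfill spirit of [3] (the '_openCue'
expando and the one-second provisional duration are illustrative only):

  // Keep the most recent, still-open cue per TextTrack so its endTime
  // can be corrected when the next cue for that track is created.
  function appendCue(textTrack, startTime, payload) {
    if (textTrack._openCue) {
      textTrack._openCue.endTime = startTime; // close the previous cue
    }
    // Provisional endTime; overwritten when the next cue arrives.
    var cue = new VTTCue(startTime, startTime + 1, payload);
    textTrack._openCue = cue;
    textTrack.addCue(cue);
  }

Adding the cue right away avoids the latency of holding it back until
the next cue arrives, at the cost of its endTime being temporarily
wrong.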
>
>>
>> [1] 
>>http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf
>> [2] http://html5.cablelabs.com/tracks/media-container-mapping.html
>[3]
>http://concolato.wp.mines-telecom.fr/2013/10/24/using-webvtt-to-carry-media-streams/
>
>
>Regards,
>Cyril
>
>
>-- 
>Cyril Concolato
>Maître de Conférences/Associate Professor
>Groupe Multimedia/Multimedia Group
>Telecom ParisTech
>46 rue Barrault
>75 013 Paris, France
>http://concolato.wp.mines-telecom.fr/
>
>
