Re: Proposal for initial CG focus

On 11/6/13 1:36 AM, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:

>Hi Bob,
>
>Thanks for starting this discussion.

You're welcome. See my responses inline.

Bob

>
>I've got some comments inline.
>
>
>On Fri, Nov 1, 2013 at 7:10 AM, Bob Lund <B.Lund@cablelabs.com> wrote:
>> Hi,
>>
>> I'd like to start the discussion about the requirements and scope for
>> the CG work.
>>
>> Currently, AFAIK there is no specification that fully defines how a UA
>> should expose in-band tracks to Web applications. Unspecified details
>> are:
>>
>> * Which in-band tracks of a media resource the UA will expose. By
>> expose I mean make available as VideoTrack, AudioTrack or TextTrack
>> objects.
>
>If the browser doesn't recognize a track as a Video or Audio track, it
>has no choice but to expose it as a Text track, if at all.

Yes

> So, I think
>we're really only talking about text tracks here, right?

Video, audio and text tracks all need an attribute (track id, label or
similar) so that each track can be correlated with that track's
metadata, as described below.
 
>
>
>> * How metadata in the media resource about the in-band tracks is made
>> available to Web applications. If the UA recognizes the in-band track,
>> some of the metadata associated with the track will be made available by
>> the "kind", "language" and "inbandMetadataTrackDispatchType"
>>attributes.
>> This works fine when the UA fully recognizes the track and when the
>> metadata maps completely to the predefined valid attribute values. But,
>> this needn't be the case. For example, the UA may recognize an MPEG-2 TS
>> audio track but not recognize metadata designating it as a Descriptive
>> Video track.
>
>Do you have example files for which this applies? I think such
>examples would be the best place to start.

In North America, the MPEG-2 TS Descriptive Video Service is a secondary
audio track with the main dialogue and video descriptions premixed. There
is no @kind value to denote this, so the only way the Web app can identify
such a track is by examining the metadata descriptors. The UA recognizes
the track as audio (the stream type denotes the type of audio), and the
Web app examines the track metadata to determine the type of audio and
then parses the remaining metadata to determine whether the audio is the
combined main dialogue + video description.
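
To make that concrete, here's a minimal JS sketch of what the Web app
side might look like. It assumes the UA exposes the TS program metadata
as JSON text in the cues of a metadata TextTrack; the cue fields ("pid",
"streamType", "descriptors") are illustrative, not defined by [1] or
HTML5:

    // Hypothetical sketch: identify a premixed DVS audio track from TS
    // metadata. The JSON cue format is assumed, not specified by [1].
    function findDescriptiveAudioPid(metadataTrack) {
      for (var i = 0; i < metadataTrack.cues.length; i++) {
        var info = JSON.parse(metadataTrack.cues[i].text);
        if (!isAudioStreamType(info.streamType)) continue;
        // The ISO/IEC 13818-1 ISO 639 language descriptor (tag 0x0A)
        // carries an audio_type field; value 3 means "visual impaired
        // commentary". Field names here are illustrative.
        var dvs = info.descriptors.some(function (d) {
          return d.tag === 0x0A && d.audioType === 3;
        });
        if (dvs) return info.pid;
      }
      return null;
    }

    function isAudioStreamType(streamType) {
      // 0x03/0x04: MPEG-1/2 audio; 0x0F: AAC (ADTS); 0x81: AC-3 (ATSC).
      return [0x03, 0x04, 0x0F, 0x81].indexOf(streamType) !== -1;
    }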
 
>
>
>> For more deterministic handling of inband tracks, the UA could be
>> required to:
>>
>> * Expose all in-band tracks, in some form.
>
>If they are audio or video tracks that the browser does not recognize
>as such, it will be pretty useless to expose them, since a JS app is
>likely not in a better place to decode them than the browser.

Maybe, but then this should be stated.

>
>
>> * Make media resource metadata associated with the inband tracks
>> available to Web applications. [1] does this through a TextTrack.
>
>Yes, I think that's the correct approach.
>
>
>> * Provide data so that the Web application can correlate the metadata
>> with the appropriate Track object.
>
>What does that mean?

If JS receives metadata associated with inband tracks, as proposed above,
then it needs to be able to relate this metadata to tracks in the
tracklist objects. In [1], this is done by assigning the elementary stream
packet ID (PID) to the track.label. JS has the PID as part of the metadata
so it can make the association. Equivalent IDs exist in other media
container formats.
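As a rough sketch of that association, assuming the UA follows [1] and
stores the PID in track.label (the PID value below is made up):

    // Given a PID taken from a metadata cue, find and enable the
    // matching AudioTrack. Per [1] the UA puts the elementary stream
    // PID in track.label.
    function audioTrackForPid(video, pid) {
      var tracks = video.audioTracks;
      for (var i = 0; i < tracks.length; i++) {
        if (tracks[i].label === String(pid)) return tracks[i];
      }
      return null;
    }

    var video = document.querySelector('video');
    var dvsPid = 482; // e.g. returned by findDescriptiveAudioPid() above
    var dvsTrack = audioTrackForPid(video, dvsPid);
    if (dvsTrack) dvsTrack.enabled = true;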

>
>
>> [1] is a specification written by CableLabs about how a UA would meet
>> these requirements in the case of MPEG-2 transport stream media
>> resources. This spec was written before some of the recent additions to
>> HTML5, e.g. inBandMetadataTrackDispatchType. The users of [1] would
>> like to see a W3C spec that addresses the same problem, takes into
>> account the current HTML5.1 work and addresses some details that [1]
>> missed, e.g. a more precise definition of what constitutes a metadata
>> TextTrack, how in-band data gets segmented into TextTrackCues, and how
>> Cue start and end times are derived from the in-band track.
>>
>> [2] is an informal draft describing how the technique in [1] can be
>> applied to WebM, Ogg and MPEG4 media resources. This CG could address
>> these media container formats as well.
>
>I was under the impression that this CG is working on re-writing an
>updated version of [2] that would also include some of the salient
>points of [1].

That would be OK with me. That's what I wanted to hear from others -
whether we envision one or multiple specs.

>
>
>> So, at a minimum, I propose:
>>
>> * That the CG start by discussing the requirements outlined above, with
>> the goal of creating a spec at least for MPEG-2 TS (recognizing that
>> some of this may already be covered by HTML5.1).
>
>That's what "updating [2]" means, right?

Yes.

>
>
>> * WebM and MP4 are widely supported in browsers so it might make sense
>> to cover these formats.
>
>MP4 is not covered by [1] or [2], correct?

MP4 is covered in [2], but I think it probably needs input from MP4
experts.

>
>I think mapping WebM would be trivial because it has been built to
>match the HTML5 spec.

WebM is covered in [2] but, again, would benefit from WebM experts' input.

>
>
>> If there is sufficient interest, we could take on other inband track
>> formats as well.
>
>I can help make sure the mapping to HTML5 is accurate. However, I have
>no experience with the format of inband tracks in MPEG-2 or MP4.
>Hopefully we have some experts of these formats on the list who have
>example files and format specifications that we can reference to make
>the right proposals.
>
>Regards,
>Silvia.
>
>
>> Comments?
>>
>> [1] http://www.cablelabs.com/specifications/CL-SP-HTML5-MAP-I02-120510.pdf
>> [2] http://html5.cablelabs.com/tracks/media-container-mapping.html
>>
>> Thanks,
>> Bob
>>
>>
