RE: HTML5 support for track metadata

From: Bob Lund [mailto:B.Lund@CableLabs.com]
Sent: Tuesday, March 04, 2014 2:12 PM
To: Clift, Graham; public-inbandtracks@w3.org
Cc: Ota, Takaaki; Wu, Max; Nejat, Mike; Candelore, Brant
Subject: Re: HTML5 support for track metadata



From: Clift, Graham <Graham.Clift@am.sony.com>
Date: Tuesday, March 4, 2014 at 2:17 PM
To: Bob Lund <b.lund@cablelabs.com>, public-inbandtracks@w3.org
Cc: "Ota, Takaaki" <Takaaki.Ota@am.sony.com>, "Wu, Max" <Max.Wu@am.sony.com>, "Nejat, Mike" <Mahyar.Nejat@am.sony.com>, Brant Candelore <brant.candelore@am.sony.com>
Subject: RE: HTML5 support for track metadata

Hi CG,

I believe that option 3 is the best approach because it requires the least change to the HTML5 specification, covers the use cases, and will be the most likely to pass the HTML WG.

As to how to handle events (with minimal change to the HTML spec), I was thinking the following:

When an application sees a change in the PMT, there should also be a corresponding change to one or more of the TextTrackList, VideoTrackList or AudioTrackList objects.
It therefore seems sensible to tie the PMT change reflected in inBandMetadataTrackDispatchType to an onchange event on the TextTrackList object. This would be sufficient for the purpose of tracking PMT data, and if cues are needed they could easily be generated by the application, which eliminates the only advantage option 1 has, IMHO.
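
Roughly, the application side of that could look like the sketch below (assuming an HTMLMediaElement variable named video; the handlers are illustrative only, and note that firing change on a PMT update is the proposal above, not current spec behaviour):

  // Sketch: react to PMT-driven changes surfaced on the TextTrackList.
  var video = document.querySelector('video');

  video.textTracks.onaddtrack = function (e) {
    var track = e.track;
    if (track.kind === 'metadata') {
      // The PMT-derived type information is carried here
      console.log('dispatch type:', track.inBandMetadataTrackDispatchType);
      track.mode = 'hidden'; // deliver cues without rendering
    }
  };

  video.textTracks.onremovetrack = function (e) {
    console.log('track removed:', e.track.label);
  };

  video.textTracks.onchange = function () {
    // Per the proposal above: re-scan video.textTracks here when the PMT changes
  };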

Changes in the PMT caused by adding/removing elementary streams should result in one of the onchange, onaddtrack or onremovetrack events. Are there any use cases where the PMT for an existing track changes?

[<Graham>] If there will never be a change to the track metadata info, that certainly simplifies things further. Personally, I do not know of any use cases where the PMT changes for an existing track.


BTW,

As well as deciding how to handle the PMT data via the three alternatives, I believe more work should be done on item 3) (creating in-band metadata text track cues). In particular, more clarification is needed to ensure consistency across implementations. The two areas below are where I see some problems:


a) Handling private section payloads that are split across many TS packets is not well defined.

My proposal defines this - a complete private section is returned in the cue.


[<Graham>] But that raises the problem of knowing when the section is complete. We could use the next payload_unit_start_indicator==1, and leave it up to the encoder to insert a dummy payload right after the real payload to trigger the cue creation. This would be easier than arbitrary timeouts.
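
For what it's worth, consuming such cues might look something like the sketch below, assuming the UA exposes them as DataCue objects with an ArrayBuffer data attribute (per the current HTML5 CR), that each cue carries one complete private section, and that a metadata TextTrack variable named track is in hand (e.g. found as in the earlier sketch):

  // Sketch: handle metadata cues, each assumed to hold one complete private section.
  track.oncuechange = function () {
    var cues = track.activeCues;
    for (var i = 0; i < cues.length; i++) {
      var bytes = new Uint8Array(cues[i].data);
      var tableId = bytes[0];
      // section_length is the low 12 bits of bytes 1-2; total size = section_length + 3
      var sectionLength = ((bytes[1] & 0x0f) << 8) | bytes[2];
      if (bytes.length >= sectionLength + 3) {
        // Complete section: hand off to the application's section parser
      }
    }
  };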



b) There seem to be three possible approaches:

i. Since the detection method proposed is the payload_unit_start_indicator, this suggests that the transport demux collects the individual payloads before presenting them as a cue to the web application. If so, how does the demux decide that the payload is complete? If it waits to see the next payload_unit_start_indicator, the timing may be too late to be relevant.

ii. If, on the other hand, the demux creates a cue for each payload entity, that could impact performance.

iii. Maybe it just waits for a period of time and, if no more payload packets are received, cues what it has so far. This approach would mean the application is expected to handle variably fragmented cues.


Leaving it up to the UA to decide how to implement this could make it challenging for web application designers to allow for all possibilities; case iii in particular would push payload reassembly onto the application, as in the sketch below.
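
A rough sketch of that reassembly, using section_length from the first fragment to decide completeness (the callback and variable names are hypothetical):

  // Sketch: application-side reassembly if the UA delivers variably fragmented cues.
  var fragments = [];   // Uint8Array fragments of the section being assembled
  var expected = 0;     // total section size, known once the header has been seen

  function onMetadataCue(cue) {    // hypothetical per-cue callback
    var bytes = new Uint8Array(cue.data);
    fragments.push(bytes);
    if (expected === 0 && bytes.length >= 3) {
      // total size = section_length (low 12 bits of bytes 1-2) + 3 header bytes
      expected = (((bytes[1] & 0x0f) << 8) | bytes[2]) + 3;
    }
    var received = fragments.reduce(function (n, b) { return n + b.length; }, 0);
    if (expected > 0 && received >= expected) {
      // Concatenate the fragments and pass the complete section to the parser
      fragments = [];
      expected = 0;
    }
  }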

c) It is not clear what startTime means to the application. The spec says it is with respect to the media resource time, which presumably means the PTS from some audio/video PES. But which timing: after the private payload is sent, or before? And what is the startTime for payloads split across TS packets with video PES packets in between, especially if partial-payload DataCues are supported (as in case iii above)? Leaving this up to the implementation is unsatisfactory because of the potential for variation in the result.

I agree that more guidelines on cue generation would remove ambiguity. I think startTime, in the case of MPEG-2 TS metadata, should be the current time in the media resource when the private section is received.
[<Graham>] Would that be the last received value of current time prior to seeing the first packet of the payload?



Regards

Graham Clift


From: Bob Lund [mailto:B.Lund@CableLabs.com]
Sent: Tuesday, March 04, 2014 11:50 AM
To: public-inbandtracks@w3.org
Subject: HTML5 support for track metadata

Hi CG,

I think that the existing HTML5 CR spec [1] is very close to supporting the use cases that are being discussed in the CG. I propose submitting several HTML5 bugs to close the gap and I'm interested in your thoughts.

An alternative described in the CG wiki is to use kind and inBandMetadataTrackDispatchType attributes to expose track metadata: kind is used for all tracks except metadata text tracks and inBandMetadataTrackDispatchType is used for metadata text tracks. This covers the use cases that have been described in the CG but requires some additions to existing HTML5 CR sections:

1) Additions to the table "Return values for AudioTrack.kind() and VideoTrack.kind()" in [2] describing how @kind should be set for various track types. [3] shows the new additions for MPEG-2 TS and DASH media resources.

2) Text track equivalent of the table "Return values for AudioTrack.kind() and VideoTrack.kind()" in [2]. [4] shows such an equivalent table for setting @kind for text tracks in MPEG-2 TS and DASH. This could go in the HTML5 CR spec here [6].

3) Guidelines for creating in-band metadata text track cues. Here is the start for MPEG-2 TS [5]. This table could go here [6].

4) Additional definition for DASH describing how to set inBandMetadataTrackDispatchType [7].

Does anyone see a reason not to file bugs to add 1-4 above? These changes are consistent with the direction already taken in [1]. Making these changes wouldn't preclude further work in the CG and would address use cases that have been identified so far.

Bob

[1] http://www.w3.org/TR/html5/
[2] http://www.w3.org/TR/html5/embedded-content-0.html#audiotracklist-and-videotracklist-objects
[3] https://www.w3.org/community/inbandtracks/wiki/Main_Page#Audio_and_video_kind_table
[4] https://www.w3.org/community/inbandtracks/wiki/Main_Page#Text_kind_table
[5] https://www.w3.org/community/inbandtracks/wiki/Main_Page#Guidelines_for_creating_metadata_text_track_cues
[6] http://www.w3.org/TR/html5/embedded-content-0.html#sourcing-in-band-text-tracks
[7] https://www.w3.org/community/inbandtracks/wiki/Main_Page#Exposing_a_Media_Resource_Specific_TextTrack

Received on Tuesday, 4 March 2014 23:14:47 UTC