Liaison Letter on Mapping MPEG DASH Events to HTML5 Text Tracks and Cues

Dear W3C colleagues,

I'm writing to you in my role as chair of the Specification Working Group of the HbbTV Association.

HbbTV is looking at a mapping from the new event mechanism defined in the 2nd edition of MPEG DASH to the Text Tracks and Cues defined in HTML5. We have identified a significant issue with this mapping and are looking to raise this with the appropriate group or groups in the W3C. If your group is not the most appropriate one then please can you let us know which group we should address.

HbbTV has been one of the early adopters of MPEG DASH, including a simple profile of it in our V1.5 specification, which was first published in March 2012 and is now being deployed in many of the 2013 connected TV sets in Europe. We are now looking at more advanced uses of MPEG DASH and at the second edition of the DASH specification currently going through final approval in MPEG.

One of the additions to the 2nd edition of the DASH specification compared to the first edition is the event mechanism defined in section 5.10 of that document. For those who don't have access to the document, a DASH MPD can define multiple streams of events. Events typically have a payload, a time and perhaps a duration. Each stream of events is identified by a URI and a free-format "value". Events can be carried in a new ‘emsg’ box in an ISOBMFF file or defined in the XML of the MPD. DASH refers to these as inband events and MPD events respectively; however, both are really inband when seen from the browser.

To expose these event streams and events to HTML5, we have been looking at mapping event streams to TextTracks and individual events to Cues - most likely the HTML5.1 'DataCue'. There seems to be a reasonable mapping between the properties of TextTracks and DataCues and DASH events (see below). However, we have encountered a problem with the dynamic behaviour, where we thought someone in the W3C (perhaps yourselves?) might be able to provide some suggestions or guidance.

We believe there could be a problem with short or zero duration Text Track Cues when they are processed during video playback according to the "time marches on" algorithm described here: http://www.w3.org/html/wg/drafts/html/CR/embedded-content-0.html#time-marches-on

The “time marches on” algorithm decides which cues appear in the active cue list and when events are fired on tracks and their cues. It is run with a maximum of 250ms between iterations. Events of a short enough duration that they start and end between two iterations are “missed cues”. These missed cues will have their onenter and onexit events fired and will cause a cuechange event to be fired on the track; however, they will never appear in the activeCues list.
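As a rough illustration of the behaviour described above, the following TypeScript sketch models a single step between two iterations of the algorithm. It is a simplified model written for this letter, not the algorithm from the HTML5 specification; the names CueTiming, StepResult and classifyCues are hypothetical, and it assumes playback advances monotonically from lastTime to currentTime.

```typescript
// Simplified model only: it ignores seeking, cue ordering and event dispatch.
interface CueTiming {
  id: string;
  startTime: number; // seconds
  endTime: number;   // seconds
}

interface StepResult {
  active: CueTiming[]; // cues that would appear in activeCues
  missed: CueTiming[]; // cues that started and ended between two iterations
}

// Classify cues for one iteration, given the playback position at the
// previous iteration (lastTime) and at this iteration (currentTime).
function classifyCues(
  cues: CueTiming[],
  lastTime: number,
  currentTime: number
): StepResult {
  const active = cues.filter(
    (c) => c.startTime <= currentTime && c.endTime > currentTime
  );
  const missed = cues.filter(
    (c) => c.startTime >= lastTime && c.endTime <= currentTime
  );
  return { active, missed };
}

// Example: two iterations 250ms apart, at 10.0s and 10.25s.
const exampleCues: CueTiming[] = [
  { id: "programme", startTime: 9.0, endTime: 20.0 },
  { id: "goal", startTime: 10.0, endTime: 10.0 },   // zero duration
  { id: "banner", startTime: 10.1, endTime: 10.2 }, // 100ms duration
];
const step = classifyCues(exampleCues, 10.0, 10.25);
console.log("active:", step.active.map((c) => c.id));
console.log("missed:", step.missed.map((c) => c.id));
```

In this model only the long "programme" cue is reported as active; the zero-duration "goal" cue and the 100ms "banner" cue fall entirely between the two iterations and are classified as missed.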

An application waiting on the cuechange event and then reading activeCues will therefore have no guarantee of seeing events with a duration less than 250ms. Potential work-arounds for this are:
•    Attach an onenter or onexit event to every cue.
•    On every oncuechange event access the full cues list and check for missed cues based on startTime and endTime of each cue.
•    Only use events with a duration of 250ms or longer.
•    Pass all DASH events to the HTML browser with a minimum 250ms duration.
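The last work-around in the list above could be sketched as follows in TypeScript. This is only an illustration of the idea under stated assumptions: the names MIN_CUE_DURATION, PaddedCueTiming and padCueTiming are hypothetical, times are assumed to be in seconds, and 0.25 corresponds to the 250ms maximum interval mentioned earlier.

```typescript
// Hypothetical minimum cue duration in seconds (the 250ms discussed above).
const MIN_CUE_DURATION = 0.25;

interface PaddedCueTiming {
  startTime: number; // seconds
  endTime: number;   // seconds
}

// Return cue timing whose duration is at least minDuration.
// Cues that are already long enough are returned unchanged.
function padCueTiming(
  startTime: number,
  endTime: number,
  minDuration: number = MIN_CUE_DURATION
): PaddedCueTiming {
  return {
    startTime,
    endTime: Math.max(endTime, startTime + minDuration),
  };
}

// A zero-duration DASH event at 10s is exposed as a 250ms cue.
console.log(padCueTiming(10, 10));
// A 2s event is left as it is.
console.log(padCueTiming(10, 12));
```

Padding of this kind changes the endTime seen by the application, so the cue no longer reflects the duration signalled in the DASH event.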

Attaching an onenter or onexit event handler to every cue may be difficult because of the segmented nature of MPEG DASH: the media player may only have access to the next segment a short amount of time before it is due to be played. In the case of native support for MPEG DASH in the browser (as opposed to an MSE-based solution for MPEG DASH playback), DASH events at the start of a segment may be active less than 250ms after the media player has parsed the segment and added the cues it contains. This makes it difficult for a process triggered by the timeupdate event to attach event handlers to new cues before they are active. Running the same process from a more frequent trigger such as requestAnimationFrame is not ideal and could have a performance impact on embedded devices.

Accessing the full cue list and calculating which events, if any, have been missed is also less than ideal and could cause problems where there are a large number of cues present in the track. If this were necessary, it would remove a lot of the utility of the text track cue mechanism, as developers would have to duplicate event triggering functionality in their own application.

Only using events with a duration of 250ms or longer introduces a platform-specific requirement for MPEG DASH streams in browsers, even though those streams may otherwise work across any platform that can provide suitable event handling. This also causes a problem conceptually where DASH events may represent real events which do not have a logical duration, for example a notification that a goal has been scored during a live football match, or a boundary between two sections of a live programme. A broadcaster may transmit these events with a zero duration to a client application providing information alongside the video; if the application were to miss these events then it may continue to display outdated information until the next event.

Does your group have any views on how to ensure short duration cues can be reliably delivered to an application?

Any suggestions or comments you're able to give on this subject would be appreciated.

Regards

Jon Piesing
Chair HbbTV Specification Working Group
************************************************************************************************************
FYI, our current proposed mapping for the properties of TextTracks and DataCues is as follows:

TextTrack property to MPD event or InbandEventStream
-----------------------------------------------------------------------------
kind = "metadata"
label = Empty string
language = Empty string
id = Empty string
inBandMetadataTrackDispatchType = @schemeIdUri + “U+0020” (SPACE character) + @value
mode = Hidden

DataCue property to MPD Events / Inband Events
--------------------------------------------------------------------
id = @id / id
startTime = @presentationTime + the time offset of the start of the period from the start of the presentation / presentation_time_delta + the time offset of the start of the segment from the start of the presentation
endTime = The startTime + @duration / The startTime + the event_duration
pauseOnExit = False / False
onenter = As defined in HTML5
onexit = As defined in HTML5
data = The string value of the <EventType> element / message_data
text = above as UTF-16 text / message_data as UTF-16 text
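
To make the MPD event column of the mapping above more concrete, here is a minimal TypeScript sketch of the timing and dispatch type calculations. It is an illustration rather than a proposed implementation: all type and function names are hypothetical, plain objects stand in for the real TextTrack and DataCue interfaces, and it assumes that @presentationTime and @duration are expressed in units of the event stream's @timescale, a conversion which the table does not spell out.

```typescript
// Hypothetical shapes for the parsed MPD data; not a real DASH parser API.
interface MpdEventStream {
  schemeIdUri: string; // @schemeIdUri
  value: string;       // @value
  timescale: number;   // @timescale, ticks per second (assumption, see above)
}

interface MpdEvent {
  id: string;               // @id
  presentationTime: number; // @presentationTime, in timescale ticks
  duration: number;         // @duration, in timescale ticks
}

// The subset of cue properties covered by the mapping table.
interface MappedCue {
  id: string;
  startTime: number; // seconds from the start of the presentation
  endTime: number;   // seconds from the start of the presentation
  pauseOnExit: boolean;
}

// inBandMetadataTrackDispatchType = @schemeIdUri + SPACE + @value
function dispatchTypeFor(stream: MpdEventStream): string {
  return stream.schemeIdUri + "\u0020" + stream.value;
}

// periodStart is the offset in seconds of the start of the period
// from the start of the presentation.
function mapMpdEvent(
  event: MpdEvent,
  stream: MpdEventStream,
  periodStart: number
): MappedCue {
  const startTime = periodStart + event.presentationTime / stream.timescale;
  const endTime = startTime + event.duration / stream.timescale;
  return { id: event.id, startTime, endTime, pauseOnExit: false };
}

// Example values are made up for illustration.
const stream: MpdEventStream = {
  schemeIdUri: "urn:example:events",
  value: "1",
  timescale: 1000,
};
const goal: MpdEvent = { id: "42", presentationTime: 5000, duration: 0 };
console.log(dispatchTypeFor(stream));
console.log(mapMpdEvent(goal, stream, 60));
```

The zero-duration event in this example yields a cue whose startTime and endTime are equal, which is exactly the kind of cue that is affected by the issue described in this letter.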

Received on Thursday, 19 December 2013 06:48:33 UTC