Updating sourcing in-band text track for MP4 files

Hi all,

The current HTML5 spec [1][2] explains how to build text tracks from ISO 
tracks, but only when the ISO track is a timed metadata track (metx, 
mett). First, this does not cover all the tracks that could be useful 
in a web page (e.g. 3GPP Timed Text). Second, given the recent MPEG work 
on the carriage of TTML and WebVTT timed text [3], I think the HTML spec 
should be updated (or that text perhaps moved to the ISO specification). 
To my knowledge, browsers do not implement it yet.

In the light of the recent and long (!!) discussions on Text Tracks, I 
would like to propose the following:
- When possible (as Eric indicated [5], it is not always possible), 
all ISO tracks, except those whose handler type is 'vide', 'auxv', 
'soun' or 'hint', should be exposed as TextTracks (i.e. this covers the 
'meta' tracks, but now also 'subt' tracks (used for TTML), 'text' tracks 
(used for WebVTT) and other tracks; see the registry at [4]).
- then, if the ISO parser/browser pair can produce an equivalent WebVTT 
representation of the text track content (of any @kind, possibly 
metadata) without losing information, the 
@inBandMetadataTrackDispatchType is left empty and the track is 
populated as if it were an out-of-band WebVTT track. This would be used, 
for example, when WebVTT content is carried in ISO tracks, but could 
also be used for other formats where the mapping to WebVTT is 
feasible/simple. Note that we could add similar text for TTML once the 
TTML cues are defined.
- and otherwise (if a WebVTT representation cannot be generated, or 
cannot be generated without loss),
   - the TextTrack object is populated as follows:
      - the @kind is set to 'metadata'
      - the @label is set to the ISO 'track handler name'
      - the @id is set to the ISO track id
      - the @inBandMetadataTrackDispatchType contains the base64-encoded 
sample entry box.
   - and each sample produces a cue built as follows:
      - the id attribute is empty
      - the pauseOnExit attribute is set to false
      - the start and end time of the cue are the start and end time of 
the sample.
      - the content of the cue contains the sample data. Note: the cue 
content can go in .text (base64-encoded if originally binary); 
alternatively, if the cue interface (TextTrackCue, VTTCue, UnParsedCue 
or whatever it ends up being called) includes an ArrayBuffer, we should 
use that.
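
The steps above could be sketched roughly as follows in TypeScript. This 
is only an illustration under assumptions, not spec text or a browser 
API: IsoTrack, IsoSample, CueSketch, TextTrackSketch, WebVttMapper and 
exposeAsTextTrack are hypothetical names; the toWebVtt callback stands in 
for whatever lossless WebVTT mapping the parser/browser may have; sample 
data is always base64-encoded into .text (the ArrayBuffer variant is not 
shown); and the label/id values in the WebVTT branch are a guess, since 
the proposal does not specify them.

```typescript
// Hypothetical, simplified view of a parsed ISO track (illustrative names only).
interface IsoSample { start: number; end: number; data: Uint8Array; }
interface IsoTrack {
  trackId: number;
  handlerType: string;      // four-character handler type, e.g. 'meta', 'subt', 'text'
  handlerName: string;      // the ISO 'track handler name'
  sampleEntry: Uint8Array;  // raw bytes of the sample entry box
  samples: IsoSample[];
}
interface CueSketch {
  id: string; pauseOnExit: boolean; startTime: number; endTime: number; text: string;
}
interface TextTrackSketch {
  kind: string; label: string; id: string;
  inBandMetadataTrackDispatchType: string;
  cues: CueSketch[];
}
// Hypothetical hook: returns a kind and cues if a lossless WebVTT mapping exists, else null.
type WebVttMapper = (track: IsoTrack) => { kind: string; cues: CueSketch[] } | null;

// Handler types that are never exposed as text tracks.
const EXCLUDED_HANDLERS = ["vide", "auxv", "soun", "hint"];

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Plain base64 encoder, written out so the sketch has no platform dependency.
function base64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += B64.charAt(b0 >> 2);
    out += B64.charAt(((b0 & 3) << 4) | (b1 >> 4));
    out += i + 1 < bytes.length ? B64.charAt(((b1 & 15) << 2) | (b2 >> 6)) : "=";
    out += i + 2 < bytes.length ? B64.charAt(b2 & 63) : "=";
  }
  return out;
}

function exposeAsTextTrack(track: IsoTrack, toWebVtt: WebVttMapper): TextTrackSketch | null {
  // Step 1: video, auxiliary video, sound and hint tracks are not exposed.
  if (EXCLUDED_HANDLERS.indexOf(track.handlerType) !== -1) return null;

  // Step 2: a lossless WebVTT mapping exists, so the dispatch type stays empty.
  const vtt = toWebVtt(track);
  if (vtt !== null) {
    return {
      kind: vtt.kind,
      label: track.handlerName,   // assumption: not specified by the proposal
      id: String(track.trackId),  // assumption: not specified by the proposal
      inBandMetadataTrackDispatchType: "",
      cues: vtt.cues,
    };
  }

  // Step 3: fallback, a 'metadata' track with one cue per sample.
  return {
    kind: "metadata",
    label: track.handlerName,
    id: String(track.trackId),
    inBandMetadataTrackDispatchType: base64(track.sampleEntry),
    cues: track.samples.map((s) => ({
      id: "",
      pauseOnExit: false,
      startTime: s.start,
      endTime: s.end,
      text: base64(s.data), // simplification: payload always base64-encoded in .text
    })),
  };
}
```

A page script receiving such a fallback track could then inspect 
inBandMetadataTrackDispatchType to decide whether it knows how to decode 
the cue payloads.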

Comments?

Cyril

[1] http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#sourcing-in-band-text-tracks
[2] http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#guidelines-for-exposing-cues-in-various-formats-as-text-track-cues
[3] http://www.w3.org/community/texttracks/2013/09/11/carriage-of-webvtt-and-ttml-in-mp4-files/
[4] http://mp4ra.org/codecs.html
[5] http://lists.w3.org/Archives/Public/public-html/2013Sep/0012.html

-- 
Cyril Concolato
Maître de Conférences/Associate Professor
Groupe Multimedia/Multimedia Group
Telecom ParisTech
46 rue Barrault
75 013 Paris, France
http://concolato.wp.mines-telecom.fr/

Received on Wednesday, 11 September 2013 15:51:00 UTC