Re: Updating sourcing in-band text track for MP4 files

On Thu, Sep 12, 2013 at 1:50 AM, Cyril Concolato
<cyril.concolato@telecom-paristech.fr> wrote:
> Hi all,
>
> The current HTML5 spec [1][2] explains how to build text tracks from ISO
> tracks, but only for the case where the ISO track is a timed metadata track
> (metx, mett). First, this does not cover all tracks which can be potentially
> useful in a web page (e.g. 3GPP Timed Text).

Are you expecting browsers to implement native 3GPP Timed Text support?
If so, a TextTrackCue sub-interface should be defined.
If not, since it's captions, it would make sense to define a mapping
to WebVTT cue content & cue settings to be able to expose them in
existing interfaces.
At minimum, it should be exposed as @kind=metadata with 3GPP Timed
Text content exposed in .text of whatever we decide to make the
generic interface for such cues (right now, it's TextTrackCue, but we
have the proposed UnparsedCue interface in preparation).


> Also, with the recent MPEG work
> on the carriage of Timed Text for TTML and WebVTT [3], I think the HTML spec
> should be updated (or maybe that text moved to the ISO specification). To my
> knowledge, it is not implemented yet by browsers.

I'd be happy for some of that to move to the ISO specification, in
particular if you want to map all the ISO tracks. However, some
description of what should happen needs to be included in the HTML
spec. Let's work on what that should be.


> In the light of the recent and long (!!) discussions on Text Tracks, I would
> like to propose the following:
> - When possible (as indicated by Eric [5], this is not always possible), all
> ISO tracks, except when the handler type is 'vide', 'auxv', 'soun' or
> 'hint', should be exposed as TextTracks (ie. this covers the 'meta' tracks
> but now also 'subt' (used for TTML) or 'text' (used for WebVTT) tracks, and
> other tracks, see the register at [4])

Can you go through all of these and make a list of the types in
question and where they fit into one of the semantic @kind values that
the HTML spec has? The list at http://mp4ra.org/codecs.html seems huge
and does not seem to cover all the types you're mentioning.
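
As a starting point for such a list, the handler-type filter in the
quoted proposal could be sketched roughly as below. The function and
constant names are hypothetical; only the four excluded handler types
come from the proposal itself.

```typescript
// Handler types that the quoted proposal excludes from TextTrack exposure.
const NON_TEXT_HANDLERS: ReadonlySet<string> = new Set([
  "vide",
  "auxv",
  "soun",
  "hint",
]);

// Returns true if a track with this ISO handler type would be a candidate
// for exposure as a TextTrack under the proposal (hypothetical helper).
function isTextTrackCandidate(handlerType: string): boolean {
  return !NON_TEXT_HANDLERS.has(handlerType);
}
```

This says nothing yet about which @kind each remaining type maps to,
which is the open question above.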

Also, a nit-pick: I am confused why WebVTT is regarded as "Textual
meta-data with MIME type" when it's just generally time-aligned bits
of data?


> - then, if the couple ISO-parser/Browser is capable of producing an
> equivalent WebVTT representation of the text track content (of any @kind,
> possibly metadata) without losing information, the
> @inBandMetadataTrackDispatchType is left empty and the track is populated as
> if it was an out-of-band WebVTT track. This would be used for example when
> WebVTT content is carried in ISO tracks but could be used for other formats
> where the mapping to WebVTT is feasible/simple. Note we could add a similar
> text for TTML once the TTML cues are defined.

Note the above-mentioned distinction between the currently proposed
UnparsedCue and VTTCue - this should be taken care of here, too.

So, first you need to check if the format in cues is natively
supported in the browser and use that TextTrackCue sub-interface for
the cues.
(e.g. if TTMLCue is supported in the browser, expose it as TTMLCue)

Only if it's not supported and it's not semantically @kind=metadata,
suggest converting it to WebVTT.
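
The ordering described above might be sketched as follows. This is
only an illustration: the type names, the format labels and the
`losslessToWebVTT` flag are assumptions made for the example, not
anything defined in the HTML spec. Combining the "not metadata" check
with the lossless-conversion condition from the quoted proposal is
also an assumption.

```typescript
// Hypothetical result labels for how a track's cues get exposed.
type CueExposure = "native" | "webvtt" | "unparsed-metadata";

// Hypothetical summary of an in-band track, as seen by the parser.
interface InBandTrackInfo {
  format: string;            // illustrative label, e.g. "ttml", "webvtt", "3gpp-tt"
  kind: string;              // semantic kind, e.g. "captions", "metadata"
  losslessToWebVTT: boolean; // can the content be mapped to WebVTT without loss?
}

function chooseCueExposure(
  track: InBandTrackInfo,
  nativeFormats: ReadonlySet<string>,
): CueExposure {
  // 1. A natively supported format uses its own TextTrackCue sub-interface.
  if (nativeFormats.has(track.format)) {
    return "native";
  }
  // 2. Otherwise, non-metadata content is converted to WebVTT when possible.
  if (track.kind !== "metadata" && track.losslessToWebVTT) {
    return "webvtt";
  }
  // 3. Everything else falls back to @kind=metadata with unparsed cues.
  return "unparsed-metadata";
}
```

For example, with `nativeFormats` containing only "webvtt", a TTML
captions track that converts without loss would come out as "webvtt",
while the same track in a browser that supports TTML natively would
come out as "native".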


> - and otherwise (if a WebVTT representation cannot be generated or generated
> without loss),
>   - the TextTrack object is populated as follows:
>      - the @kind is set to 'metadata'
>      - the @label is set to the ISO 'track handler name'
>      - the @id is set to the ISO track id
>      - the @inBandMetadataTrackDispatchType contains the base64 encoded
> sample entry box.
>   - and each sample produces a cue built as follows:
>      - the id attribute is empty
>      - the pauseOnExit attribute is set to false
>      - the start and end time of the cue are the start and end time of the
> sample.
>      - the content of the cue contains the sample data. Note: the cue
> content can be in .text (base64 encoded if initially binary) or if the cue
> interface (TextTrackCue, VTTCue or UnParsedCue or whatever the name)
> includes an ArrayBuffer, we should use that.

That makes sense to me with UnparsedCue as the interface.
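
To make the fallback mapping concrete, here is a rough sketch using
plain objects. The `IsoTrack` and `IsoSample` shapes and the function
names are hypothetical, and the cue is a plain object rather than a
real UnparsedCue, whose interface is still only a proposal. It takes
the ".text, base64 encoded" option from the quoted text; an
ArrayBuffer field would work equally well.

```typescript
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Minimal standard base64 encoder, so the sketch has no DOM or Node dependency.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = has1 ? bytes[i + 1] : 0;
    const b2 = has2 ? bytes[i + 2] : 0;
    out += B64[b0 >> 2];
    out += B64[((b0 & 0x03) << 4) | (b1 >> 4)];
    out += has1 ? B64[((b1 & 0x0f) << 2) | (b2 >> 6)] : "=";
    out += has2 ? B64[b2 & 0x3f] : "=";
  }
  return out;
}

// Hypothetical views of what an ISO parser hands over.
interface IsoTrack { trackId: number; handlerName: string; sampleEntry: Uint8Array; }
interface IsoSample { start: number; end: number; data: Uint8Array; }

// Track-level attributes, following the quoted list.
function describeFallbackTrack(t: IsoTrack) {
  return {
    kind: "metadata",
    label: t.handlerName,
    id: String(t.trackId),
    inBandMetadataTrackDispatchType: toBase64(t.sampleEntry),
  };
}

// One cue per sample, with the sample data base64 encoded into .text.
function cueFromSample(s: IsoSample) {
  return {
    id: "",
    pauseOnExit: false,
    startTime: s.start,
    endTime: s.end,
    text: toBase64(s.data),
  };
}
```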

Cheers,
Silvia.


>
> Comments?
>
> Cyril
>
> [1]
> http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#sourcing-in-band-text-tracks
> [2]
> http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#guidelines-for-exposing-cues-in-various-formats-as-text-track-cues
> [3]
> http://www.w3.org/community/texttracks/2013/09/11/carriage-of-webvtt-and-ttml-in-mp4-files/
> [4] http://mp4ra.org/codecs.html
> [5] http://lists.w3.org/Archives/Public/public-html/2013Sep/0012.html
>
> --
> Cyril Concolato
> Maître de Conférences/Associate Professor
> Groupe Multimedia/Multimedia Group
> Telecom ParisTech
> 46 rue Barrault
> 75 013 Paris, France
> http://concolato.wp.mines-telecom.fr/
>
>

Received on Monday, 16 September 2013 03:26:51 UTC