- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Mon, 16 Sep 2013 13:26:04 +1000
- To: Cyril Concolato <cyril.concolato@telecom-paristech.fr>
- Cc: public-html <public-html@w3.org>
On Thu, Sep 12, 2013 at 1:50 AM, Cyril Concolato
<cyril.concolato@telecom-paristech.fr> wrote:
> Hi all,
>
> The current HTML5 spec [1][2] explains how to build text tracks from ISO
> tracks, but only for the case where the ISO track is a timed metadata track
> (metx, mett). First, this does not cover all tracks which can be potentially
> useful in a web page (e.g. 3GPP Timed Text).

Are you expecting browsers to implement native 3GPP Timed Text support?
If so, a TextTrackCue sub-interface should be defined. If not, since
these are captions, it would make sense to define a mapping to WebVTT
cue content & cue settings so that they can be exposed through existing
interfaces. At minimum, the track should be exposed as @kind=metadata,
with the 3GPP Timed Text content exposed in .text of whatever we decide
to make the generic interface for such cues (right now, it's
TextTrackCue, but we have the proposed UnparsedCue interface in
preparation).

> Also, with the recent MPEG work
> on the carriage of Timed Text for TTML and WebVTT [3], I think the HTML spec
> should be updated (or maybe that text moved to the ISO specification). To my
> knowledge, it is not implemented yet by browsers.

I'd be happy for some of that to move to the ISO specification, in
particular if you want to map all the ISO tracks. However, some
description of what should happen needs to be included in the HTML spec.
Let's work on what that should be.

> In the light of the recent and long (!!) discussions on Text Tracks, I would
> like to propose the following:
> - When possible (as indicated by Eric [5], this is not always possible), all
> ISO tracks, except when the handler type is 'vide', 'auxv', 'soun' or
> 'hint', should be exposed as TextTracks (ie. this covers the 'meta' tracks
> but now also 'subt' (used for TTML) or 'text' (used for WebVTT) tracks, and
> other tracks, see the register at [4])

Can you go through all of these and make a list of the types in question
and where they fit into one of the semantic @kind values that the HTML
spec has? The list at http://mp4ra.org/codecs.html seems huge and does
not cover all the types you're mentioning.

Also, a nit-pick: I am confused why WebVTT is regarded as "Textual
meta-data with MIME type" when it's just generally time-aligned bits of
data?

> - then, if the couple ISO-parser/Browser is capable of producing an
> equivalent WebVTT representation of the text track content (of any @kind,
> possibly metadata) without losing information, the
> @inBandMetadataTrackDispatchType is left empty and the track is populated as
> if it was an out-of-band WebVTT track. This would be used for example when
> WebVTT content is carried in ISO tracks but could be used for other formats
> where the mapping to WebVTT is feasible/simple. Note we could add a similar
> text for TTML once the TTML cues are defined.

Note the above-mentioned distinction between the currently proposed
UnparsedCue and VTTCue: this should be taken care of here, too. So,
first you need to check whether the format in the cues is natively
supported in the browser and, if so, use that TextTrackCue sub-interface
for the cues (e.g. if TTMLCue is supported in the browser, expose it as
TTMLCue). Only if it's not supported and it's not semantically
@kind=metadata would I suggest converting it to WebVTT.

> - and otherwise (if a WebVTT representation cannot be generated or generated
> without loss),
>   - the TextTrack object is populated as follows:
>     - the @kind is set to 'metadata'
>     - the @label is set to the ISO 'track handler name'
>     - the @id is set to the ISO track id
>     - the @inBandMetadataTrackDispatchType contains the base64 encoded
>       sample entry box.
>   - and each sample produces a cue built as follows:
>     - the id attribute is empty
>     - the pauseOnExit attribute is set to false
>     - the start and end time of the cue are the start and end time of the
>       sample.
>     - the content of the cue contains the sample data. Note: the cue
>       content can be in .text (base64 encoded if initially binary) or if the cue
>       interface (TextTrackCue, VTTCue or UnParsedCue or whatever the name)
>       includes an ArrayBuffer, we should use that.

That makes sense to me with UnparsedCue as the interface.

Cheers,
Silvia.

> Comments?
>
> Cyril
>
> [1] http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#sourcing-in-band-text-tracks
> [2] http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#guidelines-for-exposing-cues-in-various-formats-as-text-track-cues
> [3] http://www.w3.org/community/texttracks/2013/09/11/carriage-of-webvtt-and-ttml-in-mp4-files/
> [4] http://mp4ra.org/codecs.html
> [5] http://lists.w3.org/Archives/Public/public-html/2013Sep/0012.html
>
> --
> Cyril Concolato
> Maître de Conférences/Associate Professor
> Groupe Multimedia/Multimedia Group
> Telecom ParisTech
> 46 rue Barrault
> 75 013 Paris, France
> http://concolato.wp.mines-telecom.fr/
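
To make the proposal above easier to discuss, here is a minimal Python
sketch of the decision order and of the fallback population rules as I
read them from this thread. It is not spec text and not any browser's
behaviour. The names IsoTrack, Sample, choose_exposure and
build_metadata_track, the format labels, and the dictionary layout are
all hypothetical and exist only for illustration.

```python
import base64
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Handler types the proposal excludes from TextTrack exposure.
EXCLUDED_HANDLERS = {"vide", "auxv", "soun", "hint"}


@dataclass
class Sample:
    start: float  # sample start time, seconds
    end: float    # sample end time, seconds
    data: bytes   # raw sample payload


@dataclass
class IsoTrack:
    # Hypothetical stand-in for what an ISO parser would hand to the browser.
    track_id: int
    handler_type: str
    handler_name: str
    sample_entry: bytes  # serialized sample entry box
    samples: List[Sample] = field(default_factory=list)


def choose_exposure(handler_type: str, cue_format: str, is_metadata: bool,
                    native_formats: Set[str],
                    lossless_to_webvtt: bool) -> Optional[str]:
    """Return 'native', 'webvtt', 'metadata', or None if not a text track."""
    if handler_type in EXCLUDED_HANDLERS:
        return None
    # Silvia's ordering: prefer a natively supported cue interface first.
    if cue_format in native_formats:
        return "native"
    # Convert to WebVTT only for non-metadata content, and (per Cyril)
    # only when the conversion loses no information.
    if not is_metadata and lossless_to_webvtt:
        return "webvtt"
    # Fallback: generic metadata track with unparsed cues.
    return "metadata"


def build_metadata_track(track: IsoTrack) -> dict:
    """Populate the fallback track and its cues as listed in the proposal."""
    cues = [
        {
            "id": "",
            "pauseOnExit": False,
            "startTime": s.start,
            "endTime": s.end,
            # Binary payloads are base64 encoded when carried in .text.
            "text": base64.b64encode(s.data).decode("ascii"),
        }
        for s in track.samples
    ]
    return {
        "kind": "metadata",
        "label": track.handler_name,
        "id": str(track.track_id),
        "inBandMetadataTrackDispatchType":
            base64.b64encode(track.sample_entry).decode("ascii"),
        "cues": cues,
    }
```

For example, a 'subt' track whose cue format has no native interface and
no lossless WebVTT mapping would come back from choose_exposure as
'metadata' and then go through build_metadata_track. If the cue
interface ends up carrying an ArrayBuffer, the base64 step for the cue
content would simply go away.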
Received on Monday, 16 September 2013 03:26:51 UTC