Re: Updating sourcing in-band text track for MP4 files from Silvia Pfeiffer on 2013-09-18 (public-html@w3.org from September 2013)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Wed, 18 Sep 2013 17:41:22 +1000
To: Cyril Concolato <cyril.concolato@telecom-paristech.fr>
Cc: public-html <public-html@w3.org>
Message-ID: <CAHp8n2kAtt7ZMZFAFhwHiBE5G-TdqqYtx4zCtfh+4b0fFU+1Qg@mail.gmail.com>
On Wed, Sep 18, 2013 at 12:21 AM, Cyril Concolato
<cyril.concolato@telecom-paristech.fr> wrote:
> Hi Silvia,
>
> On 16/09/2013 05:26, Silvia Pfeiffer wrote:
>
>> On Thu, Sep 12, 2013 at 1:50 AM, Cyril Concolato
>> <cyril.concolato@telecom-paristech.fr>  wrote:
>>
>> Can you go through all of these and make a list of the types under
>> question and where they fit into one of the semantic @kind values that
>> the HTML spec has? The list at http://mp4ra.org/codecs.html seems huge
>> and does not cover all the types you're mentioning.
>
> It's not so big once you remove Audio/Video/Hint handler types, the
> remaining stream types would be:
> - ISO stuff: Text timed metadata, XML timed metadata, URI identified
> metadata, MPEG-4 Systems streams, SVC metadata, text streams
> - DVB stuff: Track Level Index Track, Movie level index track,
> - 3GPP/OMA: 3GPP Timed Text, OMA Keys,
> - DECE Sub-titles (Timed Text),
> - Apple 32/64 bit timecode samples

Sorry if this seems obvious to you, but which of these are covered by
TextTrack @kind values?
I.e. which of these are captions / subtitles and which are something
else (i.e. "metadata")?


>> Also, a nit-pick: I am confused why WebVTT is regarded as "Textual
>> meta-data with MIME type" when it's just generally time-aligned bits
>> of data?
>
> The ISO spec is quite confusing here and maybe the MP4RA site too. There
> are 2 parameters to consider:
> - the *handler type* (3rd column in the MP4RA site) that classifies the
> content in large categories, to inform the player about the broad
> capabilities it needs to have to process the stream, and which can have the
> following 4CC values (i.e. ability to process): 'soun' (sound), 'vide'
> (video), 'subt' (subtitles potentially with images), 'text' (subtitles
> without images), 'hint' (transport protocol packets) or 'meta' (metadata).
> - and the stream type (or *sample entry type*, 1st column) also identified
> by a 4CC.
>
> Unfortunately, there is some overlap in the handler types between 'subt',
> 'meta' and 'text'. I lost the battle proposing to harmonize them. So here
> are some examples of interest (using <handler type>/<stream
> type>/<additional parameters when the stream type is too generic>):
> - WebVTT is identified as 'text'/'wvtt'
> - TTML is identified as 'subt'/'stpp'
> - 3GPP Timed Text is identified as 'text'/'tx3g'
> - a generic XML metadata stream would be: 'meta'/'metx'/<namespace>
> - a generic text metadata stream would be:  'meta'/'mett'/<mime format>
>
> As for the one you mention "Textual meta-data with MIME type" it is
> identified as 'meta'/'text'/<mime format> and I can't find what it is used
> for...

Thanks for this. This should be useful to identify semantics.
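
To make the two-level identification concrete, the examples Cyril lists can
be written as a lookup keyed on the (handler type, sample entry type) pair.
This is only a minimal sketch: the pairs and descriptions come from the
examples above, while the names KNOWN_STREAMS and describe() are made up
for illustration and the table is not a complete copy of the MP4RA registry.

```python
# Handler type / sample entry type pairs taken from the examples quoted above.
# KNOWN_STREAMS and describe() are illustrative names, not part of any spec.
KNOWN_STREAMS = {
    ("text", "wvtt"): "WebVTT",
    ("subt", "stpp"): "TTML",
    ("text", "tx3g"): "3GPP Timed Text",
    ("meta", "metx"): "generic XML metadata (qualified by namespace)",
    ("meta", "mett"): "generic text metadata (qualified by MIME format)",
}


def describe(handler_type: str, sample_entry_type: str) -> str:
    """Return a readable name for a handler/sample-entry pair, or 'unknown'."""
    return KNOWN_STREAMS.get((handler_type, sample_entry_type), "unknown")
```

Keying on the pair rather than on the handler type alone reflects the
overlap between 'subt', 'meta' and 'text' mentioned above: the handler type
by itself does not identify the format.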


>>> - then, if the couple ISO-parser/Browser is capable of producing an
>>> equivalent WebVTT representation of the text track content (of any @kind,
>>> possibly metadata) without losing information, the
>>> @inBandMetadataTrackDispatchType is left empty and the track is populated
>>> as if it was an out-of-band WebVTT track. This would be used for example
>>> when WebVTT content is carried in ISO tracks but could be used for other
>>> formats where the mapping to WebVTT is feasible/simple. Note we could add
>>> a similar text for TTML once the TTML cues are defined.
>>
>> Note the above mentioned distinction between the currently proposed
>> UnparsedCue and VTTCue - this should be taken care of here, too.
>>
>> So, first you need to check if the format in cues is natively
>> supported in the browser and use that TextTrackCue sub-interface for
>> the cues.
>> (e.g. if TTMLCue is supported in the browser, expose it as TTMLCue)
>>
>> Only if it's not supported and it's not semantically @kind=metadata,
>> suggest converting it to WebVTT.
>
> Agree.
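
The order of checks agreed here could be sketched roughly as follows. This
is an assumption-laden illustration: the function name, the format names
and the returned strings are all hypothetical, and the sketch does not
model whether a lossless WebVTT conversion is actually feasible for a given
format.

```python
def choose_cue_interface(cue_format, kind, natively_supported):
    """Pick how to expose a track's cues, in the order discussed above.

    cue_format: hypothetical short format name, e.g. "ttml" or "webvtt".
    kind: the semantic @kind of the track, e.g. "subtitles" or "metadata".
    natively_supported: set of formats the browser has a cue interface for.
    The returned strings are placeholders, not real interface names.
    """
    # 1. A natively supported format keeps its own TextTrackCue sub-interface.
    if cue_format in natively_supported:
        return "native:" + cue_format
    # 2. Conversion to WebVTT is only suggested for non-metadata tracks.
    if kind != "metadata":
        return "convert-to-webvtt"
    # 3. Everything else is exposed as raw, unparsed cue data.
    return "unparsed"
```

For example, a TTML subtitle track in a browser with only WebVTT support
would fall through to the conversion branch, while a metadata track in an
unsupported format would be exposed unparsed.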
>
>
>>
>>
>>> - and otherwise (if a WebVTT representation cannot be generated, or
>>> cannot be generated without loss),
>>>    - the TextTrack object is populated as follows:
>>>       - the @kind is set to 'metadata'
>>>       - the @label is set to the ISO 'track handler name'
>>>       - the @id is set to the ISO track id
>>>       - the @inBandMetadataTrackDispatchType contains the base64 encoded
>>>         sample entry box.
>>>    - and each sample produces a cue built as follows:
>>>       - the id attribute is empty
>>>       - the pauseOnExit attribute is set to false
>>>       - the start and end time of the cue are the start and end time of
>>>         the sample.
>>>       - the content of the cue contains the sample data. Note: the cue
>>>         content can be in .text (base64 encoded if initially binary) or if
>>>         the cue interface (TextTrackCue, VTTCue or UnParsedCue or whatever
>>>         the name) includes an ArrayBuffer, we should use that.
>>
>> That makes sense to me with UnparsedCue as the interface.
>
> Ok, I'll make sure this is integrated when the interface finally shows up.

Good. Glad to hear we're on the same page now.
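
As a rough sketch of the fallback mapping quoted above (kind, label, id,
dispatch type, and one cue per sample), assuming samples arrive as
(start, end, bytes) tuples. The Cue and Track classes and the function name
are hypothetical stand-ins, not the TextTrack or UnparsedCue interfaces
themselves, and the sketch always base64-encodes the sample data into the
text field rather than using an ArrayBuffer.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class Cue:
    start_time: float
    end_time: float
    text: str                    # base64 of the sample data (assumption)
    id: str = ""                 # "the id attribute is empty"
    pause_on_exit: bool = False  # "pauseOnExit ... is set to false"


@dataclass
class Track:
    id: str
    label: str
    in_band_metadata_track_dispatch_type: str
    kind: str = "metadata"
    cues: list = field(default_factory=list)


def build_fallback_track(track_id, handler_name, sample_entry_box, samples):
    """Build a metadata track from an ISO track that has no WebVTT mapping.

    sample_entry_box: raw bytes of the sample entry box.
    samples: iterable of (start_seconds, end_seconds, data_bytes) tuples.
    """
    track = Track(
        id=str(track_id),    # ISO track id
        label=handler_name,  # ISO track handler name
        in_band_metadata_track_dispatch_type=base64.b64encode(
            sample_entry_box
        ).decode("ascii"),
    )
    for start, end, data in samples:
        # One cue per sample, spanning the sample's start and end time.
        encoded = base64.b64encode(data).decode("ascii")
        track.cues.append(Cue(start_time=start, end_time=end, text=encoded))
    return track
```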

Silvia.
Received on Wednesday, 18 September 2013 07:42:10 UTC