Re: HTML WG Note publication of sourcing in-band media resources

On Tue, May 20, 2014 at 2:02 AM, Kilroy Hughes
<> wrote:
> The ISO Base Media File Format Part 30 (ISO/IEC 14496-30) defines subtitle tracks (which are inclusive of captions, SDH, description, translation, graphics such as glyphs and signing, etc.).
> It doesn't say anything about Kinds, or have a similar field in the standard track header and sample description.
> Both TTML and WebVTT storage are defined.
> I know TTML has generic metadata tags, but not a specific method of identifying presentation objects such as <p> and <div> according to Kind; nor any standardized concept of sub-track.
> You would know better whether WebVTT content and readers conform to a sub-track or Kind tagging method corresponding to two HTML tracks in the same text file/track.
> In the case of DASH streaming timed text and graphics subtitles (ISO/IEC 23009-1) stored as Part 30 movie fragments, the manifest (Media Presentation Description, MPD) may include optional Role Descriptor elements that are intended to function like Kind to describe Adaptation Sets that result in tracks when streamed in an HTML5 browser using MSE.  The DASH standard was completed before the W3C Kind specification, so it defines a slightly different vocabulary than that eventually settled on by W3C.  It also allows multiple Role Descriptors because an Adaptation Set (track) may fit multiple descriptions, such as "Main" or "Alternate" and "Description" or "Translation".  The Role descriptor uses a URI/URN to identify the vocabulary and syntax contained in the descriptor, so it is extensible beyond the vocabulary defined in the DASH standard.
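[Editorial illustration: the Role descriptors described above can be read out of an MPD with nothing more than stdlib XML parsing. The MPD fragment below is invented for illustration; the element and attribute names follow ISO/IEC 23009-1, and "urn:mpeg:dash:role:2011" is the role scheme defined there, but the ids and values are hypothetical.]

```python
# Sketch: listing Role descriptors per Adaptation Set from a DASH MPD.
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

MPD = """<?xml version="1.0"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="1" contentType="text" lang="en">
      <Role schemeIdUri="urn:mpeg:dash:role:2011" value="subtitle"/>
      <Role schemeIdUri="urn:mpeg:dash:role:2011" value="main"/>
      <Representation id="sub-en" mimeType="application/ttml+xml"/>
    </AdaptationSet>
  </Period>
</MPD>"""

def roles(adaptation_set):
    """Return (schemeIdUri, value) pairs for every Role descriptor,
    preserving document order -- an Adaptation Set may carry several."""
    return [(r.get("schemeIdUri"), r.get("value"))
            for r in adaptation_set.findall("mpd:Role", NS)]

root = ET.fromstring(MPD)
for aset in root.findall(".//mpd:AdaptationSet", NS):
    print(aset.get("id"), roles(aset))
```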
> An additional Accessibility Descriptor is specified in the DASH MPD schema to allow automatic selection of audio, video, and TTML tracks for users with visual, hearing, cognitive, etc. impairments.  A URI/URN can be selected that labels these tracks with identifiers established by regulation, broadcast TV, etc., such as "SDH" for Subtitles for the Deaf and Hard of hearing.  Even if a player does not recognize the particular URI/URN or descriptive term used in this Descriptor, it can make a default selection when a user preference setting indicates an impairment, based on the presence of the Accessibility Descriptor, language attribute, etc.  It may also have a Descriptor indicating "alternate" or similar, but that would not be very useful for someone who is visually impaired or a standard player that would like to find an audio description track.
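[Editorial illustration: the fallback behaviour described above — a player that does not recognise the Accessibility scheme or value still uses the descriptor's mere presence, plus a user preference, to make a default choice — can be sketched as below. The data model, helper name, and scheme/value pair are all invented for illustration; this is not any DASH API.]

```python
# Sketch of default track selection driven by Accessibility descriptors.
from dataclasses import dataclass, field

@dataclass
class Track:
    track_id: str
    lang: str
    accessibility: list = field(default_factory=list)  # (schemeIdUri, value) pairs

def pick_default(tracks, user_lang, prefers_accessible):
    """Prefer a track in the user's language; if the user preference
    indicates an impairment, prefer any track carrying an Accessibility
    descriptor, even when its scheme/value is unrecognised."""
    candidates = [t for t in tracks if t.lang == user_lang] or tracks
    if prefers_accessible:
        accessible = [t for t in candidates if t.accessibility]
        if accessible:
            return accessible[0]
    return candidates[0] if candidates else None

tracks = [
    Track("sub-en", "en"),
    Track("sdh-en", "en", [("urn:example:accessibility", "SDH")]),  # hypothetical scheme
]
print(pick_default(tracks, "en", prefers_accessible=True).track_id)
```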
> Selection of an Adaptation Set and a Representation contained in it for adaptive streaming involves evaluating attributes that identify codec, video resolution or audio track configuration, language, frame rate, bitrate, etc. in addition to the Role or Kind. An Adaptation Set contains perceptually equivalent content, but possibly multiple Representations that are encoded differently to enable rapid switching to compensate for variation in network throughput.  The intent is that Media Segments adaptively selected and sequenced from different Representations within an Adaptation Set will appear to be a continuous track on playback, so they share the same Role Descriptor.  Although it is possible, it is unlikely that a Subtitle Adaptation Set will contain more than one Representation.
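[Editorial illustration: the switching described above — all Representations in an Adaptation Set carry perceptually equivalent content and share one Role, so the player is free to pick per segment by throughput — might look like the sketch below. The function, safety margin, and bitrates are hypothetical, not taken from any specification.]

```python
# Sketch of per-segment Representation selection within one Adaptation Set.
def pick_representation(representations, throughput_bps, safety=0.8):
    """representations: list of (rep_id, bandwidth_bps), ascending by bandwidth.
    Choose the highest bandwidth not exceeding safety * measured throughput;
    fall back to the lowest-bandwidth Representation if none fits."""
    budget = throughput_bps * safety
    fitting = [r for r in representations if r[1] <= budget]
    return (fitting[-1] if fitting else representations[0])[0]

reps = [("v-low", 500_000), ("v-mid", 1_500_000), ("v-high", 4_000_000)]
print(pick_representation(reps, 2_500_000))  # at 2.5 Mbit/s, budget 2.0 Mbit/s
```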
> A single AdaptationSet element (track) may be described by e.g. one Accessibility Descriptor and two Role Descriptors indicating a TTML track was character coded Hiragana for children and blind readers of touch devices, and was descriptive, so also suitable for hearing-impaired Japanese viewers.  An alternative AdaptationSet (track) could be described by both Accessibility and Role descriptors to describe painted Kanji glyphs, more appropriate for hearing-impaired adult readers, and more typical of the majority of the world's cursive writing systems and subtitles used on movies, video discs, and broadcast.
> Although there can be multiple descriptions of a track, there isn't provision for multiple "sub-tracks" within a single TTML (or WebVTT?) Adaptation Set or ISO Media track.
> There is one special case to consider, which is binary captions encapsulated in AVC/HEVC elementary streams.  A video track will act like two tracks when broadcast content containing e.g. CEA-608, CEA-708, or Teletext captions is played on a device with the appropriate caption decoder(s).  These include iOS devices, game consoles, settop boxes, TVs, etc.  It would be useful to identify whether these broadcast captions are present and turn them on/off; but that may be in the scope of W3C groups working on tuner APIs, etc.
now includes several notes on which different in-band tracks may be
encountered and how they are to be exposed as multiple tracks in HTML.



> Kilroy Hughes | Senior Digital Media Architect | Windows Azure Media Services | Microsoft Corporation
> -----Original Message-----
> From: Silvia Pfeiffer []
> Sent: Monday, May 19, 2014 5:12 AM
> To: Philip Jägenstedt
> Cc: Jerry Smith (WINDOWS); Bob Lund; Paul Cotton;; Pierre-Anthony Lemieux
> Subject: Re: HTML WG Note publication of sourcing in-band media resources
> On Mon, May 19, 2014 at 10:02 PM, Philip Jägenstedt <> wrote:
>> On Mon, May 19, 2014 at 1:29 PM, Silvia Pfeiffer
>> <> wrote:
>>> On Mon, May 19, 2014 at 7:22 PM, Philip Jägenstedt <> wrote:
>>>> Finally, does ISO BMFF have SDH (subtitles for the deaf or
>>>> hard-of-hearing) as a separate flag from the subtitle and captions
>>>> kinds, or is it possible to assign an arbitrary number of kinds to a
>>>> track? Either way it doesn't sound like it maps 1:1 to the HTML
>>>> track kinds.
>>> That's what I tried to say: since the ISO BMFF 'SDH' track contains
>>> both 'SDH' and 'subtitles' cues, it should be mapped to both a
>>> @kind='captions' track and also a @kind='subtitles' track where the
>>> cues that are marked to be for SDH only are removed.
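[Editorial illustration: the mapping Silvia proposes — assuming, as discussed below and not yet confirmed, that individual cues carry a marker saying whether they are SDH-only — would expose one in-band track as two HTML text tracks: kind="captions" with all cues, and kind="subtitles" with the SDH-only cues removed. The cue model here is invented for illustration.]

```python
# Sketch: splitting one in-band SDH track into two HTML text-track views.
def expose_as_html_tracks(cues):
    """cues: list of (text, is_sdh_only) pairs, in presentation order."""
    return {
        "captions": [text for text, _ in cues],                       # everything
        "subtitles": [text for text, sdh in cues if not sdh],          # SDH-only cues removed
    }

cues = [("Hello there.", False), ("[door slams]", True)]
print(expose_as_html_tracks(cues))
```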
>> Are the individual cues really marked with that metadata? If they
>> aren't, then exposing such a single track with kind 'captions' seems
>> like the correct mapping.
> I was under that impression, but I haven't been able to confirm this.
> Maybe somebody else with the actual MPEG-4 specs can confirm / refute that assumption?
> Cheers,
> Silvia.

Received on Monday, 26 May 2014 13:30:41 UTC