- From: Paul Cotton <Paul.Cotton@microsoft.com>
- Date: Wed, 28 May 2014 00:26:43 +0000
- To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, Bob Lund <B.Lund@cablelabs.com>
- CC: "public-html-admin@w3.org" <public-html-admin@w3.org>, "Pierre-Anthony Lemieux" <pal@sandflow.com>, Kilroy Hughes <Kilroy.Hughes@microsoft.com>, Philip Jägenstedt <philipj@opera.com>, "Jerry Smith (WINDOWS)" <jdsmith@microsoft.com>
From Silvia's response to this week's WG Weekly agenda:
http://lists.w3.org/Archives/Public/public-html-wg-announce/2014AprJun/0016.html

>> 7. Any other business
>>
>> a) HTML extension spec for sourcing in-band tracks
>> http://lists.w3.org/Archives/Public/public-html-admin/2014May/0030.html
>
> The particular question I have for this is: how are we going to get it
> published under a w3.org URL?
>
> We have contributed the spec to the W3C github account at
> https://github.com/w3c/HTMLSourcingInbandTracks
> so it is available at
> http://rawgit.com/w3c/HTMLSourcingInbandTracks/master/index.html

The Chairs and Team provided our initial response in:
http://lists.w3.org/Archives/Public/public-html-admin/2014May/0030.html
in which we recommended publishing this material using the model adopted for the "Media Source Extensions Byte Stream Format Registry".

The Chairs and Team are now evaluating Bob's response to our initial response:
http://lists.w3.org/Archives/Public/public-html-admin/2014May/0034.html
and the fact that the Media TF has an open bug on how to update such a Registry:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25581

Unfortunately, due to other commitments the Chairs and Team have not been able to discuss this matter since Bob sent us his response. We hope to do so no later than early next week.

/paulc
HTML WG co-chair

Paul Cotton, Microsoft Canada
17 Eleanor Drive, Ottawa, Ontario K2E 6A3
Tel: (425) 705-9596 Fax: (425) 936-7329

-----Original Message-----
From: Kilroy Hughes
Sent: Monday, May 19, 2014 12:02 PM
To: Silvia Pfeiffer; Philip Jägenstedt
Cc: Jerry Smith (WINDOWS); Bob Lund; Paul Cotton; public-html-admin@w3.org; Pierre-Anthony Lemieux
Subject: RE: HTML WG Note publication of sourcing in-band media resources

The ISO Base Media File Format Part 30 (ISO/IEC 14496-30) defines subtitle tracks (which are inclusive of captions, SDH, description, translation, and graphics such as glyphs and signing). It doesn't say anything about Kinds, nor does it have a similar field in the standard track header and sample description. Both TTML and WebVTT storage are defined. I know TTML has generic metadata tags, but not a specific method of identifying presentation objects such as <p> and <div> according to Kind, nor any standardized concept of a sub-track. You would know better whether WebVTT content and readers conform to a sub-track or Kind tagging method corresponding to two HTML tracks in the same text file/track.

In the case of DASH streaming timed text and graphics subtitles (ISO/IEC 23009-1) stored as Part 30 movie fragments, the manifest (Media Presentation Description, MPD) may include optional Role Descriptor elements that are intended to function like Kind to describe the Adaptation Sets that result in tracks when streamed to an HTML5 browser using MSE. The DASH standard was completed prior to the W3C Kind specification, so it defines a slightly different vocabulary than the one eventually settled on by the W3C. It also allows multiple Role Descriptors, because an Adaptation Set (track) may fit multiple descriptions, such as "Main" or "Alternate" and "Description" or "Translation". The Role Descriptor uses a URI/URN to identify the vocabulary and syntax contained in the descriptor, so it is extensible beyond the vocabulary defined in the DASH standard. An additional Accessibility Descriptor is specified in the DASH MPD schema to allow automatic selection of audio, video, and TTML tracks for users with visual, hearing, cognitive, or other impairments.
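For illustration only, such an Adaptation Set might look roughly like this in an MPD; the urn:mpeg:dash:role:2011 scheme URI and the value strings are given as examples of the DASH vocabulary and should be checked against ISO/IEC 23009-1, and segment details are omitted:

  <!-- Illustrative sketch: a TTML-in-ISOBMFF subtitle Adaptation Set
       carrying two Role Descriptors. Verify scheme URI and values
       against the DASH standard before relying on them. -->
  <AdaptationSet mimeType="application/mp4" codecs="stpp" lang="en">
    <Role schemeIdUri="urn:mpeg:dash:role:2011" value="alternate"/>
    <Role schemeIdUri="urn:mpeg:dash:role:2011" value="subtitle"/>
    <Representation id="ttml-en" bandwidth="2000"/>
  </AdaptationSet>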
A URI/URN can be selected that labels these tracks with identifiers established by regulation, broadcast TV, etc., such as "SDH" for Subtitles for the Deaf and Hard of hearing. Even if a player does not recognize the particular URI/URN or descriptive term used in this Descriptor, it can make a default selection, when a user preference setting indicates an impairment, based on the presence of the Accessibility Descriptor, the language attribute, etc. The track may also have a Descriptor indicating "alternate" or similar, but that would not be very useful for someone who is visually impaired, or for a standard player that would like to find an audio description track.

Selection of an Adaptation Set, and of a Representation contained in it, for adaptive streaming involves evaluating attributes that identify codec, video resolution or audio track configuration, language, frame rate, bitrate, etc., in addition to the Role or Kind. An Adaptation Set contains perceptually equivalent content, but possibly multiple Representations that are encoded differently to enable rapid switching to compensate for variation in network throughput. The intent is that Media Segments adaptively selected and sequenced from different Representations within an Adaptation Set will appear to be a continuous track on playback, so they share the same Role Descriptor. Although it is possible, it is unlikely that a subtitle Adaptation Set will contain more than one Representation.

A single AdaptationSet element (track) may be described by, for example, one Accessibility Descriptor and two Role Descriptors indicating that a TTML track is character coded in Hiragana for children and for blind readers of touch devices, and is descriptive, so also suitable for hearing-impaired Japanese viewers. An alternative AdaptationSet (track) could be described by both Accessibility and Role Descriptors as painted Kanji glyphs, more appropriate for adult hearing-impaired readers and more typical of the majority of the world's cursive writing systems and of the subtitles used on movies, video discs, and broadcast. Although there can be multiple descriptions of a track, there is no provision for multiple "sub-tracks" within a single TTML (or WebVTT?) Adaptation Set or ISO Media track.

There is one special case to consider, which is binary captions encapsulated in AVC/HEVC elementary streams. A video track will act like two tracks when broadcast content containing, e.g., CEA-608, CEA-708, or Teletext captions is played on a device with the appropriate caption decoder(s). These include iOS devices, game consoles, set-top boxes, TVs, etc. It would be useful to be able to identify whether these broadcast captions are present and to turn them on/off, but that may be in the scope of the W3C groups working on tuner APIs, etc.
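Going back to the Accessibility and Role combination described above, a sketch of how it might appear in an MPD could look like the following; the Accessibility schemeIdUri/value pair is a made-up placeholder, since the choice of URN is deliberately left open above, and the Role value should likewise be checked against the DASH vocabulary:

  <!-- Illustrative sketch: one Accessibility Descriptor plus a Role
       Descriptor on a Japanese TTML subtitle Adaptation Set; the
       Accessibility scheme URI and value are placeholders only. -->
  <AdaptationSet mimeType="application/mp4" codecs="stpp" lang="ja">
    <Accessibility schemeIdUri="urn:example:accessibility" value="SDH"/>
    <Role schemeIdUri="urn:mpeg:dash:role:2011" value="main"/>
    <Representation id="ttml-ja-sdh" bandwidth="3000"/>
  </AdaptationSet>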
Kilroy Hughes | Senior Digital Media Architect | Windows Azure Media Services | Microsoft Corporation

-----Original Message-----
From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
Sent: Monday, May 19, 2014 5:12 AM
To: Philip Jägenstedt
Cc: Jerry Smith (WINDOWS); Bob Lund; Paul Cotton; public-html-admin@w3.org; Pierre-Anthony Lemieux
Subject: Re: HTML WG Note publication of sourcing in-band media resources

On Mon, May 19, 2014 at 10:02 PM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Mon, May 19, 2014 at 1:29 PM, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>> On Mon, May 19, 2014 at 7:22 PM, Philip Jägenstedt <philipj@opera.com> wrote:
>
>>> Finally, does ISO BMFF have SDH (subtitles for the deaf or
>>> hard-of-hearing) as a separate flag from the subtitle and captions
>>> kinds, or is it possible to assign an arbitrary number of kinds to a
>>> track? Either way it doesn't sound like it maps 1:1 to the HTML
>>> track kinds.
>>
>> That's what I tried to say: since the ISO BMFF 'SDH' track contains
>> both 'SDH' and 'subtitles' cues, it should be mapped to both a
>> @kind='captions' track and also a @kind='subtitles' track where the
>> cues that are marked as being for SDH only are removed.
>
> Are the individual cues really marked with that metadata? If they
> aren't, then exposing such a single track with kind 'captions' seems
> like the correct mapping.

I was under that impression, but I haven't been able to confirm it. Maybe somebody else with access to the actual MPEG-4 specs can confirm or refute that assumption?

Cheers,
Silvia.
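For reference, the mapping Silvia describes would correspond, on the HTML side, to exposing one in-band SDH track under both kinds, roughly like this hypothetical out-of-band equivalent (the file names, labels, and WebVTT sources are placeholders for illustration only):

  <!-- Hypothetical out-of-band equivalent of one in-band SDH track:
       the subtitles track would carry the same cues minus those
       marked as SDH-only. -->
  <video src="movie.mp4" controls>
    <track kind="captions"  srclang="en" label="English (SDH)" src="captions-sdh.vtt">
    <track kind="subtitles" srclang="en" label="English" src="subtitles.vtt">
  </video>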
Received on Wednesday, 28 May 2014 00:27:33 UTC