RE: HTML WG Note publication of sourcing in-band media resources

The Chairs discussed this situation this morning in great depth and are drafting a response on this matter.  We hope to provide it before the end of this week.

/paulc

Paul Cotton, Microsoft Canada
17 Eleanor Drive, Ottawa, Ontario K2E 6A3
Tel: (425) 705-9596 Fax: (425) 936-7329


-----Original Message-----
From: Bob Lund [mailto:B.Lund@CableLabs.com] 
Sent: Tuesday, June 03, 2014 5:49 PM
To: Paul Cotton; public-html-admin@w3.org
Cc: Pierre-Anthony Lemieux; Kilroy Hughes; Philip Jägenstedt; Jerry Smith (WINDOWS); Silvia Pfeiffer
Subject: Re: HTML WG Note publication of sourcing in-band media resources

Chairs and HTML Admin Group,

I was wondering if you've concluded your evaluation of my response.

Thanks,
Bob Lund

On 5/27/14, 6:26 PM, "Paul Cotton" <Paul.Cotton@microsoft.com> wrote:

>From Silvia's response to this week's WG Weekly agenda:
>http://lists.w3.org/Archives/Public/public-html-wg-announce/2014AprJun/0016.html
>
>>> 7. Any other business
>>>
>>> a) HTML extension spec for sourcing in-band tracks 
>>> http://lists.w3.org/Archives/Public/public-html-admin/2014May/0030.html
>>
>>The particular question I have for this is: how are we going to get it 
>>published under a w3.org URL?
>>
>>We have contributed the spec to the W3C github account at 
>>https://github.com/w3c/HTMLSourcingInbandTracks
>>so it is available at
>>http://rawgit.com/w3c/HTMLSourcingInbandTracks/master/index.html
>
>The Chairs and Team provided our initial response in:
>http://lists.w3.org/Archives/Public/public-html-admin/2014May/0030.html
>in which we recommended publishing this material using the model 
>adopted for the "Media Source Extensions Byte Stream Format Registry".
>
>The Chairs and Team are now evaluating Bob's response to our initial
>response:
>http://lists.w3.org/Archives/Public/public-html-admin/2014May/0034.html
>and the fact that the Media TF have an open bug on how to update such a
>Registry:
>https://www.w3.org/Bugs/Public/show_bug.cgi?id=25581
>
>Unfortunately due to other commitments the Chairs and Team have not 
>been able to discuss this matter since Bob sent us his response. We 
>hope to do this no later than early next week.
>
>/paulc
>HTML WG co-chair
>
>Paul Cotton, Microsoft Canada
>17 Eleanor Drive, Ottawa, Ontario K2E 6A3
>Tel: (425) 705-9596 Fax: (425) 936-7329
>
>
>-----Original Message-----
>From: Kilroy Hughes
>Sent: Monday, May 19, 2014 12:02 PM
>To: Silvia Pfeiffer; Philip Jägenstedt
>Cc: Jerry Smith (WINDOWS); Bob Lund; Paul Cotton; 
>public-html-admin@w3.org; Pierre-Anthony Lemieux
>Subject: RE: HTML WG Note publication of sourcing in-band media 
>resources
>
>The ISO Base Media File Format Part 30 (ISO/IEC 14496-30) defines 
>subtitle tracks (which are inclusive of captions, SDH, description, 
>translation, graphics such as glyphs and signing, etc.).
>
>It doesn't say anything about Kinds, or have a similar field in the 
>standard track header and sample description.
>
>Both TTML and WebVTT storage are defined.
>I know TTML has generic metadata tags, but not a specific method of 
>identifying presentation objects such as <p> and <div> according to 
>Kind; nor any standardized concept of sub-track.
>You would know better if WebVTT content and readers conform to a 
>sub-track or Kind tagging method corresponding to two HTML tracks in 
>the same text file/track.
>
>In the case of DASH streaming timed text and graphics subtitles 
>(ISO/IEC 23009-1) stored as Part 30 movie fragments, the manifest (Media 
>Presentation Description, MPD) may include optional Role Descriptor 
>elements that are intended to function like Kind to describe Adaptation 
>Sets that result in tracks when streamed in an HTML5 browser using MSE.
>The DASH standard was completed before the W3C Kind specification, so it 
>defines a slightly different vocabulary than that eventually settled on 
>by W3C.  It also allows multiple Role Descriptors because an Adaptation 
>Set (track) may fit multiple descriptions, such as "Main" or "Alternate"
>and "Description" or "Translation".  The Role descriptor uses a URI/URN 
>to identify the vocabulary and syntax contained in the descriptor, so 
>it is extensible beyond the vocabulary defined in the DASH standard.
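
A rough sketch of how a player might translate the Role Descriptors described above into HTML text track kinds. The mapping table is an illustrative assumption, not a normative one, and the names `Descriptor`, `ROLE_TO_TEXT_KIND` and `text_track_kinds` are hypothetical. Because a Role is identified by scheme plus value, any descriptor from a scheme the table does not know is simply skipped.

```python
from typing import Iterable, List, NamedTuple


class Descriptor(NamedTuple):
    """One DASH descriptor: a scheme URI/URN plus a value from that scheme."""
    scheme_id_uri: str
    value: str


DASH_ROLE_SCHEME = "urn:mpeg:dash:role:2011"

# Assumed, non-normative mapping from Role values to HTML text track kinds.
ROLE_TO_TEXT_KIND = {
    (DASH_ROLE_SCHEME, "caption"): "captions",
    (DASH_ROLE_SCHEME, "subtitle"): "subtitles",
}


def text_track_kinds(roles: Iterable[Descriptor]) -> List[str]:
    """Return the HTML kinds implied by the recognised Role Descriptors."""
    kinds: List[str] = []
    for role in roles:
        kind = ROLE_TO_TEXT_KIND.get((role.scheme_id_uri, role.value))
        # Unrecognised scheme/value pairs are ignored rather than guessed at.
        if kind is not None and kind not in kinds:
            kinds.append(kind)
    return kinds


roles = [
    Descriptor(DASH_ROLE_SCHEME, "main"),
    Descriptor(DASH_ROLE_SCHEME, "caption"),
    Descriptor("urn:example:roles", "translation"),  # hypothetical scheme
]
print(text_track_kinds(roles))
```

Returning a list rather than a single kind reflects the point that one Adaptation Set may carry several Role Descriptors.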
>
>An additional Accessibility Descriptor is specified in the DASH MPD 
>schema to allow automatic selection of audio, video, and TTML tracks 
>for users with visual, hearing, cognitive, etc. impairments.  A URI/URN 
>can be selected that labels these tracks with identifiers established 
>by regulation, broadcast TV, etc., such as "SDH" for Subtitles for Deaf 
>and Hard of hearing.  Even if a player does not recognize the 
>particular URI/URN or descriptive term used in this Descriptor, it can 
>make a default selection when a user preference setting indicates an 
>impairment, based on the presence of the Accessibility Descriptor, 
>language attribute, etc.  It may also have a Descriptor indicating 
>"alternate" or similar, but that would not be very useful for someone 
>who is visually impaired or a standard player that would like to find 
>an audio description track.
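
The default selection described above could look roughly like the following. This is only a sketch under stated assumptions: `AdaptationSet` here is a simplified stand-in for the MPD element, `pick_text_track` is a hypothetical helper, and the selection rule (match language first, then prefer sets with or without an Accessibility Descriptor) is one plausible policy rather than anything DASH mandates. The player never inspects the descriptor's scheme or value, only its presence.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AdaptationSet:
    set_id: str
    lang: str
    # (scheme URI/URN, value) pairs; the scheme need not be understood.
    accessibility: List[Tuple[str, str]] = field(default_factory=list)


def pick_text_track(
    sets: List[AdaptationSet], lang: str, prefers_accessible: bool
) -> Optional[AdaptationSet]:
    """Pick a default text track from language and descriptor presence only."""
    same_lang = [s for s in sets if s.lang == lang]
    preferred = [
        s for s in same_lang if bool(s.accessibility) == prefers_accessible
    ]
    # Fall back to any track in the right language if the preference
    # cannot be honoured.
    candidates = preferred or same_lang
    return candidates[0] if candidates else None


sets = [
    AdaptationSet("sub-en", "en"),
    AdaptationSet("sdh-en", "en", [("urn:example:broadcast", "SDH")]),
    AdaptationSet("sub-ja", "ja"),
]
choice = pick_text_track(sets, "en", prefers_accessible=True)
print(choice.set_id if choice else None)
```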
>
>Selection of an Adaptation Set and a Representation contained in it for 
>adaptive streaming involves evaluating attributes that identify codec, 
>video resolution or audio track configuration, language, frame rate, 
>bitrate, etc. in addition to the Role or Kind. An Adaptation Set 
>contains perceptually equivalent content, but possibly multiple 
>Representations that are encoded differently to enable rapid switching 
>to compensate for variation in network throughput.  The intent is that 
>Media Segments adaptively selected and sequenced from different 
>Representations within an Adaptation Set will appear to be a continuous 
>track on playback, so they share the same Role Descriptor.  Although it 
>is possible, it is unlikely that a Subtitle Adaptation Set will contain 
>more than one Representation.
>
>A single AdaptationSet element (track) may be described by, e.g., one 
>Accessibility Descriptor and two Role Descriptors indicating a TTML 
>track was character coded Hiragana for children and blind readers of 
>touch devices, and was descriptive, so also suitable for hearing 
>impaired Japanese.  An alternative AdaptationSet (track) could be 
>described by both Accessibility and Role descriptors to describe 
>painted Kanji glyphs, more appropriate for adult hearing impaired 
>readers, and more typical of the majority of the world's cursive 
>writing systems and subtitles used on movies, video discs, and broadcast.
>
>Although there can be multiple descriptions of a track, there isn't 
>provision for multiple "sub-tracks" within a single TTML (or WebVTT?) 
>Adaptation Set or ISO Media track.
>
>There is one special case to consider, which is binary captions 
>encapsulated in AVC/HEVC elementary streams.  A video track will act 
>like two tracks when broadcast content containing e.g. CEA-608 or 
>CEA-708 or Teletext, etc. is played on a device with the appropriate 
>caption decoder(s).  These include iOS devices, game consoles, set-top 
>boxes, TVs, etc.  It would be useful to identify if these broadcast 
>captions are present and turn them on/off; but that may be in the scope 
>of W3C groups working on tuner APIs, etc.
>
>Kilroy Hughes | Senior Digital Media Architect |Windows Azure Media 
>Services | Microsoft Corporation
>
>
>-----Original Message-----
>From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
>Sent: Monday, May 19, 2014 5:12 AM
>To: Philip Jägenstedt
>Cc: Jerry Smith (WINDOWS); Bob Lund; Paul Cotton; 
>public-html-admin@w3.org; Pierre-Anthony Lemieux
>Subject: Re: HTML WG Note publication of sourcing in-band media 
>resources
>
>On Mon, May 19, 2014 at 10:02 PM, Philip Jägenstedt <philipj@opera.com>
>wrote:
>> On Mon, May 19, 2014 at 1:29 PM, Silvia Pfeiffer 
>> <silviapfeiffer1@gmail.com> wrote:
>>> On Mon, May 19, 2014 at 7:22 PM, Philip Jägenstedt 
>>><philipj@opera.com>
>>>wrote:
>>
>>>> Finally, does ISO BMFF have SDH (subtitles for the deaf or
>>>> hard-of-hearing) as a separate flag from the subtitle and captions 
>>>> kinds, or is it possible to assign an arbitrary number of kinds to a 
>>>> track? Either way it doesn't sound like it maps 1:1 to the HTML 
>>>> track kinds.
>>>
>>> That's what I tried to say: since the ISO BMFF 'SDH' track contains 
>>> both 'SDH' and 'subtitles' cues, it should be mapped to both a 
>>> @kind='captions' track and also a @kind='subtitles' track where the 
>>> cues that are marked to be for SDH only are removed.
>>
>> Are the individual cues really marked with that metadata? If they 
>> aren't, then exposing such a single track with kind 'captions' seems 
>> like the correct mapping.
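
The two mappings under discussion can be sketched side by side. Everything here is hypothetical: the `sdh_only` flag stands in for per-cue metadata whose existence in ISO BMFF is exactly what this thread has not confirmed, and `Cue` and `expose_tracks` are illustrative names. If every cue carries the marking, the track is exposed as both 'captions' and a filtered 'subtitles' track; if not, it is exposed once as 'captions'.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cue:
    text: str
    # Hypothetical per-cue marking; None means the cue carries no marking.
    sdh_only: Optional[bool] = None


def expose_tracks(cues: List[Cue]) -> Dict[str, List[str]]:
    """Map one in-band 'SDH' track to the HTML track kinds it should expose."""
    # The full cue list is always exposed as a 'captions' track.
    tracks = {"captions": [c.text for c in cues]}
    # Only derive a 'subtitles' track when every cue is actually marked.
    if cues and all(c.sdh_only is not None for c in cues):
        tracks["subtitles"] = [c.text for c in cues if not c.sdh_only]
    return tracks


marked = [Cue("[door slams]", True), Cue("Who's there?", False)]
unmarked = [Cue("[door slams]"), Cue("Who's there?")]
print(expose_tracks(marked))
print(expose_tracks(unmarked))
```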
>
>I was under that impression, but I haven't been able to confirm this.
>Maybe somebody else with actual MPEG4 specs can confirm / refute that 
>assumption?
>
>Cheers,
>Silvia.
>

Received on Tuesday, 3 June 2014 21:50:21 UTC