- From: Bob Lund <B.Lund@CableLabs.com>
- Date: Tue, 3 Jun 2014 21:48:37 +0000
- To: Paul Cotton <Paul.Cotton@microsoft.com>, "public-html-admin@w3.org" <public-html-admin@w3.org>
- CC: Pierre-Anthony Lemieux <pal@sandflow.com>, Kilroy Hughes <Kilroy.Hughes@microsoft.com>, Philip Jägenstedt <philipj@opera.com>, "Jerry Smith (WINDOWS)" <jdsmith@microsoft.com>, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>
Chairs and HTML Admin Group,

I was wondering if you've concluded your evaluation of my response.

Thanks,
Bob Lund

On 5/27/14, 6:26 PM, "Paul Cotton" <Paul.Cotton@microsoft.com> wrote:

>From Silvia's response to this week's WG Weekly agenda:
>http://lists.w3.org/Archives/Public/public-html-wg-announce/2014AprJun/0016.html
>
>>> 7. Any other business
>>>
>>> a) HTML extension spec for sourcing in-band tracks
>>> http://lists.w3.org/Archives/Public/public-html-admin/2014May/0030.html
>>
>>The particular question I have for this is: how are we going to get it
>>published under a w3.org URL?
>>
>>We have contributed the spec to the W3C github account at
>>https://github.com/w3c/HTMLSourcingInbandTracks
>>so it is available at
>>http://rawgit.com/w3c/HTMLSourcingInbandTracks/master/index.html
>
>The Chairs and Team provided our initial response in:
>http://lists.w3.org/Archives/Public/public-html-admin/2014May/0030.html
>in which we recommended publishing this material using the model adopted
>for the "Media Source Extensions Byte Stream Format Registry".
>
>The Chairs and Team are now evaluating Bob's response to our initial
>response:
>http://lists.w3.org/Archives/Public/public-html-admin/2014May/0034.html
>and the fact that the Media TF has an open bug on how to update such a
>Registry:
>https://www.w3.org/Bugs/Public/show_bug.cgi?id=25581
>
>Unfortunately, due to other commitments, the Chairs and Team have not been
>able to discuss this matter since Bob sent us his response. We hope to do
>so no later than early next week.
>
>/paulc
>HTML WG co-chair
>
>Paul Cotton, Microsoft Canada
>17 Eleanor Drive, Ottawa, Ontario K2E 6A3
>Tel: (425) 705-9596 Fax: (425) 936-7329
>
>
>-----Original Message-----
>From: Kilroy Hughes
>Sent: Monday, May 19, 2014 12:02 PM
>To: Silvia Pfeiffer; Philip Jägenstedt
>Cc: Jerry Smith (WINDOWS); Bob Lund; Paul Cotton;
>public-html-admin@w3.org; Pierre-Anthony Lemieux
>Subject: RE: HTML WG Note publication of sourcing in-band media resources
>
>The ISO Base Media File Format Part 30 (ISO/IEC 14496-30) defines
>subtitle tracks (which are inclusive of captions, SDH, description,
>translation, graphics such as glyphs and signing, etc.).
>
>It doesn't say anything about Kinds, or have a similar field in the
>standard track header and sample description.
>
>Both TTML and WebVTT storage are defined.
>I know TTML has generic metadata tags, but not a specific method of
>identifying presentation objects such as <p> and <div> according to Kind,
>nor any standardized concept of a sub-track.
>You would know better whether WebVTT content and readers conform to a
>sub-track or Kind tagging method corresponding to two HTML tracks in the
>same text file/track.
>
>In the case of DASH streaming timed text and graphics subtitles (ISO/IEC
>23009-1) stored as Part 30 movie fragments, the manifest (Media
>Presentation Description, MPD) may include optional Role Descriptor
>elements that are intended to function like Kind to describe Adaptation
>Sets that result in tracks when streamed in an HTML5 browser using MSE.
>The DASH standard was completed prior to the W3C Kind specification, so it
>defines a slightly different vocabulary than the one eventually settled on
>by the W3C. It also allows multiple Role Descriptors, because an Adaptation
>Set (track) may fit multiple descriptions, such as "Main" or "Alternate"
>and "Description" or "Translation".
The Role descriptor uses a URI/URN
>to identify the vocabulary and syntax contained in the descriptor, so it
>is extensible beyond the vocabulary defined in the DASH standard.
>
>An additional Accessibility Descriptor is specified in the DASH MPD schema
>to allow automatic selection of audio, video, and TTML tracks for users
>with visual, hearing, cognitive, etc. impairments. A URI/URN can be
>selected that labels these tracks with identifiers established by
>regulation, broadcast TV, etc., such as "SDH" for Subtitles for the Deaf and
>Hard of hearing. Even if a player does not recognize the particular
>URI/URN or descriptive term used in this Descriptor, it can make a
>default selection when a user preference setting indicates an impairment,
>based on the presence of the Accessibility Descriptor, language
>attribute, etc. A track may also have a Descriptor indicating "alternate" or
>similar, but that would not be very useful to someone who is visually
>impaired, or to a standard player trying to find an audio
>description track.
>
>Selection of an Adaptation Set, and of a Representation contained in it, for
>adaptive streaming involves evaluating attributes that identify codec,
>video resolution or audio track configuration, language, frame rate,
>bitrate, etc., in addition to the Role or Kind. An Adaptation Set contains
>perceptually equivalent content, but possibly multiple Representations
>that are encoded differently to enable rapid switching to compensate for
>variation in network throughput. The intent is that Media Segments
>adaptively selected and sequenced from different Representations within
>an Adaptation Set will appear to be a continuous track on playback, so
>they share the same Role Descriptor. Although it is possible, it is
>unlikely that a Subtitle Adaptation Set will contain more than one
>Representation.
>
>A single AdaptationSet element (track) may be described by, e.g.,
one
>Accessibility Descriptor and two Role Descriptors, indicating that a TTML
>track was character-coded Hiragana for children and blind readers of touch
>devices, and was descriptive, so also suitable for hearing-impaired
>Japanese readers. An alternative AdaptationSet (track) could be described by
>both Accessibility and Role descriptors as painted Kanji glyphs,
>more appropriate for adult hearing-impaired readers, and more typical of
>the majority of the world's cursive writing systems and of subtitles used
>on movies, video discs, and broadcast.
>
>Although there can be multiple descriptions of a track, there is no
>provision for multiple "sub-tracks" within a single TTML (or WebVTT?)
>Adaptation Set or ISO Media track.
>
>There is one special case to consider, which is binary captions
>encapsulated in AVC/HEVC elementary streams. A video track will act like
>two tracks when broadcast content containing, e.g., CEA-608, CEA-708, or
>Teletext captions is played on a device with the appropriate caption
>decoder(s). These include iOS devices, game consoles, set-top boxes, TVs,
>etc. It would be useful to identify whether these broadcast captions are
>present and to turn them on/off, but that may be in the scope of W3C
>groups working on tuner APIs, etc.
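[The Role-to-Kind correspondence Kilroy describes can be sketched in code. This is an illustrative mapping only: the scheme URI is the DASH `urn:mpeg:dash:role:2011` vocabulary, but the specific value-to-`@kind` pairings below are one possible choice, not anything normative from the thread.]

```javascript
// Sketch: map a DASH Role descriptor (schemeIdUri + value) to an
// HTML5 text track @kind. Unknown schemes or values fall back to
// "metadata", since a player cannot interpret them.
const DASH_ROLE_SCHEME = "urn:mpeg:dash:role:2011";

function roleToKind(schemeIdUri, value) {
  if (schemeIdUri !== DASH_ROLE_SCHEME) {
    return "metadata"; // extensible vocabularies we don't recognize
  }
  switch (value) {
    case "caption":
      return "captions";
    case "subtitle":
      return "subtitles";
    case "description":
      return "descriptions";
    default:
      return "metadata"; // e.g. "main", "alternate" carry no direct Kind
  }
}
```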
>
>Kilroy Hughes | Senior Digital Media Architect | Windows Azure Media
>Services | Microsoft Corporation
>
>
>-----Original Message-----
>From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
>Sent: Monday, May 19, 2014 5:12 AM
>To: Philip Jägenstedt
>Cc: Jerry Smith (WINDOWS); Bob Lund; Paul Cotton;
>public-html-admin@w3.org; Pierre-Anthony Lemieux
>Subject: Re: HTML WG Note publication of sourcing in-band media resources
>
>On Mon, May 19, 2014 at 10:02 PM, Philip Jägenstedt <philipj@opera.com>
>wrote:
>> On Mon, May 19, 2014 at 1:29 PM, Silvia Pfeiffer
>> <silviapfeiffer1@gmail.com> wrote:
>>> On Mon, May 19, 2014 at 7:22 PM, Philip Jägenstedt <philipj@opera.com>
>>> wrote:
>>
>>>> Finally, does ISO BMFF have SDH (subtitles for the deaf or
>>>> hard-of-hearing) as a separate flag from the subtitle and captions
>>>> kinds, or is it possible to assign an arbitrary number of kinds to a
>>>> track? Either way, it doesn't sound like it maps 1:1 to the HTML
>>>> track kinds.
>>>
>>> That's what I tried to say: since the ISO BMFF 'SDH' track contains
>>> both 'SDH' and 'subtitles' cues, it should be mapped to both a
>>> @kind='captions' track and also a @kind='subtitles' track where the
>>> cues that are marked to be for SDH only are removed.
>>
>> Are the individual cues really marked with that metadata? If they
>> aren't, then exposing such a single track with kind 'captions' seems
>> like the correct mapping.
>
>I was under that impression, but I haven't been able to confirm it.
>Maybe somebody else with actual MPEG-4 specs can confirm / refute that
>assumption?
>
>Cheers,
>Silvia.
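[Silvia's proposed mapping of a single in-band 'SDH' track to two HTML text tracks can be sketched as follows. The per-cue `sdhOnly` flag is hypothetical: as the exchange above notes, it is unconfirmed whether ISO BMFF cues actually carry such metadata, and without it Philip's single `captions` track would be the correct exposure.]

```javascript
// Sketch of Silvia's suggestion: expose one in-band SDH track as
//   - a @kind='captions' track containing every cue, and
//   - a @kind='subtitles' track with the SDH-only cues removed.
// `sdhOnly` is a hypothetical per-cue marker (not confirmed in ISO BMFF).
function splitSdhTrack(cues) {
  return {
    captions: cues,                          // all cues, including sound cues
    subtitles: cues.filter((c) => !c.sdhOnly), // dialogue-only subset
  };
}
```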
Received on Tuesday, 3 June 2014 21:50:12 UTC