Re: Tech Discussions on the Multitrack Media (issue-152) from David Singer on 2011-02-11 (public-html@w3.org from February 2011)

From: David Singer <singer@apple.com>
Date: Fri, 11 Feb 2011 13:07:17 -0800
To: public-html <public-html@w3.org>, Silvia Pfeiffer <silviapfeiffer1@gmail.com>, Mark Watson <watsonm@netflix.com>
Message-Id: <07D54AA6-7CB4-4FA7-8B98-D178408ACD64@apple.com>
I like this;  it cleanly documents the media resource as a primary resource (via the src attribute or source elements) and optional add-ons, which can satisfy any need, and be in any media format (text, audio, video) and coding format (webVTT or TTML, for example).  One has inclusive-or selection over the tracks (choose the main media, and as many tracks as you need to meet your needs) and exclusive-or selection over <source>s (pick exactly one).  As it happens, this is very similar to what MPEG DASH does, and I'd like to take our 'kind' tagging there so we have a uniform way in the industry of describing what a media element or add-on does.
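The two selection rules can be sketched as below. This is a minimal illustration that assumes a hypothetical in-memory model of the element; the type and function names are not from any spec. A caller-supplied `canPlayType` predicate stands in for the user agent's codec support:

```typescript
// Hypothetical model: field names are illustrative, not from the spec.
interface SourceOption {
  src: string;
  type: string; // MIME type, e.g. "video/webm"
}

interface AddOnTrack {
  label: string;
  kind: string; // functional purpose, e.g. "captions" or "descriptions"
}

// Exclusive-or over <source>s: exactly one is used, the first playable one.
function chooseSource(
  sources: SourceOption[],
  canPlayType: (type: string) => boolean,
): SourceOption | undefined {
  return sources.find((s) => canPlayType(s.type));
}

// Inclusive-or over tracks: every track that meets one of the user's needs
// is enabled alongside the main media.
function chooseTracks(tracks: AddOnTrack[], needs: Set<string>): AddOnTrack[] {
  return tracks.filter((t) => needs.has(t.kind));
}

// Example: a UA that only plays MP4, for a viewer who wants captions and ASL.
const main = chooseSource(
  [
    { src: "video.webm", type: "video/webm" },
    { src: "video.mp4", type: "video/mp4" },
  ],
  (type) => type === "video/mp4",
);
const enabled = chooseTracks(
  [
    { label: "English Captions", kind: "captions" },
    { label: "American Sign Language", kind: "signings" },
    { label: "English Audio Description", kind: "descriptions" },
  ],
  new Set(["captions", "signings"]),
);
// main is the MP4 source; enabled holds the captions and ASL tracks.
console.log(main?.src, enabled.map((t) => t.label));
```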

It does seem as if we need to look a little at the 'kind' attribute.  Kind is a rather vague word; I could wish for a better one.  It's actually not indicating the media 'kind' (audio, video, text) or the coding 'kind' (VP8, TTML) but the function or purpose of the content (it provides captions, sign language, audio description of video, and so on).

For some accessibility adaptations, as we have discussed, it may be easier to make an integrated resource rather than add-on tracks.  For example, when doing audio description of video, you need to do some or all of:
a) mute out or re-author parts of the normal audio track, to make the descriptions audible
b) pause the video sometimes, when the description takes longer to say than the scene lasts.
Doing these as part of an 'instruction' side-file is way more complicated than just making a new movie.  For some accessibility needs, re-authoring may be the only way to meet the need (think of repetitive stimulus avoidance).  And some content owners want to keep things 'in one file' for content management reasons. We ought to be able to say that a <source> can meet one or more functional needs.
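Point (b) alone already implies per-description timing logic in the player. A minimal sketch, assuming a hypothetical cue format that records each scene's time window and the length of its spoken description (no such format is defined in this thread), computes how long the picture would have to be held:

```typescript
// Hypothetical description cue; all times are in seconds.
interface DescriptionCue {
  sceneStart: number; // when the gap in the dialogue begins
  sceneEnd: number; // when the next dialogue or key action starts
  spokenDuration: number; // length of the recorded description
}

// How long the video must be held at each cue so the description finishes
// before the scene moves on. Zero means the description already fits.
function requiredPauses(cues: DescriptionCue[]): number[] {
  return cues.map((c) => Math.max(0, c.spokenDuration - (c.sceneEnd - c.sceneStart)));
}

const pauses = requiredPauses([
  { sceneStart: 0, sceneEnd: 4, spokenDuration: 3 }, // fits in the gap
  { sceneStart: 10, sceneEnd: 12, spokenDuration: 5 }, // overruns by 3 s
]);
const totalExtra = pauses.reduce((sum, p) => sum + p, 0);
// pauses is [0, 3]; totalExtra is 3
console.log(pauses, totalExtra);
```

Step (a), ducking or re-mixing the main audio under each description, would still need its own instructions; an integrated, re-authored resource needs neither.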


If we were to (a) find a better name and (b) allow the tag to indicate that multiple functional needs can be satisfied by one choice (a file that contains both text and sign-language is conceivable, and they serve similar communities so it's not unreasonable) and (c) be able to indicate that a <source> can also meet some of these needs, then I think we place ourselves in a powerful position.

I suggest
i) change 'kind' to 'function';
ii) have it take a list of functions, not just one;
iii) allow it on <source> elements to indicate that a source provides those functions
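Suggestions (ii) and (iii) together let a single <source> satisfy several needs, with add-on tracks covering the rest. The sketch below is one possible reading. It assumes the proposed `function` attribute holds whitespace-separated names and uses a hypothetical scoring rule (prefer the source that covers the most needs); neither the rule nor the names are part of any spec:

```typescript
// Hypothetical model of the proposal: both <source> and <track> carry a
// `function` attribute holding a whitespace-separated list of functions.
interface Candidate {
  src: string;
  functionAttr?: string; // raw attribute value, e.g. "descriptions captions"
}

interface Plan {
  source: string; // the single <source> chosen (exclusive-or)
  addOns: string[]; // add-on tracks enabled for the remaining needs
  unmet: string[]; // needs that nothing on offer satisfies
}

function parseFunctions(attr: string | undefined): Set<string> {
  return new Set((attr ?? "").split(/\s+/).filter((tok) => tok.length > 0));
}

function covers(c: Candidate, need: string): boolean {
  return parseFunctions(c.functionAttr).has(need);
}

function planPlayback(sources: Candidate[], tracks: Candidate[], needs: string[]): Plan {
  if (sources.length === 0) throw new Error("no <source> to choose from");
  // Prefer the source that satisfies the most needs by itself; ties go to
  // the earlier source, matching document order.
  let best = sources[0];
  let bestScore = -1;
  for (const s of sources) {
    const score = needs.filter((n) => covers(s, n)).length;
    if (score > bestScore) {
      best = s;
      bestScore = score;
    }
  }
  // Cover whatever is left with add-on tracks (inclusive-or).
  const addOns: string[] = [];
  const unmet: string[] = [];
  for (const need of needs.filter((n) => !covers(best, n))) {
    const track = tracks.find((t) => covers(t, need));
    if (!track) unmet.push(need);
    else if (!addOns.includes(track.src)) addOns.push(track.src);
  }
  return { source: best.src, addOns, unmet };
}
```

A real user agent would first drop sources whose type it cannot play; that filter is left out to keep the example short.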

Finally, we need to look at the namespace of the function names;  do we have a clean way to do experimentation and introduction of new accessibility axes?  The CSS route of having experimental vendor prefixes would work, but it's not pretty.
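To see why the CSS route is workable but not pretty, here is a sketch of how a user agent might treat vendor-prefixed function names. The standard vocabulary, the `-webkit-reduced-flashing` name (loosely inspired by the repetitive-stimulus case above) and the ignore-unknown rule are all assumptions for illustration:

```typescript
// Placeholder vocabulary; not an agreed list of function names.
const STANDARD_FUNCTIONS = new Set(["captions", "subtitles", "descriptions", "signings"]);

// Experimental names this hypothetical user agent understands, written with
// a CSS-style vendor prefix. "reduced-flashing" is an invented example.
const RECOGNIZED_EXPERIMENTS = new Set(["-webkit-reduced-flashing"]);

const VENDOR_PREFIX = /^-[a-z]+-/;

type Classification = "standard" | "experimental" | "ignored";

function classifyFunction(name: string): Classification {
  if (STANDARD_FUNCTIONS.has(name)) return "standard";
  if (VENDOR_PREFIX.test(name) && RECOGNIZED_EXPERIMENTS.has(name)) return "experimental";
  // Anything else is skipped rather than rejected, so a list such as
  // "captions -moz-reduced-flashing" still works in user agents that only
  // know "captions".
  return "ignored";
}

// Authors must repeat an experimental name once per vendor prefix.
const attr = "captions -webkit-reduced-flashing -moz-reduced-flashing";
console.log(attr.split(/\s+/).map((name) => [name, classifyFunction(name)]));
```

The repetition in `attr` is the cost: each vendor's spelling of the same experimental function has to be listed separately until the name is standardized.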


On Feb 10, 2011, at 11:26 , Eric Carlson wrote:

> 
>   I agree with Mark, we need to make it possible for script to discover and configure non-text tracks internal to the media resource. 
> 
>   The current <track> API allows this for in-band data that "the user agent recognises and supports as being equivalent to a text track" [1], so I think we should extend <track> to support other media types instead of creating a new mechanism or element type. This can be done with a combination of options 2 and 3: generalizing <track> to allow the inclusion of external audio and video, and accommodating multiple media formats and configurations with <source> elements as we do for <audio> and <video>.
> 
>   Here is the example from the multi-track wiki page with multiple formats for the audio description and sign language tracks:
> 
>    <video id="v1" poster="video.png" controls>
>        <!-- primary content -->
>        <source src="video.webm" type="video/webm">
>        <source src="video.mp4" type="video/mp4">
> 
>        <!-- pre-recorded audio descriptions -->
>        <track kind="descriptions" type="audio/ogg" srclang="en" label="English Audio Description">
>            <source src="audesc.ogg" type="audio/ogg">
>            <source src="audesc.mp3" type="audio/mpeg">
>        </track>
> 
>        <!-- sign language overlay -->
>        <track kind="signings" type="video/webm" srclang="asl" label="American Sign Language">
>            <source src="signlang.webm" type="video/webm">
>            <source src="signlang.mp4" type="video/mp4">
>        </track>
>    </video>
> 
>   Allowing <source> inside of <track> also makes it possible to include alternate caption formats, eg:
> 
>    <video id="v1" poster="video.png" controls>
>        <!-- primary content -->
>        <source src="video.webm" type="video/webm">
>        <source src="video.mp4" type="video/mp4">
> 
>        <!-- captions -->
>        <track kind="captions" type="text/vtt" srclang="en" label="Captions">
>            <source src="captions.vtt" type="text/vtt">
>            <source src="captions.xml" type="application/ttml+xml">
>        </track>
>    </video>
> 
>   Unlike option 3, this does not require new interfaces, but it will probably require a new attribute on <track> so it is possible to determine the media type. It will also require a "currentSrc" attribute so it is possible to determine which source was chosen.
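A sketch of how the proposed `currentSrc` and media-type attributes might be filled in. It assumes a hypothetical `GeneralizedTrack` model and a caller-supplied `canPlayType` predicate; the MIME-to-media-kind mapping is a simplification, not the spec's resource selection algorithm:

```typescript
// Hypothetical model of the generalized <track>: alternative <source>
// children, one of which is chosen and reported via currentSrc.
interface TrackSource {
  src: string;
  type: string; // MIME type of this alternative
}

type MediaKind = "audio" | "video" | "text";

// Assumption: audio/* and video/* map directly; every other type used here
// (text/vtt, application/ttml+xml) is treated as timed text.
function mediaKindOf(mime: string): MediaKind {
  if (mime.startsWith("audio/")) return "audio";
  if (mime.startsWith("video/")) return "video";
  return "text";
}

class GeneralizedTrack {
  kind: string;
  sources: TrackSource[];
  currentSrc = ""; // proposed attribute: URL of the chosen source
  mediaKind: MediaKind | null = null; // proposed: media type of that source

  constructor(kind: string, sources: TrackSource[]) {
    this.kind = kind;
    this.sources = sources;
  }

  // Choose the first source the UA can play, the same exclusive-or rule
  // that <video> applies to its own <source> children (simplified here).
  resolve(canPlayType: (type: string) => boolean): boolean {
    const chosen = this.sources.find((s) => canPlayType(s.type));
    if (chosen === undefined) return false;
    this.currentSrc = chosen.src;
    this.mediaKind = mediaKindOf(chosen.type);
    return true;
  }
}

const captions = new GeneralizedTrack("captions", [
  { src: "captions.vtt", type: "text/vtt" },
  { src: "captions.xml", type: "application/ttml+xml" },
]);
captions.resolve((type) => type === "application/ttml+xml");
console.log(captions.currentSrc, captions.mediaKind);
```

Here the user agent is assumed to support TTML but not WebVTT, so the second source wins and `currentSrc` reports `captions.xml`.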
> 
>   I will add this option to the wiki.
> 
>   I also think that it would be useful to be able to synchronize multiple media elements in a page, but I see this as an additional requirement. Option 6 allows separate media elements to be synchronized, but it does not allow the discovery and configuration of in-band audio and video tracks. It will, however, work with the option I have outlined above.
> 
> eric
> 
> [1] http://www.w3.org/TR/html5/video.html#sourcing-in-band-text-tracks
> 
> 
> On Feb 10, 2011, at 5:47 AM, Mark Watson wrote:
> 
>> Hi everyone,
>> 
>> I have a couple of comments on this proposal, but first, since this is my first post to this list I should introduce myself. I am representing Netflix - we joined W3C just this week. We are interested in ensuring that a streaming service like ours could in future be supported by HTML5.
>> 
>> One thing we are interested in is support for multiple languages, for both audio and text tracks, and therefore a JavaScript API to discover and select amongst those tracks.
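A sketch of the kind of discovery-and-selection call described here. It assumes a hypothetical `LanguageTrack` shape (not an existing browser API) and an ordered list of the user's preferred languages:

```typescript
// Hypothetical track-list shape; property names are illustrative only.
interface LanguageTrack {
  id: string;
  kind: "audio" | "text";
  language: string; // BCP 47 language tag, as used by srclang
  enabled: boolean;
}

// Enable the first track of `kind` matching the user's ordered language
// preferences and disable the other tracks of that kind (one dub or one
// subtitle language at a time). Tracks of other kinds are left alone.
function selectByLanguage(
  tracks: LanguageTrack[],
  kind: "audio" | "text",
  preferred: string[],
): LanguageTrack | undefined {
  const candidates = tracks.filter((t) => t.kind === kind);
  let chosen: LanguageTrack | undefined;
  for (const lang of preferred) {
    chosen = candidates.find((t) => t.language === lang);
    if (chosen !== undefined) break;
  }
  if (chosen === undefined) return undefined; // keep the current selection
  for (const t of candidates) t.enabled = t === chosen;
  return chosen;
}
```

If no preferred language is available, the current selection is kept rather than silently disabling every audio track.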
>> 
>> We use a form of HTTP adaptive streaming, which, if translated to HTML5, would mean providing a URL for a manifest to the <video> element as in Option (1) on the wiki page. But there is also another case where there are multiple tracks and no HTML markup: when there are multiple tracks inside a single ordinary media file, e.g. an mp4 file with multiple audio language tracks.
>> 
>> The distinction between TextTrack and MediaTrack in the API under option (1) seems strange to me. Text is just another kind of media, so shouldn't the kind for each track be ( Audio | Video | Text ) rather than ( Media | Text ) where Media = ( Audio | Video ) ? [This is how it is framed in option 3, albeit up one level].
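The single-level framing maps naturally onto a discriminated union. A minimal TypeScript sketch, with illustrative field names that are not taken from any of the wiki options:

```typescript
// One track type whose kind is Audio | Video | Text, modeled as a
// discriminated union. Field names are illustrative.
type Track =
  | { kind: "audio"; id: string; language: string }
  | { kind: "video"; id: string; width: number; height: number }
  | { kind: "text"; id: string; language: string };

function describeTrack(track: Track): string {
  switch (track.kind) {
    case "audio":
      return `audio ${track.id} (${track.language})`;
    case "video":
      return `video ${track.id} (${track.width}x${track.height})`;
    case "text":
      return `text ${track.id} (${track.language})`;
    default: {
      // Compile-time check that every kind is handled.
      const unreachable: never = track;
      return unreachable;
    }
  }
}

// Script walks one list; no need to ask "Media or Text?" first.
const mixed: Track[] = [
  { kind: "video", id: "v1", width: 640, height: 360 },
  { kind: "audio", id: "a-fr", language: "fr" },
  { kind: "text", id: "t-en", language: "en" },
];
for (const t of mixed) console.log(describeTrack(t));
```

With the two-level (Media | Text) split, the same loop would first branch on which list a track came from and then on audio versus video.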
>> 
>> I don't have a strong opinion on the markup aspect, but I think the first side-condition is important (that the API be the same whether the tracks come from explicit markup or are present within a multiplexed file or described by a manifest). If I read it correctly, this condition is not met in (3), (4) and (6), right?
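The side-condition can be read as: every origin is normalized into one track shape before script sees it. A sketch, assuming hypothetical input shapes for a markup <track>, a demuxed in-band track and a manifest entry (all field names illustrative):

```typescript
type TrackKind = "audio" | "video" | "text";

// The one shape script sees, whatever the origin of the track.
interface Track {
  kind: TrackKind;
  language: string;
}

function kindFromMime(mime: string): TrackKind {
  if (mime.startsWith("audio/")) return "audio";
  if (mime.startsWith("video/")) return "video";
  return "text"; // assumption: other types here are timed text
}

function toKind(value: string): TrackKind {
  if (value === "audio" || value === "video" || value === "text") return value;
  throw new Error(`unknown track kind: ${value}`);
}

// 1. Explicit markup: a <track> element with type and srclang attributes.
function fromMarkup(el: { type: string; srclang: string }): Track {
  return { kind: kindFromMime(el.type), language: el.srclang };
}

// 2. In-band: a track the demuxer finds inside one multiplexed file.
function fromInBand(info: { mediaType: TrackKind; lang: string }): Track {
  return { kind: info.mediaType, language: info.lang };
}

// 3. Manifest: an entry in an adaptive-streaming manifest.
function fromManifest(entry: { contentType: string; lang: string }): Track {
  return { kind: toKind(entry.contentType), language: entry.lang };
}

// The same French audio track, arriving three different ways.
const views = [
  fromMarkup({ type: "audio/ogg", srclang: "fr" }),
  fromInBand({ mediaType: "audio", lang: "fr" }),
  fromManifest({ contentType: "audio", lang: "fr" }),
];
console.log(views.every((t) => t.kind === "audio" && t.language === "fr")); // true
```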
>> 
>> Best,
>> 
>> Mark Watson
>> watsonm@netflix.com
>> On Feb 9, 2011, at 11:56 PM, Silvia Pfeiffer wrote:
>> 
>>> Everyone,
>>> 
>>> Your input on this is requested.
>>> 
>>> Issue-152 is asking for change proposals for a solution for media
>>> resources that have more than just one audio and one video track
>>> associated with them. The spec addresses this need for text tracks
>>> such as captions and subtitles only [1]. But we haven't solved this
>>> problem for additional audio and video tracks such as audio
>>> descriptions, sign language video, and dubbed audio tracks.
>>> 
>>> In the accessibility task force we have discussed different options
>>> over the last months. However, the number of people that provide
>>> technical input on issues related to media in the TF is fairly
>>> limited, so we have decided to use the available time until a change
>>> proposal for issue-152 is due (21st February [2]) to open the
>>> discussion to the larger HTML working group with the hope of hearing
>>> more opinions.
>>> 
>>> Past accessibility task force discussions [3][4] have exposed a number
>>> of possible markup/API solutions.
>>> 
>>> The different approaches are listed at
>>> http://www.w3.org/WAI/PF/HTML/wiki/Media_Multitrack_Media_API . This
>>> may be an incomplete list, but it's a start. If you have any better
>>> ideas, do speak up.
>>> 
>>> Which approach do people favor and why?
>>> 
>>> Cheers,
>>> Silvia.
>>> 
>>> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#the-track-element
>>> [2] http://lists.w3.org/Archives/Public/public-html/2011Jan/0198.html
>>> [3] http://lists.w3.org/Archives/Public/public-html-a11y/2010Oct/0520.html
>>> [4] http://lists.w3.org/Archives/Public/public-html-a11y/2011Feb/0057.html
>>> 
>>> 
>> 
> 

David Singer
Multimedia and Software Standards, Apple Inc.
Received on Friday, 11 February 2011 21:08:09 UTC