Re: [media] Moving forward with captions / subtitles

On Tue, 16 Feb 2010 13:29:48 +0800, Eric Carlson <eric.carlson@apple.com>  
wrote:

>
> On Feb 15, 2010, at 9:03 PM, Philip Jägenstedt wrote:
>
>> On Tue, 16 Feb 2010 10:22:34 +0800, Eric Carlson  
>> <eric.carlson@apple.com> wrote:
>>
>>>
>>>  Yikes, teach me to ignore email for 12 hours :-(
>>>
>>>
>>> On Feb 15, 2010, at 12:46 AM, Philip Jägenstedt wrote:
>>>
>>>> On Mon, 15 Feb 2010 15:19:09 +0800, Eric Carlson  
>>>> <eric.carlson@apple.com> wrote:
>>>>
>>>>>
>>>>> On Feb 14, 2010, at 11:06 PM, Philip Jägenstedt wrote:
>>>>>>
>>>>>> I think calling the grouping element <track> is a bad idea when it  
>>>>>> in fact doesn't specify a track but a group of tracks (each track  
>>>>>> in <source>).
>>>>>>
>>>>> But it does not represent a group of tracks!
>>>>>
>>>>> The <track> element represents a single track in the presentation,  
>>>>> which uses one of the <source> elements as its source of media data.
>>>>
>>>> How would this tie into the MediaTrack API and the MediaTracks  
>>>> collection? It is my understanding that each individual stream in an
>>>> Ogg or MPEG-4 file would be a MediaTrack.
>>>>
>>>
>>>  Yes, exactly.
>>>
>>>> Would a <track> or a <source> represent a MediaTrack? If it is  
>>>> <track>, how would one activate a single <source> via the MediaTracks  
>>>> collection? Or is the intention that source selection in <track>  
>>>> completely determine which <source> is used so that the only way of  
>>>> switching between e.g. languages is rearranging the order of  
>>>> <source>s and calling .load() (or similar)?
>>>>
>>>  I am proposing that a <track> be represented by a MediaTrack. The UA  
>>> would select one of the <source> elements, or the "src" attribute on  
>>> the <track>, and that file would be used as the track's media data.
>>>
>>>  As you note, this *is* different from "alternate tracks" in an MPEG-4  
>>> or QuickTime file, but it is different by design. If we represent each  
>>> <source> by a MediaTrack object we will need to load every source,  
>>> whether it is displayed or not, to answer questions about it. The  
>>> MediaTrack object in the multi-track API proposal has an ellipsis
>>> after "enabled" to represent the other track properties we will want  
>>> to expose:
>>>
>>> interface MediaTrack {
>>>  readonly attribute DOMString name;
>>>  readonly attribute DOMString role;
>>>  readonly attribute DOMString type;
>>>  readonly attribute DOMString lang;
>>>           attribute boolean enabled;
>>>  ...
>>> };
>>>
>>>  Some of these properties won't be possible to answer without loading  
>>> and parsing a file (e.g. duration), which we shouldn't require for a
>>> file that won't be used.
>>>
>>>  MPEG-4 and QuickTime files don't have this problem because even if a  
>>> track's media is external to the movie, the movie file always contains  
>>> the track metadata, so it is possible to get it without loading/parsing
>>> the track data.
>>
>> Good point. My thinking is that attributes of MediaTrack that require  
>> loading the track would simply be unavailable when the track is not  
>> enabled, like e.g. HTMLMediaElement.duration. At least role, type and  
>> lang are available from markup though and should be what the "track  
>> selection algorithm" operates on.
>>
>   One problem with this is that tracks inside of a media file won't have  
> this restriction. Actually, a track added in markup and disabled after the
> data is loaded won't have this restriction either. This is likely to be  
> very confusing.

Is it really much different from e.g. HTMLMediaElement.duration being NaN
when .readyState == HAVE_NOTHING? I agree that it isn't optimal, but it is
surely better than not exposing the different languages at all?
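A rough sketch of that idea, with a hypothetical MarkupTrack class standing in for MediaTrack (nothing here is a real DOM API, and markLoaded is a made-up stand-in for the UA having fetched and parsed the file): role and lang come from markup and are always readable, while duration stays NaN until the track's file has been loaded, much like HTMLMediaElement.duration at HAVE_NOTHING.

```javascript
// Hypothetical model of a markup-declared track (not a real DOM API).
// role/lang are known from markup up front; duration is only known
// once the track's file has been fetched and parsed.
class MarkupTrack {
  constructor({ role, lang }) {
    this.role = role;
    this.lang = lang;
    this.enabled = false;
    // Unknown until loaded, like HTMLMediaElement.duration at HAVE_NOTHING.
    this._duration = NaN;
  }

  get duration() {
    return this._duration;
  }

  // Stand-in for "the UA fetched and parsed the file"; a real UA would
  // do this asynchronously after the track is enabled.
  markLoaded(duration) {
    this._duration = duration;
  }
}

const sv = new MarkupTrack({ role: "sub", lang: "sv" });
console.log(sv.lang, Number.isNaN(sv.duration)); // sv true
sv.enabled = true;
sv.markLoaded(120.5);
console.log(sv.duration); // 120.5
```

Scripts would then treat NaN (or a similar "unknown" value) the same way they already do for the media element itself.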

>> With <track><source>, is it at all possible to use the MediaTracks  
>> collection to activate tracks or build scripted menus? While not a  
>> must-have feature, it would be nice if the same API can be used to  
>> operate on both resource-internal tracks and tracks added with markup.
>>
>   Yes, I think it is very important that internal and external tracks  
> are represented in exactly the same way. An object in the MediaTracks  
> collection represents the <source> chosen by the resource selection  
> algorithm. In your complex example from earlier, assuming "video.ogv"  
> has one video and one audio track:
>
>     <video src="video.ogv">
>         <track role="SUB">
>             <source src="subs.en.srt" srclang="en">
>             <source src="subs.sv.srt" srclang="sv">
>         </track>
>         <track role="CC">
>             <source src="cc.en.srt" srclang="en">
>             <source src="cc.sv.srt" srclang="sv">
>         </track>
>     </video>
>
>   Every user would have:
>
> 	video.tracks(0).role == 'video'
> 	video.tracks(1).role == 'audio'
> 	video.tracks(2).role == 'sub'
> 	video.tracks(3).role == 'cc'
> 	video.tracks(0).src == 'video.ogv' 	// the media is in the movie file
> 	video.tracks(1).src == 'video.ogv'
>
>   But only users on a Swedish system would have (assuming the first  
> language is chosen if none match the user's system):
>
> 	video.tracks(2).src == 'subs.sv.srt'
> 	video.tracks(3).src == 'cc.sv.srt'
>
>   Disabling *any* track is just "video.tracks(n).enabled = false".
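The per-<track> choice described above (use the <source> whose srclang matches the user's language, else fall back to the first one) might look roughly like this, with each <source> as a plain object. This is only a guess at the intended behaviour: selectSource and primarySubtag are hypothetical names, and matching on the primary language subtag (so "sv-SE" matches srclang="sv") is an assumption.

```javascript
// Hypothetical sketch of choosing one <source> for a <track>.
// Assumption: compare only the primary language subtag, case-insensitively.
function primarySubtag(tag) {
  return tag.split("-")[0].toLowerCase();
}

function selectSource(sources, userLang) {
  if (sources.length === 0) return null; // nothing to choose from
  const wanted = primarySubtag(userLang);
  const match = sources.find((s) => primarySubtag(s.srclang) === wanted);
  // Fall back to the first <source> when no language matches.
  return match ?? sources[0];
}

const subs = [
  { src: "subs.en.srt", srclang: "en" },
  { src: "subs.sv.srt", srclang: "sv" },
];
console.log(selectSource(subs, "sv-SE").src); // subs.sv.srt
console.log(selectSource(subs, "de-DE").src); // subs.en.srt
```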

What would the MediaTracks collection look like if there are two
internal subtitle tracks vs if there are two external ones, both being  
mutually exclusive in each case? Unless I'm misunderstanding you, the two  
external tracks would actually only get one MediaTrack object, where e.g.  
.lang would depend on which was actually selected by resource selection.  
Internal tracks, however, would get one MediaTrack each with .lang set per  
track.
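To make the asymmetry concrete, a toy model with plain objects (the function names are made up for illustration and are not part of any proposal): two mutually exclusive subtitle languages show up as two entries when they are internal streams, but as one entry when they are two <source> children of a single <track>, unless each <source> gets its own object.

```javascript
// Toy model: the same two subtitle languages, either as streams inside
// the media file or as two <source> children of a single <track>.
const internalSubs = [
  { role: "sub", lang: "en" },
  { role: "sub", lang: "sv" },
];
const markupTrack = {
  role: "sub",
  sources: [
    { src: "subs.en.srt", srclang: "en" },
    { src: "subs.sv.srt", srclang: "sv" },
  ],
};

// Internal streams: one MediaTrack per stream, each with its own lang.
function tracksFromInternal(streams) {
  return streams.map((s) => ({ role: s.role, lang: s.lang }));
}

// One MediaTrack per <track>: lang is whatever <source> selection picked.
function tracksPerTrackElement(track, chosenIndex) {
  const chosen = track.sources[chosenIndex];
  return [{ role: track.role, lang: chosen.srclang, src: chosen.src }];
}

// Alternative: one MediaTrack per <source>, so every language is listed.
function tracksPerSource(track) {
  return track.sources.map((s) => ({ role: track.role, lang: s.srclang, src: s.src }));
}

console.log(tracksFromInternal(internalSubs).length); // 2
console.log(tracksPerTrackElement(markupTrack, 1)[0].lang); // sv
console.log(tracksPerSource(markupTrack).length); // 2
```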

I think the resource selection of tracks is quite different from that of  
<audio> or <video>. Once resource selection has run for <audio>/<video> it  
cannot be re-run without also restarting playback (e.g. by calling  
load()). For text tracks however, we want to be able to switch tracks  
seamlessly and resource selection is just about enabling a default track  
(or none, depending on markup).
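For comparison, if every subtitle language had its own MediaTrack (as internal tracks do), switching while playing would only mean toggling .enabled within the group of same-role tracks, with nothing reloaded. A sketch with plain objects standing in for MediaTrack; switchTrack is a hypothetical helper, and treating all tracks that share a role as mutually exclusive is an assumption.

```javascript
// Hypothetical helper: plain objects stand in for MediaTrack.
// Assumption: tracks that share a role are mutually exclusive.
function switchTrack(tracks, role, lang) {
  const target = tracks.find((t) => t.role === role && t.lang === lang);
  if (!target) return false; // no such language: keep the current selection
  for (const t of tracks) {
    // Only touch tracks in the same group; video/audio stay as they are.
    if (t.role === role) t.enabled = t === target;
  }
  return true;
}

const tracks = [
  { role: "video", lang: "", enabled: true },
  { role: "audio", lang: "en", enabled: true },
  { role: "sub", lang: "en", enabled: true },
  { role: "sub", lang: "sv", enabled: false },
];
switchTrack(tracks, "sub", "sv");
console.log(tracks.map((t) => t.enabled)); // [ true, true, false, true ]
```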

What kind of resource selection would you want for <track><source> and  
what script is necessary to switch between the English and Swedish  
subtitles while playing?

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Tuesday, 16 February 2010 06:22:23 UTC