Re: [media] Moving forward with captions / subtitles

On Sun, 14 Feb 2010 00:42:50 +0800, Eric Carlson <eric.carlson@apple.com>  
wrote:

>
> On Feb 13, 2010, at 7:01 AM, Philip Jägenstedt wrote:
>
>> On Sat, 13 Feb 2010 21:04:36 +0800, Silvia Pfeiffer  
>> <silviapfeiffer1@gmail.com> wrote:
>>
>>> However, I must say I really like the idea of making it independent of
>>> "text", i.e. leaving the possibility open to add "tracks" of audio or
>>> video in future.
>>>
>>> I'd be happy with something that essentially means "external parallel
>>> track".
>>
>> Considering how many different names we have already come up with, I  
>> doubt <track> will be the last :) Brainstorm away!
>>
>   To me the singular "track" implies that only one <source> will be
> chosen, which will not be true in all cases. Maybe <tracks>?
>
>
>>>> role="" is fine, but I'd like to see more ideas on what UAs should do
>>>> with it.
>>>
>>> The thought is to use it not just for captions, subtitles, and textual
>>> audio descriptions, but also for karaoke, lyrics, chapters, timed
>>> comments, timed metadata, and other such time-aligned text and
>>> annotations. There are examples with lyrics
>>> (http://svg-wow.org/audio/animated-lyrics.html, and
>>> http://annodex.net/~silvia/itext/chocolate_rain.html), and chapters
>>> (http://annodex.net/~silvia/itext/elephant_no_skin_v2.html). I'm sure
>>> we will come up with more similar examples.
>>
>> Yes, but is it expected that the UA should do something with the  
>> attribute, like make context menus based on it? Or should it be part of  
>> the track selection algorithm? (Where "track selection algorithm" does  
>> not exist yet, but is what will select which tracks are enabled by  
>> default based on... language and such?)
>>
>   I think the selection of alternates is an important point. Some media
> container formats (e.g. QuickTime and MPEG-4) allow an author to mark
> tracks as being part of an "alternate group". This instructs the media
> engine to enable only one track in the group based on a condition on the
> user's machine when the file is opened for playback. For example, a
> movie can have subtitle tracks and chapter tracks in multiple languages,
> but only one of each is rendered when the movie plays.
>
>   We need to support this use case with external "tracks", and we need  
> to define the selection algorithm when a file has both internal and  
> external tracks.
>
>   We also need to define a mechanism to mark tracks as being part of an  
> alternate group. Is an attribute on <source> enough?
>
>     <tracks>
>         <source type="text/srt" src="en-captions.srt" lang="en"
>                 role="caption">
>         <source type="text/srt" src="zh-captions.srt" lang="zh"
>                 role="caption">
>         <source type="text/srt" src="en-chapters.srt" lang="en"
>                 role="chapters">
>         <source type="text/srt" src="zh-chapters.srt" lang="zh"
>                 role="chapters">
>     </tracks>
>
> Or should we have a grouping element like Silvia had in her early
> proposal?
>
>     <tracks>
>         <track role="caption">
>             <source type="text/srt" src="en-captions.srt" lang="en">
>             <source type="text/srt" src="zh-captions.srt" lang="zh">
>         </track>
>         <track role="chapters">
>             <source type="text/srt" src="en-chapters.srt" lang="en">
>             <source type="text/srt" src="zh-chapters.srt" lang="zh">
>         </track>
>     </tracks>
>
>   I hesitate to define yet another element, but I think the markup in a  
> complex case like Silvia's Elephants Dream sample,  
> http://annodex.net/~silvia/itext/elephant_no_skin_v2.html, is clearer  
> because of it. On the other hand, will complex cases like this be common  
> enough that we need it?

How to group/nest <tracks>, <track>, <source>, type="", lang="" and  
role="" is a bit of a headache...

For <video> and <audio>, <source>s are mutually exclusive and should
represent exactly the same resource, with only technical differences like
codec, bitrate or resolution. It's not clear that we need <source> at all
to switch between text formats, as a common format is very likely to be
agreed on here without much controversy. We might want it, though, if we
seriously expect more than one format to be used.

Do we want to support enabling multiple text tracks at the same time at
all, either declaratively or via browser context menus? In my opinion this
is a bit overkill (I've never used a media player that supports it), and
we could delegate it to scripts. If others agree, we don't need any
grouping element for the purpose of making a group of tracks mutually
exclusive -- all tracks are mutually exclusive.
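
If all tracks are mutually exclusive, the default-selection part of a track
selection algorithm reduces to choosing at most one track. A minimal
TypeScript sketch of one possible rule, assuming a hypothetical flat
`TextTrackInfo` model and a list of the user's preferred languages; the
names are illustrative, and the language fallback is only loosely modeled
on RFC 4647 Lookup.

```typescript
// Hypothetical flat description of one external text track.
interface TextTrackInfo {
  role: string; // e.g. "caption", "subtitles", "chapters"
  lang: string; // language tag from lang=""
  src: string;
}

// "zh-Hant-TW" -> ["zh-hant-tw", "zh-hant", "zh"]: drop one subtag at a time,
// loosely following the truncation step of RFC 4647 Lookup.
function langFallbacks(pref: string): string[] {
  const parts = pref.toLowerCase().split("-");
  const out: string[] = [];
  for (let n = parts.length; n > 0; n--) {
    out.push(parts.slice(0, n).join("-"));
  }
  return out;
}

// With every track mutually exclusive, the default is at most one track:
// the first track of the wanted role matching the user's languages, tried
// in preference order. Returns null when nothing matches (no track enabled).
function selectDefaultTrack(
  tracks: TextTrackInfo[],
  preferredLangs: string[],
  wantedRole: string,
): TextTrackInfo | null {
  const candidates = tracks.filter((t) => t.role === wantedRole);
  for (const pref of preferredLangs) {
    for (const tag of langFallbacks(pref)) {
      const match = candidates.find((t) => t.lang.toLowerCase() === tag);
      if (match) return match;
    }
  }
  return null;
}

const tracks: TextTrackInfo[] = [
  { role: "caption", lang: "en", src: "en-captions.srt" },
  { role: "caption", lang: "sv", src: "sv-captions.srt" },
  { role: "subtitles", lang: "en", src: "en-subtitles.srt" },
];
console.log(selectDefaultTrack(tracks, ["sv-SE", "en"], "caption")?.src); // sv-captions.srt
```

Enabling a second track at the same time would then be left to scripts.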

The other kind of grouping is per role (subtitles/captions/karaoke),  
language and... something else? If this is given by role="" and lang="",  
how should a context menu be constructed? Group by role or by language?

Would it in fact be sufficient to just use a flat list of <track type=""  
role="" src="" lang=""> ?
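
A flat list would not rule out a structured menu: the UA could derive
either grouping from role="" and lang="" at display time. A rough
TypeScript sketch, where `TrackEntry` and `groupForMenu` are hypothetical
names and the attribute values are made up for illustration:

```typescript
// Hypothetical flat form of <track type="" role="" src="" lang="">.
interface TrackEntry {
  type: string;
  role: string;
  lang: string;
  src: string;
}

// Split a flat track list into menu sections keyed by role or by language.
// Map preserves insertion order, so sections appear in the order their
// first track appears in the markup.
function groupForMenu(tracks: TrackEntry[], by: "role" | "lang"): Map<string, TrackEntry[]> {
  const sections = new Map<string, TrackEntry[]>();
  for (const t of tracks) {
    const key = t[by];
    const section = sections.get(key);
    if (section) {
      section.push(t);
    } else {
      sections.set(key, [t]);
    }
  }
  return sections;
}

const flat: TrackEntry[] = [
  { type: "text/srt", role: "caption", lang: "en", src: "en-captions.srt" },
  { type: "text/srt", role: "subtitles", lang: "en", src: "en-subtitles.srt" },
  { type: "text/srt", role: "caption", lang: "sv", src: "sv-captions.srt" },
];

// Same data, two menu layouts.
for (const by of ["role", "lang"] as const) {
  groupForMenu(flat, by).forEach((section, key) => {
    console.log(`${by}=${key}: ${section.map((t) => t.src).join(", ")}`);
  });
}
```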

Or should we let the grouping be in any order and let role="" and lang=""  
be inherited? E.g.

<video>
   <tracks lang="en">
     <track role="caption">
     <track role="subtitles">
   </tracks>
   <tracks lang="sv">
     <track role="caption">
   </tracks>
</video>

vs

<video>
   <tracks role="caption">
     <track lang="en">
     <track lang="sv">
   </tracks>
   <tracks role="subtitles">
     <track lang="en">
   </tracks>
</video>

The difference is mainly in how context menus are presented.
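
To make that concrete, a minimal TypeScript sketch of the inheritance idea,
assuming a hypothetical in-memory tree (`MarkupNode`, `el` and
`resolveTracks` are illustrative names, not a proposed API): each <track>
takes role="" and lang="" from itself or from its nearest <tracks>
ancestor. Both nestings above resolve to the same three tracks; only the
order, and so the natural menu grouping, differs.

```typescript
// Hypothetical in-memory form of the proposed markup (attributes only).
type NodeName = "video" | "tracks" | "track";
interface MarkupNode {
  name: NodeName;
  attrs: { role?: string; lang?: string };
  children: MarkupNode[];
}

interface ResolvedTrack {
  role?: string;
  lang?: string;
}

function el(name: NodeName, attrs: MarkupNode["attrs"], ...children: MarkupNode[]): MarkupNode {
  return { name, attrs, children };
}

// Depth-first walk: a <track> uses its own role=""/lang="" if set, otherwise
// the value from the nearest ancestor that sets it. Output is in document order.
function resolveTracks(node: MarkupNode, inherited: ResolvedTrack = {}): ResolvedTrack[] {
  const here: ResolvedTrack = {
    role: node.attrs.role ?? inherited.role,
    lang: node.attrs.lang ?? inherited.lang,
  };
  if (node.name === "track") return [here];
  const out: ResolvedTrack[] = [];
  for (const child of node.children) out.push(...resolveTracks(child, here));
  return out;
}

// The two nestings from the message above.
const byLang = el("video", {},
  el("tracks", { lang: "en" }, el("track", { role: "caption" }), el("track", { role: "subtitles" })),
  el("tracks", { lang: "sv" }, el("track", { role: "caption" })),
);
const byRole = el("video", {},
  el("tracks", { role: "caption" }, el("track", { lang: "en" }), el("track", { lang: "sv" })),
  el("tracks", { role: "subtitles" }, el("track", { lang: "en" })),
);

const label = (t: ResolvedTrack) => `${t.role}/${t.lang}`;
console.log(resolveTracks(byLang).map(label).join(", ")); // caption/en, subtitles/en, caption/sv
console.log(resolveTracks(byRole).map(label).join(", ")); // caption/en, caption/sv, subtitles/en
```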

Realistically, though, few videos will have more than a few different
tracks, and complicating the markup to enable properly nested context
menus really might not be worth it.

Just thinking out loud here...

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Sunday, 14 February 2010 07:46:42 UTC