Re: [media] Moving forward with captions / subtitles

Hi Philip,

On Sun, Feb 14, 2010 at 6:45 PM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Sun, 14 Feb 2010 00:42:50 +0800, Eric Carlson <eric.carlson@apple.com>
> wrote:
>
>>
>> On Feb 13, 2010, at 7:01 AM, Philip Jägenstedt wrote:
>>
>>> On Sat, 13 Feb 2010 21:04:36 +0800, Silvia Pfeiffer
>>> <silviapfeiffer1@gmail.com> wrote:
>>>
>>>> However, I must say I really like the idea of making it independent of
>>>> "text", i.e. leaving the possibility open to add "tracks" of audio or
>>>> video in future.
>>>>
>>>> I'd be happy for something that essentially means "external parallel
>>>> track".
>>>
>>> Considering how many different names we have already come up with, I
>>> doubt <track> will be the last :) Brainstorm away!
>>>
>>  To me the singular "track" implies that only one <source> will be chosen
>> which will not be true in all cases. Maybe <tracks>?
>>
>>
>>>>> role="" is fine, but I'd like to see more ideas on what UAs should do
>>>>> with it.
>>>>
>>>> The thought is to use it not just for captions, subtitles, and textual
>>>> audio descriptions, but also for karaoke, lyrics, chapters, timed
>>>> comments, timed metadata, and other such time-aligned text and
>>>> annotations. There are examples with lyrics
>>>> (http://svg-wow.org/audio/animated-lyrics.html, and
>>>> http://annodex.net/~silvia/itext/chocolate_rain.html), and chapters
>>>> (http://annodex.net/~silvia/itext/elephant_no_skin_v2.html). I'm sure
>>>> we will come up with more similar examples.
>>>
>>> Yes, but is it expected that the UA should do something with the
>>> attribute, like make context menus based on it? Or should it be part of the
>>> track selection algorithm? (Where "track selection algorithm" does not exist
>>> yet, but is what will select which tracks are enabled by default based on...
>>> language and such?)
>>>
>>  I think the selection of alternates is an important point. Some media
>> container formats (e.g. QuickTime and MPEG-4) allow an author to mark tracks
>> as being part of an "alternate group". This instructs the media engine to
>> enable only one track in the group based on a condition on the user's
>> machine when the file is opened for playback. For example, a movie can have
>> subtitle tracks and chapter tracks in multiple languages, but only one of
>> each is rendered when the movie plays.
>>
>>  We need to support this use case with external "tracks", and we need to
>> define the selection algorithm when a file has both internal and external
>> tracks.
>>
>>  We also need to define a mechanism to mark tracks as being part of an
>> alternate group. Is an attribute on <source> enough?
>>
>>    <tracks>
>>        <source type="text/srt" src="en-captions.srt" lang="en"
>> role="caption">
>>        <source type="text/srt" src="zh-captions.srt" lang="zh"
>> role="caption">
>>       <source type="text/srt" src="en-chapters.srt" lang="en"
>> role="chapters">
>>        <source type="text/srt" src="zh-chapters.srt" lang="zh"
>> role="chapters">
>>    </tracks>
>> Or should we have a grouping element like Silvia had in her early
>> proposal?
>>
>>    <tracks>
>>        <track role="caption">
>>            <source type="text/srt" src="en-captions.srt" lang="en">
>>            <source type="text/srt" src="zh-captions.srt" lang="zh">
>>        </track>
>>       <track role="chapters">
>>            <source type="text/srt" src="en-chapters.srt" lang="en">
>>            <source type="text/srt" src="zh-chapters.srt" lang="zh">
>>        </track>
>>    </tracks>
>>
>>  I hesitate to define yet another element, but I think the markup in a
>> complex case like Silvia's Elephants Dream sample,
>> http://annodex.net/~silvia/itext/elephant_no_skin_v2.html, is clearer
>> because of it. On the other hand, will complex cases like this be common
>> enough that we need it?
>
> How to group/nest <tracks>, <track>, <source>, type="", lang="" and role=""
> is a bit of a headache...
>
> For <video> and <audio>, <source>s are mutually exclusive and should
> represent exactly the same resource with only technical differences like
> codec, bitrate or resolution. It's not clear that we at all need <source> to
> switch between text formats as a common format here is very likely to be
> found without much controversy. We might want it if we seriously expect more
> than one format to be used though.

I don't think the format is the issue here: @type is just a
description of what format to expect, not a selection mechanism.
The img element likewise supports several file formats, but provides
no means to mark up a selection between them. Even if we support more
than one format, that shouldn't become a selection criterion.


> Do we at all want to support the case of enabling multiple text tracks in a
> declarative way or via browser context menus? In my opinion this is a bit
> overkill (I've never used a media player that supports it) and we might
> delegate this to scripts. If others agree, we don't need any grouping
> element for the purpose of making a group of tracks mutually exclusive --
> all tracks are mutually exclusive.

Did you mean all source elements within a track of a certain role are
mutually exclusive? If so, I agree.


> The other kind of grouping is per role (subtitles/captions/karaoke),
> language and... something else? If this is given by role="" and lang="", how
> should a context menu be constructed? Group by role or by language?

To me - just from a menu construction POV - it seems to make more
sense to group by role and, within each role, to list the tracks as
alternatives.

The idea of grouping per language seems to me to make things more complicated.
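
As a rough sketch of the menu construction I have in mind (Python purely
for illustration; the tuple layout and the name build_menu are made up
for this example and are not a proposed API), using the tracks from
Eric's markup:

```python
# Hypothetical flat list of (role, lang, src) entries, taken from the
# <source> elements in Eric's example markup.
TRACKS = [
    ("caption", "en", "en-captions.srt"),
    ("caption", "zh", "zh-captions.srt"),
    ("chapters", "en", "en-chapters.srt"),
    ("chapters", "zh", "zh-chapters.srt"),
]


def build_menu(tracks):
    """Group tracks by role; the entries within a role are alternatives."""
    menu = {}
    for role, lang, src in tracks:
        menu.setdefault(role, []).append((lang, src))
    return menu


if __name__ == "__main__":
    # One submenu per role, one radio-style entry per alternative.
    for role, alternatives in build_menu(TRACKS).items():
        print(role)
        for lang, src in alternatives:
            print("  ( ) %s  [%s]" % (lang, src))
```

Each role becomes one submenu, and the entries inside it behave like
radio buttons, so the user picks at most one per role.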


> Would it in fact be sufficient to just use a flat list of <track type=""
> role="" src="" lang=""> ?

No, that makes it harder to read and to see which <source>s are
alternatives to each other.


> Or should we let the grouping be in any order and let role="" and lang="" be
> inherited? E.g.
>
> <video>
>  <tracks lang="en">
>    <track role="caption">
>    <track role="subtitles">
>  </tracks>
>  <tracks lang="sv">
>    <track role="caption">
>  </tracks>
> </video>
>
> vs
>
> <video>
>  <tracks role="caption">
>    <track lang="en">
>    <track lang="sv">
>  </tracks>
>  <tracks role="subtitles">
>    <track lang="en">
>  </tracks>
> </video>
>
> The difference is mainly in how context-menus are presented.
>
> Realistically though, few videos will have more than a few different tracks,
> and complicating the markup to enable properly nested context menus really
> might not be worth it.
>
> Just thinking out loud here...


To me it's a semantic issue, too. If we group by language, does that
mean I can only select one such group to be displayed and all the
others are alternatives? That would mean I have to take the English
captions with the English subtitles and the English audio description.
I don't think that makes much sense. Rather, I might, for example,
choose the English captions and the German subtitles.

The 'alternate group' specification in MPEG (and probably QuickTime)
says the following:

“alternate_group”: is an integer that specifies a group or collection
of tracks. If this field is 0 there is no information on possible
relations to other tracks. If this field is not 0, it should be the
same for tracks that contain alternate data for one another and
different for tracks belonging to different such groups. Only one
track within an alternate group should be played or streamed at any
one time, and must be distinguishable from other tracks in the group
via attributes such as bitrate, codec, language, packet size etc. A
group may have only one member.

I think with tracks grouped by "role" we get exactly this behaviour.
The distinction between the tracks in a group is based on lang, type
and media query, so it fits very well with this description of
"alternate group".
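
To make that concrete, here is a minimal sketch of a per-role selection
(Python purely for illustration). It assumes each role is one alternate
group and that the user has an ordered language preference per role. The
name select_tracks, the dict layout and the de-*/en-subtitles file names
are invented for this example, and type and media queries are left out:

```python
def select_tracks(tracks, preferred_langs):
    """Pick at most one track per role (the "alternate group").

    tracks: iterable of dicts with "role", "lang" and "src" keys.
    preferred_langs: maps role -> list of languages, most preferred first.
    Roles without a matching preference stay disabled.
    """
    # Collect the alternatives of each group.
    by_role = {}
    for track in tracks:
        by_role.setdefault(track["role"], []).append(track)

    # Within each group, enable the first track matching the preferences.
    enabled = {}
    for role, alternatives in by_role.items():
        for lang in preferred_langs.get(role, []):
            match = next((t for t in alternatives if t["lang"] == lang), None)
            if match is not None:
                enabled[role] = match
                break
    return enabled


# Hypothetical track list; the file names are placeholders.
TRACKS = [
    {"role": "caption", "lang": "en", "src": "en-captions.srt"},
    {"role": "caption", "lang": "de", "src": "de-captions.srt"},
    {"role": "subtitles", "lang": "en", "src": "en-subtitles.srt"},
    {"role": "subtitles", "lang": "de", "src": "de-subtitles.srt"},
]

# English captions together with German subtitles.
chosen = select_tracks(TRACKS, {"caption": ["en"], "subtitles": ["de", "en"]})
```

Because the choice is made per role, the English captions and the German
subtitles can be enabled together, which grouping by language would not
allow.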

Cheers,
Silvia.

Received on Sunday, 14 February 2010 09:23:16 UTC