Re: [media] Moving forward with captions / subtitles

From: Philip Jägenstedt <philipj@opera.com>
Date: Sun, 14 Feb 2010 22:44:45 +0800
To: "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>
Cc: "Eric Carlson" <eric.carlson@apple.com>, "HTML Accessibility Task Force" <public-html-a11y@w3.org>
Message-ID: <op.u74gkvtaatwj1d@philip-pc>
On Sun, 14 Feb 2010 19:57:56 +0800, Silvia Pfeiffer  
<silviapfeiffer1@gmail.com> wrote:

> On Sun, Feb 14, 2010 at 10:23 PM, Philip Jägenstedt <philipj@opera.com>  
> wrote:
>> On Sun, 14 Feb 2010 17:22:23 +0800, Silvia Pfeiffer
>> <silviapfeiffer1@gmail.com> wrote:
>>> On Sun, Feb 14, 2010 at 6:45 PM, Philip Jägenstedt <philipj@opera.com>
>>> wrote:
>>> I wouldn't think the format is the issue here - @type is just a
>>> description of what format to expect, not as a selection mechanism.
>>> Just like in an img element there is also support for several file
>>> formats, but there is no means to mark up selections between different
>>> ones. Even if we support more than one format, that shouldn't become a
>>> selection criterion.
>> If type has no influence over what track is selected, I suggest we not have
>> the attribute at all.
> It's already an attribute of the <source> element and necessary there,
> so that's not possible. However, what is more important is that it is
> an indication for how to parse the referenced resource. In particular
> with srt we can include the charset that way without having to add an
> additional attribute. It's a necessary decoding hint.

type="" for <source> in <audio>/<video> is only used in the resource
selection algorithm; it has no influence once the resource fetch algorithm
starts, nor on decoding. The server-sent Content-Type is authoritative, and
I think it should be for text too. If type is *only* used for charset, then
let's use the charset attribute instead. In any case this isn't the most
important point right now, so let's move on...
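
As a rough sketch of that precedence, assuming the server-sent Content-Type wins and a charset attribute is consulted only when the header carries no charset parameter. The helper name pick_charset and the UTF-8 fallback are illustrative assumptions, not something specified in this thread:

```python
from email.message import Message
from typing import Optional


def pick_charset(content_type_header: Optional[str],
                 charset_attr: Optional[str],
                 fallback: str = "utf-8") -> str:
    """Choose the charset for decoding a text track (hypothetical helper)."""
    if content_type_header:
        # Let the standard library parse the header parameters.
        msg = Message()
        msg["Content-Type"] = content_type_header
        header_charset = msg.get_content_charset()
        if header_charset:
            # The server-sent value is treated as authoritative.
            return header_charset
    if charset_attr:
        # Assumed: the markup attribute is only a hint when the header is silent.
        return charset_attr.strip().lower()
    # Assumed default; the thread does not name one.
    return fallback
```

With this ordering, a type="" attribute plays no part in decoding at all, which is the behaviour described for <source>.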

>>>> Do we at all want to support the case of enabling multiple text tracks in
>>>> a declarative way or via browser context menus? In my opinion this is a
>>>> bit overkill (I've never used a media player that supports it) and we
>>>> might delegate this to scripts. If others agree, we don't need any
>>>> grouping element for the purpose of making a group of tracks mutually
>>>> exclusive -- all tracks are mutually exclusive.
>>> Did you mean all source elements within a track of a certain role are
>>> mutually exclusive? If so, I agree.
>> No, I mean that all tracks of any language/role/whatever are mutually
>> exclusive. I've never seen a media player that allows enabling multiple
>> text tracks simultaneously and wouldn't want to figure out a UI for doing
>> it in the context menu or with native controls.
> In my demo, I have up to four tracks active at the same time:
> http://annodex.net/~silvia/itext/elephant_no_skin_v2.html
> * captions
> * subtitles
> * chapter markers
> * textual audio descriptions
> As long as they are not displayed onto the same position on screen,
> that's not a problem. Admittedly, in my demo, subtitles and captions
> clash, so one wins out. But the others do not clash. So, I think
> different roles can well be displayed in parallel.
>>>> The other kind of grouping is per role (subtitles/captions/karaoke),
>>>> language and... something else? If this is given by role="" and lang="",
>>>> how should a context menu be constructed? Group by role or by language?
>>> To me - just from a menu construction POV - it seemed to make more
>>> sense to group by role and within role as alternatives.
>>> The idea of grouping per language seems to me to make things more
>>> complicated.
>> No grouping at all makes it even simpler :)
> Actually, no. When I tried that in
> http://annodex.net/~silvia/itext/elephant_no_skin.html it gave me all
> sorts of headaches. And the first feedback I got on that demo was: why
> is it so talkative and why aren't you grouping things that are
> alternatives together.

I assume the feedback was on the UI, not on the markup. I can't see that
grouping in the markup is really necessary to achieve the desired UI --
simply group all tracks with the same role together. The markup itself can
be organized
into groups by other means, e.g. putting all captions under <!-- captions  

>>> I think with the tracks that are grouped by "role" we get exactly this
>>> behaviour. The distinction between them is based on lang, type and
>>> media query, so fits very well with this description of "alternate
>>> group".
>> No matter how we group it the tracks will be the same and the information
>> about the tracks available from the markup will be the same.
> Indeed, but if we don't group the parsing code gets a lot harder and
> it's much harder to decide which are alternatives to each other and
> which additional.

I really don't think there's a significant difference in complexity of  
track selection or constructing UI. In any case, simplicity of markup is  
far more important than ease of implementation here, so that should be the  

>> I don't think that grouping should have any influence on track selection.
>> Grouping is only relevant if we want to:
>> * declaratively enable multiple text tracks simultaneously (in markup), or
>>   via native browser UI
> As stated above, I indeed believe this is necessary. In particular if
> we want to support textual audio descriptions through srt files.

What's special about textual audio descriptions? Can you show an example  
of an existing application (native or webapp) that allows enabling  
multiple text tracks via menus and what it looks like? All I'm saying is  
that it isn't necessary to provide the information of which tracks are  
supposed to be mutually exclusive in markup -- it might still be po

>> * provide some nesting in context menus
> These can be created through parsing, too, but are much harder to create.

Not really, it's quite trivial to walk the child nodes and group tracks  
together by whatever criteria.
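
A minimal sketch of that walk, assuming flat <track> children that carry role="", lang="" and src="" attributes. The element name, the attribute names and the file names are placeholders, since the markup is still under discussion here:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical flat markup: no grouping element, just sibling tracks.
MARKUP = """
<video src="elephant.ogv">
  <track role="captions" lang="en" src="cc.en.srt"/>
  <track role="subtitles" lang="de" src="sub.de.srt"/>
  <track role="subtitles" lang="fr" src="sub.fr.srt"/>
  <track role="captions" lang="de" src="cc.de.srt"/>
</video>
"""


def group_tracks(video, key):
    """Walk the child nodes once and bucket track sources by one attribute."""
    groups = defaultdict(list)
    for child in video:
        if child.tag == "track":
            groups[child.get(key, "")].append(child.get("src"))
    return dict(groups)


video = ET.fromstring(MARKUP.strip())
by_role = group_tracks(video, "role")  # one menu section per role
by_lang = group_tracks(video, "lang")  # one menu section per language
```

Switching the grouping axis from role to language only changes the key passed in; the markup stays the same.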

>> I don't think either of these are important enough to introduce new
>> elements or attributes.
>> Am I missing something?
> I think so - see above. :-)

We don't agree that role is the obvious axis along which tracks should be
grouped in the UI. There also doesn't seem to be agreement that all tracks
in a single role should be mutually exclusive. I think it should be
possible to activate several tracks, but only in browsers that can figure
out what a sensible UI for it should be, possibly via an enabled=""
attribute and definitely via script. I still can't see that any grouping
of the markup is a good idea though, given that there are so many
different axes along which these groups could be made and none really makes
more sense than the others. Personally I would be much more interested in
language groups than role groups because I immediately know which
languages I'm interested in but can live with either captions or subtitles.

Philip Jägenstedt
Core Developer
Opera Software
Received on Sunday, 14 February 2010 14:45:34 UTC
