Re: [media] Moving forward with captions / subtitles

On Sat, 13 Feb 2010 21:04:36 +0800, Silvia Pfeiffer  
<silviapfeiffer1@gmail.com> wrote:

> Hi Philip,
>
>
> On Sat, Feb 13, 2010 at 9:19 PM, Philip Jägenstedt <philipj@opera.com>  
> wrote:
>> On Sat, 06 Feb 2010 11:41:10 +0800, Silvia Pfeiffer
>> <silviapfeiffer1@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> In a separate thread, there was an extensive discussion about what
>>> declarative markup we should propose to add to HTML5 to introduce a
>>> standard means for associating captions and subtitles that are
>>> provided in separate files with an audio or video element.
>>>
>>> The discussion relates to bug 5758 and has currently resulted in a
>>> first draft specification at:
>>> http://www.w3.org/WAI/PF/HTML/wiki/Media_TextAssociations
>>>
>>> As you will notice, this proposal builds on several other previously
>>> suggested declarative syntax proposals.
>>>
>>> This is a request for discussion of this specification with a view
>>> towards getting an agreement both in the media subgroup and the larger
>>> task force on two issues:
>>>
>>> 1. The proposed declarative markup
>>
>> I think the main outstanding problem is still a good name for the  
>> grouping
>> element. <textassoc> isn't great either because it's a bit difficult to
>> spell. Perhaps <track>? Even though several browser vendors are  
>> skeptical of
>> syncing audio/video from two different resources, it would make it  
>> spec-wise
>> possible to allow it in the future. For now, it's text-only though:
>>
>> <video src="video.ogg">
>>  <track src="captions.srt">
>> </video>
>>
>> or using <source> in the same way as for <video>:
>>
>> <video src="video.ogg">
>>  <track>
>>    <source type="text/srt" src="captions.srt" lang="en">
>>    <source type="text/srt" src="zimu.srt lang="zh">
>>  </track>
>> </video>
>>
>> Note that the resource selection algorithm is not a limitation here,  
>> because
>> we can freely define how <track>s are activated and how to select  
>> between
>> alternative <source>s in a <track>. Perhaps we need a new boolean  
>> attribute
>> like enabled="" to enable a track by default.
>
> In general I like the idea of calling it <track>. However, I have a
> slight issue with it because they are only virtual tracks - normally
> only the "tracks" that are multiplexed together inside a encapsulation
> format are called tracks. This would make the content inside a source
> element called tracks, but also the parallel external files. I predict
> confusion.

I think it would be good to treat them as the same as far as possible,  
including in the DOM API and MediaTracks collection. That way the same  
user JavaScript could operate on the resource without caring if the tracks  
are resource-internal or added using <track>.

> However, I must say I really like the idea of making it independent of
> "text", i.e. leaving the possibility open to add "tracks" of audio or
> video in future.
>
> I'd be happy for something that essentially means "external parallel  
> track".

Considering how many different names we have already come up with, I doubt  
<track> will be the last :) Brainstorm away!

>> I'm not to keen on charset="", but if it turns out to be necessary in
>> practice I'll just have to tolerate it. For now we have to define how  
>> the
>> default encoding is determined (*not* using the containing document's
>> encoding for sanity).
>
> There is no @charset attribute. I have explicitly taken it away, since
> it can be specified as part of the @type attribute where necessary. I
> would actually write a srt RFC that includes the charset as part of
> the mimetype. That solves that problem IMO.

Ah, excellent.

>> role="" is fine, but I'd like to see more ideas on what UAs should to  
>> with
>> it.
>
> The thought is to use it not just for captions, subtitles, and textual
> audio descriptions, but also for karaoke, lyrics, chapters, timed
> comments, timed metadata, and other such time-aligned text and
> annotations. There are examples with lyrics
> (http://svg-wow.org/audio/animated-lyrics.html, and
> http://annodex.net/~silvia/itext/chocolate_rain.html), and chapters
> (http://annodex.net/~silvia/itext/elephant_no_skin_v2.html). I'm sure
> we will come up with more similar examples.

Yes, but is it expected that the UA should do something with the  
attribute, like make context menus based on it? Or should it be part of  
the track selection algorithm? (Where "track selection algorithm" does not  
exist yet, but is what will select which tracks are enabled by default  
based on... language and such?)

>>> 2. The proposed default file formats to support for subtitles and
>>> captions: DFXP and SRT
>>
>> I'm really quite skeptical of DFXP, as it tries to be an interchange  
>> format
>> that can handle anything that any other subtitle format can handle. I  
>> think
>> browser support would at best be partial. The overlap with what can  
>> already
>> be done with script+CSS is also very large and I'd rather at least  
>> rendering
>> is defined entirely by CSS.
>>
>> SRT is a good baseline, but who will take on the task of writing a  
>> parsing
>> spec for it? This is probably not nearly as trivial as it may seem, and
>> requires collecting lots of sample data to see how SRT is used in the  
>> wild.
>> For example, what to do with the ghastly "HTML" sometimes mixed in SRT?  
>> (I
>> would prefer this to be completely unsupported.)
>
> I think you don't stand alone there. I think it is safest to just go
> with the base format. I have looked at a large number of srt files and
> those with html markup are mostly restricted to bold, italic and
> newline (<br/>) formattings. It was my intention to write an srt
> specification as a RFC and register the mime type at the same time. I
> have written RFCs before and don't think it would be hard.

OK, I look forward to seeing more work in this area :)

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Saturday, 13 February 2010 15:02:46 UTC