Re: [media] Moving forward with captions / subtitles

Hi Philip,


On Sat, Feb 13, 2010 at 9:19 PM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Sat, 06 Feb 2010 11:41:10 +0800, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>
>> Hi all,
>>
>> In a separate thread, there was an extensive discussion about what
>> declarative markup we should propose to add to HTML5 to introduce a
>> standard means for associating captions and subtitles that are
>> provided in separate files with an audio or video element.
>>
>> The discussion relates to bug 5758 and has currently resulted in a
>> first draft specification at:
>> http://www.w3.org/WAI/PF/HTML/wiki/Media_TextAssociations
>>
>> As you will notice, this proposal builds on several other previously
>> suggested declarative syntax proposals.
>>
>> This is a request for discussion of this specification with a view
>> towards getting an agreement both in the media subgroup and the larger
>> task force on two issues:
>>
>> 1. The proposed declarative markup
>
> I think the main outstanding problem is still a good name for the grouping
> element. <textassoc> isn't great either because it's a bit difficult to
> spell. Perhaps <track>? Even though several browser vendors are skeptical of
> syncing audio/video from two different resources, it would make it spec-wise
> possible to allow it in the future. For now, it's text-only though:
>
> <video src="video.ogg">
>  <track src="captions.srt">
> </video>
>
> or using <source> in the same way as for <video>:
>
> <video src="video.ogg">
>  <track>
>    <source type="text/srt" src="captions.srt" lang="en">
>    <source type="text/srt" src="zimu.srt" lang="zh">
>  </track>
> </video>
>
> Note that the resource selection algorithm is not a limitation here, because
> we can freely define how <track>s are activated and how to select between
> alternative <source>s in a <track>. Perhaps we need a new boolean attribute
> like enabled="" to enable a track by default.

In general I like the idea of calling it <track>. However, I have a
slight issue with the name because these are only virtual tracks:
normally only the "tracks" that are multiplexed together inside an
encapsulation format are called tracks. With this name, both the
content inside a source element and the parallel external files would
be called tracks. I predict confusion.

However, I must say I really like the idea of making it independent of
"text", i.e. leaving the possibility open to add "tracks" of audio or
video in future.

I'd be happy for something that essentially means "external parallel track".



> I'm not too keen on charset="", but if it turns out to be necessary in
> practice I'll just have to tolerate it. For now we have to define how the
> default encoding is determined (*not* using the containing document's
> encoding for sanity).

There is no @charset attribute. I have explicitly removed it, since
the charset can be specified as part of the @type attribute where
necessary. I would actually write an SRT RFC that includes the charset
as a parameter of the MIME type. That solves that problem IMO.
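
As a rough illustration of what I mean, here is a minimal Python sketch
of how a consumer could split a @type value into media type and
charset. The function name is made up for this example, "text/srt" is
not a registered type yet, and what to do when no charset is given is
exactly the default-encoding question that still has to be defined.

```python
from email.message import Message


def split_type(value):
    """Split a type attribute value into (media type, charset or None).

    Hypothetical helper for illustration only; it reuses the standard
    library's Content-Type parameter parsing.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the charset lower-cased, or None
    # when the value carries no charset parameter.
    return msg.get_content_type(), msg.get_content_charset()
```

A value such as "text/srt; charset=ISO-8859-1" would carry the encoding
along with the type, so no separate attribute is needed.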


> role="" is fine, but I'd like to see more ideas on what UAs should do with
> it.

The thought is to use it not just for captions, subtitles, and textual
audio descriptions, but also for karaoke, lyrics, chapters, timed
comments, timed metadata, and other such time-aligned text and
annotations. There are examples with lyrics
(http://svg-wow.org/audio/animated-lyrics.html, and
http://annodex.net/~silvia/itext/chocolate_rain.html), and chapters
(http://annodex.net/~silvia/itext/elephant_no_skin_v2.html). I'm sure
we will come up with more similar examples.


>> 2. The proposed default file formats to support for subtitles and
>> captions: DFXP and SRT
>
> I'm really quite skeptical of DFXP, as it tries to be an interchange format
> that can handle anything that any other subtitle format can handle. I think
> browser support would at best be partial. The overlap with what can already
> be done with script+CSS is also very large and I'd rather at least rendering
> is defined entirely by CSS.
>
> SRT is a good baseline, but who will take on the task of writing a parsing
> spec for it? This is probably not nearly as trivial as it may seem, and
> requires collecting lots of sample data to see how SRT is used in the wild.
> For example, what to do with the ghastly "HTML" sometimes mixed in SRT? (I
> would prefer this to be completely unsupported.)

I don't think you are alone there. I think it is safest to just go
with the base format. I have looked at a large number of SRT files, and
those with HTML markup are mostly restricted to bold, italic and
line-break (<br/>) formatting. It was my intention to write an SRT
specification as an RFC and register the MIME type at the same time. I
have written RFCs before and don't think it would be hard.
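
To give a feeling for the base format, here is a minimal Python sketch
of a reader for it. This is not a parsing spec: the names are made up
for illustration, it assumes blank-line-separated blocks with a
"HH:MM:SS,mmm --> HH:MM:SS,mmm" timing line, and it passes any markup
in the cue text through untouched rather than deciding what to do with
it.

```python
import re
from typing import NamedTuple

# Timing line of the base format: "HH:MM:SS,mmm --> HH:MM:SS,mmm".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str     # raw cue text, markup not interpreted


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(data):
    """Return the cues found in an SRT document given as a string."""
    cues = []
    # Normalise line endings, then split into blocks on blank lines.
    blocks = re.split(r"\n\s*\n", data.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        # The timing line normally follows the numeric counter line.
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                break
        else:
            continue  # no timing line in this block, so skip it
        g = match.groups()
        text = "\n".join(lines[i + 1:])
        cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
    return cues
```

Real files in the wild will need more care than this (encodings,
malformed timings, stray markup), which is what a proper specification
would have to pin down.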

Cheers,
Silvia.

Received on Saturday, 13 February 2010 13:05:31 UTC