RE: using TTML for caption delivery, discussion from Sean Hayes on 2011-02-13 (public-html-a11y@w3.org from February 2011)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Sun, 13 Feb 2011 14:27:09 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, David Singer <singer@apple.com>
CC: HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <8DEFC0D8B72E054E97DC307774FE4B913FA46EDC@DB3EX14MBXC303.europe.corp.microsoft.c>
Well, there are multiple ways to produce captions from a live source and deliver them, and this has been done with TTML; but the format itself isn't really the hard part. The hard part is capturing the text, reducing the latency, and then getting the captions muxed in and delivered on time with the appropriate time stamps. Obviously one wouldn't try to deliver live-produced captions in a single file, but a single file can be a reasonable way to send them for packaged media, whether delivered as a stream or not. The ideal is to be flexible and allow for a continuum between the all-in-one approach and the every-caption-separate approach.
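As a rough illustration of that continuum (a sketch only, not something proposed in this thread): the same cue list can be grouped into time segments of a chosen length. An infinite segment length gives one all-in-one document, and a very short one gives roughly one caption per segment. The `Cue` type, the grouping by begin time, and the bare-bones TTML serialization are all assumptions for illustration; real packaged or streamed TTML profiles add further constraints.

```python
from dataclasses import dataclass
from typing import Dict, List
from xml.sax.saxutils import escape


@dataclass
class Cue:
    begin: float  # seconds (hypothetical representation)
    end: float    # seconds
    text: str


def segment_cues(cues: List[Cue], segment_seconds: float) -> List[List[Cue]]:
    """Group cues into consecutive time segments, keyed on each cue's begin time.

    segment_seconds=float("inf") yields a single all-in-one segment;
    a small value yields roughly one cue per segment.
    """
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    segments: Dict[int, List[Cue]] = {}
    for cue in sorted(cues, key=lambda c: c.begin):
        if segment_seconds == float("inf"):
            index = 0
        else:
            index = int(cue.begin // segment_seconds)
        segments.setdefault(index, []).append(cue)
    # Return only non-empty segments, in time order.
    return [segments[k] for k in sorted(segments)]


def to_ttml(segment: List[Cue]) -> str:
    """Serialize one segment as a minimal TTML document (assumed layout)."""
    paras = "\n".join(
        f'      <p begin="{c.begin}s" end="{c.end}s">{escape(c.text)}</p>'
        for c in segment
    )
    return (
        '<tt xmlns="http://www.w3.org/ns/ttml">\n'
        "  <body>\n    <div>\n"
        + paras
        + "\n    </div>\n  </body>\n</tt>"
    )
```

A cue that straddles a segment boundary is placed by its begin time here; a live or chunked system might instead repeat or split it.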

IMO this discussion was meant to address some of David's technical concerns over TTML, since this is clearly not a forum in which to try to influence the VPAAC. WRT HTML5, I believe that the chairs and PLH are still working on action 193, but as I understand it the general idea is that HTML5 should not specify a format itself, but should reference one or more formats specified elsewhere in the W3C.

-----Original Message-----
From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com] 
Sent: 13 February 2011 11:24
To: David Singer
Cc: Sean Hayes; HTML Accessibility Task Force
Subject: Re: using TTML for caption delivery, discussion

On Sun, Feb 13, 2011 at 8:37 PM, David Singer <singer@apple.com> wrote:
>
> On Feb 13, 2011, at 8:23 , Sean Hayes wrote:
>
>> Point 3. Single or chunked delivery
>> Since the typical size of a caption file is only on the order of tens of KB, maybe 100 KB or so for a long-form movie, receiving it all up front and parsing it in one go isn't much of a problem, and is generally an advantage. The only time it would be an issue is in delivery of live content where you don't know the captions in advance, and as far as I can tell that's not a use case that is supported by the <video> tag today.
>
> The video tag can point at anything, including RTSP-controlled streams, and in particular it can point at chunked-over-HTTP manifest files; even when it points at an http: URL for a media file, byte-range access for time ranges can work. Nothing in HTML says it has to be an http: URL, and nothing says it has to be a simple from-the-beginning download.


I've actually seen the video element in use for live video streaming,
so it's not just theoretically possible, but actually in active use.
We have to make sure we can deliver captions in such scenarios and one
requirement for this is that captions are interleaved with the
audio-visual data in a time-synchronized stream.


>> Integrating TTML into MPEG4 again is fairly easy due to the small size, it can simply all fit in one XML box. Or be delivered as multiple segments in a trak. This has been defined for DECE and could be adopted into MPEG.
>
> Whole documents do sound 'heavy' though.

Not just heavy - they, in fact, make live streaming with live created
captions impossible, since these could only be created interleaved
with the audio-visual data. Delivery in one XML box is definitely not
the best solution for the caption problem.


>> Point 4. Profiles.
>> There is a fairly comprehensive profiling mechanism built into TTML,
>
> but it only seems to allow covering language features, not characteristics of the stream (like, that it's in time order) or other functional aspects (like, CSS styling support), right?
>
>>
>> So I don't believe we actually achieve very much in the real world by trying to make a decision now. In a few years we may be able to see which format is gaining most ground in practice and make a decision then. The thing to do today is to ship HTML5 so that captioning is not precluded, allow pioneering content authors to write caption content in any format they choose, and wait and see how the browser vendors do on implementing <track> natively over time.
>
> Total agreement here. I have heard rumblings of a suggested mandate for TTML, whereas I would prefer to agree with you and get some experience with captioning in particular, and accessibility support in general, before we see mandates.


I assume we are talking about the FCC VPAAC discussion here and not
what we should recommend for HTML5 (given that the decision on caption
format in the HTML5 spec has been resolved IMHO)? It would indeed be
good if the FCC didn't recommend a format, but rather only specified
requirements that a format has to meet.


Cheers,
Silvia.
Received on Sunday, 13 February 2011 14:27:46 UTC