Re: using TTML for caption delivery, discussion from Silvia Pfeiffer on 2011-02-13 (public-html-a11y@w3.org from February 2011)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Sun, 13 Feb 2011 22:24:03 +1100
To: David Singer <singer@apple.com>
Cc: Sean Hayes <Sean.Hayes@microsoft.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <AANLkTiniqF7_EFwjJpfNkOFs3-R+TEz9wPEvGdj+=W8r@mail.gmail.com>

On Sun, Feb 13, 2011 at 8:37 PM, David Singer <singer@apple.com> wrote:
>
> On Feb 13, 2011, at 8:23 , Sean Hayes wrote:
>
>> Point 3. Single or chunked delivery
>> Since the typical size of a caption file is only on the order of 10s of Kb, maybe 100Kb or so for a long-form movie, actually receiving it all up front and parsing it in one go isn't that much of a problem, and generally an advantage. The only time it would be an issue is in delivery of live content where you don't know the captions in advance, and as far as I can tell that's not a use case that is supported by the <video> tag today.
>
> The video tag can point at anything, including RTSP controlled streams, and particularly it can point at chunked-over-HTTP manifest files, and even when pointing at an http: URL for a media file, byte-range access for time ranges can work.  Nothing in HTML says it has to be an http: URL, and nothing says it has to be a simple from-the-beginning download.


I've actually seen the video element in use for live video streaming,
so it's not just theoretically possible, but actually in active use.
We have to make sure we can deliver captions in such scenarios and one
requirement for this is that captions are interleaved with the
audio-visual data in a time-synchronized stream.


>> Integrating TTML into MPEG4 again is fairly easy due to the small size: it can simply all fit in one XML box, or be delivered as multiple segments in a trak. This has been defined for DECE and could be adopted into MPEG.
>
> Whole documents do sound 'heavy' though.

Not just heavy - they, in fact, make live streaming with live-created
captions impossible, since these could only be created interleaved
with the audio-visual data. Delivery in one XML box is definitely not
the best solution for the caption problem.


>> Point 4. Profiles.
>> There is a fairly comprehensive profiling mechanism built into TTML,
>
> but it only seems to allow covering language features, not characteristics of the stream (like, that it's in time order) or other functional aspects (like, CSS styling support), right?
>
>>
>> So I don't believe we actually achieve very much in the real world by trying to make a decision now. In a few years we may be able to see which format is gaining most ground in practice and make a decision then. The thing to do today is to ship HTML5 so that captioning is not precluded, allow pioneering content authors to write caption content in any format they choose, and wait and see how the browser vendors do on implementing <track> natively over time.
>
> Total agreement here.  I have heard rumblings of a suggested mandate for TTML, whereas I would prefer to agree with you and get some experience doing captioning in specific and accessibility support in general before we see mandates.


I assume we are talking about the FCC VPAAC discussion here and not
what we should recommend for HTML5 (given that the decision on caption
format in the HTML5 spec has been resolved IMHO)? It would indeed be
good if the FCC didn't recommend a format, but rather only specified
requirements that a format has to meet.


Cheers,
Silvia.

Received on Sunday, 13 February 2011 11:24:55 UTC