RE: Requirements for external text alternatives for audio/video from Sean Hayes on 2010-03-15 (public-html-a11y@w3.org from March 2010)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Mon, 15 Mar 2010 10:20:46 +0000
To: HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <8DEFC0D8B72E054E97DC307774FE4B9119ED82F9@DB3EX14MBXC313.europe.corp.microsoft.c>
Due to a clerical error, I was not able to cast a vote in the survey, but I'll add my comments here. Basically, I'm in favour of TTML (DFXP or a new HTML-specific profile). SRT will only be acceptable if it borrows a lot of the necessary features from the HTML world (e.g. is amenable to CSS styling). However, this complicates both the processing and the standardisation of SRT, so I'm somewhat reserving opinion on that until I see how it would be defined. Having a reference to a self-contained system seems to me to be a better approach for HTML.

While I think everyone agrees that "starting simple" is a great idea, my practical experience is that nobody ever agrees on what the "simplest" set is, and once you satisfy everyone's needs, the profile tends to be larger than you'd hoped. The TTML profiles represent a number of years of exactly that kind of argument.  

However, I agree that explicitly writing down the requirements is a good idea. Since TTML is designed to go in a disciplined way from very simple through to fully featured, and is openly extensible, once the requirements are documented, mapping to a TTML profile should be straightforward.

Sean.

-----Original Message-----
From: public-html-a11y-request@w3.org [mailto:public-html-a11y-request@w3.org] On Behalf Of Geoff Freed
Sent: 15 March 2010 09:25
To: Silvia Pfeiffer; HTML Accessibility Task Force
Subject: RE: Requirements for external text alternatives for audio/video


I'm on a rather tight deadline and so may not be able to fully address everything below for a day or two.  One comment inline for the time being.

Geoff/wgbh
________________________________________
From: public-html-a11y-request@w3.org [public-html-a11y-request@w3.org] On Behalf Of Silvia Pfeiffer [silviapfeiffer1@gmail.com]
Sent: Sunday, March 14, 2010 8:08 PM
To: HTML Accessibility Task Force
Subject: Requirements for external text alternatives for audio/video

Hi all,

Looking at the recent survey on caption formats and its results, see http://www.w3.org/2002/09/wbs/44061/media-text-format/results, it seems that what is currently written in the change proposal at http://www.w3.org/WAI/PF/HTML/wiki/Media_TextAssociations#File_Formats
got confirmed:

"A brief discussion at the TPAC in November 2009 seemed to indicate that the W3C Timed Text Format DFXP should be the first choice. As an alternate, simple format the SubRip srt format in its simplest form should also be supported by browsers. Since srt can be regarded as a simple subpart of DFXP, creating support for srt will be simple."

We have 15 votes for SRT and 14 for DFXP.

However, looking at the detailed replies, I can see that we basically have two camps: one that says "let's just start simple" and the other that says "we need something that is extensible, incorporates styling and markup".

What it tells me is that we never really looked at what our requirements are for synchronised text alternatives, and in particular for caption formats.

I'd like us to collect these requirements so we can make a better recommendation as a group. We should look at these requirements from several view points, some of which may be:
* a legal POV (what do a11y laws require us to do),
* a WCAG requirements POV,
* an a11y user's usability POV,
* an international user's POV,
* and anything else you can think of that I forgot.

So, let me pose the key question: why do we need more than unformatted text, a start time and an end time to provide subtitles/captions for users?
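
To make that baseline concrete, here is a minimal sketch in Python of the data model in question: a cue is nothing more than a start time, an end time and unformatted text. It assumes the common SubRip layout (a counter line, a timing line of the form "HH:MM:SS,mmm --> HH:MM:SS,mmm", then one or more text lines, with cues separated by blank lines). The names Cue and parse_srt are hypothetical and chosen for illustration only; this is not any browser's or any site's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import List

# Timing line of a SubRip cue, e.g. "00:00:01,000 --> 00:00:04,500".
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    """Hypothetical minimal cue: start/end in seconds plus plain text."""
    start: float
    end: float
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(data: str) -> List[Cue]:
    """Parse SRT text into cues; blocks without a timing line are skipped."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the cue text, kept as-is.
                text = "\n".join(lines[i + 1:])
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


sample = """1
00:00:01,000 --> 00:00:04,500
Hello, world.

2
00:00:05,000 --> 00:00:07,250
[door slams]
Who's there?
"""

cues = parse_srt(sample)
```

Note what this model cannot express: speaker identification, positioning, styling and language are all absent unless they are typed into the text itself, which is the gap the requirements discussion needs to assess.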

Or let me be a bit more of a devil's advocate:
What functionality is required on top of SRT, and who needs it? Seeing as, e.g., YouTube supports only start time, end time and unformatted text and gets very far with it, why would we need to support more than that?


GF:
While I understand this is just a DA point of view, we should definitely not be using one entity's approach to text display, or caption/subtitle generation, as an example of ideal practice.  At this moment, Google/YouTube supports a string of text with a begin time and an end time, but this doesn't mean they won't support other features, including styling, in the future.  What they alone are doing at this point in time should not govern our decision.  After all, we're serving an audience *in part* of deaf/hard-of-hearing users, not just software engineers.



Please help us keep this a debate on facts and on real requirements and not turn it into a religious debate.

Best Regards,
Silvia.
Received on Monday, 15 March 2010 13:44:30 UTC