
Re: Media Gaps Document--36 Hour Consensus Call

From: Geoff Freed <geoff_freed@wgbh.org>
Date: Thu, 16 Dec 2010 08:20:14 -0500
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, John Foliot <jfoliot@stanford.edu>
CC: Eric Carlson <eric.carlson@apple.com>, Sean Hayes <Sean.Hayes@microsoft.com>, Frank Olivier <Frank.Olivier@microsoft.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <C92F7C3E.13EC8%geoff_freed@wgbh.org>

I'm not going to raise a huge fuss or open a new debate over this, but merely wanted to point out that "widely used" is not an objective way to quantify usage.  But just for the sake of argument, searching only for the .ttml extension is not an accurate way to determine usage of the format, because that extension is relatively new.  Remember, TTML was called DFXP for several years before the name was changed, and filename.dfxp, filename.dfxp.xml or filename.xml (and perhaps others) have all been used to identify DFXP/TTML caption files.

Other points to consider:  the BBC has been providing TTML captions on its on-line offerings since 2008, using filename.xml, so that probably adds up to thousands of caption files right there.  And although I am unable to name names, I can say that major broadcast and Web-based video-streaming entities are now beginning to adopt TTML as their caption-display format.  Finally, SMPTE has completed its work on SMPTE-TT (see https://store.smpte.org/SearchResults.asp?Search=2052&Extensive_Search=Y&Submit=Search), which is the standard for converting CEA-608 caption data for use on the Web.  SMPTE-TT is based on TTML.  This alone is probably going to result in the creation of thousands of new TTML-based caption files in the not-too-distant future.

I don't think we need to spend time counting caption files and, again, I don't think it's necessary to get into a big debate over this.  I won't object if you re-insert "widely used" into the requirements doc.  It just doesn't seem to me that the term is appropriate.


On 12/16/10 1:45 AM, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:

On Thu, Dec 16, 2010 at 5:41 PM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:
> On Thu, Dec 16, 2010 at 4:56 PM, John Foliot <jfoliot@stanford.edu> wrote:
>> Eric Carlson wrote:
>>> On Dec 15, 2010, at 7:13 PM, Silvia Pfeiffer wrote:
>>> >
>>> > I think "widely used" was a fair assessment for SRT. All professional
>>> > entities that I've known that use other formats are usually also
>>> > capable of using SRT because it's so simple. Just saying "is
>>> > implemented in some sectors of the Web-development community" is
>>> > unfair because there are many professional entities that use it, too.
>>> > They make no big fuss about it, but they support it. SRT support is
>>> > more commonly found than TTML and I would therefore object to any
>>> > representation that tries to imply the opposite.
>>>  I agree! SRT is one of the formats that YouTube recommends people use
>>> when uploading captions that are not already formatted [1]:
>>> If you do not have formatted caption data, such as a transcript that
>>> does not have timing data, we recommend using SubRip (*.SRT) or
>>> SubViewer (*.SUB) for generating formatted captions.
>> Although I have complained to the HTML WG Chairs in the past about the use
>> of vague metrics when it comes to measurement, I think that here 'widely
>> used' does represent a fairly accurate assessment of SRT's usage. Its
>> usage in the fan-sub community for sub-titling is also well known,
>> although getting a handle on quantity metrics is difficult. Unless there
>> is strong push-back I believe we are best served by retaining that phrase
>> here.
>> My $0.02 Canadian
>> JF
> While it's only indicative, a Google search for filetype:srt provides
> 264,000 results while filetype:ttml provides 713 results.
> Neither of these numbers means much because the majority of these files
> will not live on the 'net. But they are indicative and quantitative.

Actually - just looking at the ttml files - none of them are Timed
Text ML files. So this number doesn't seem to mean much.

Received on Thursday, 16 December 2010 13:25:05 UTC
