RE: Timed tracks

Tab Atkins Jr. wrote:
>
> All of the use-cases of actual caption use on the web, collected on
> the WHATWG wiki at
> http://wiki.whatwg.org/wiki/Use_cases_for_timed_tracks_rendered_over_video_by_the_UA.
>  Additionally, API-level access use-cases for captions on the web have
> been collected at
> http://wiki.whatwg.org/wiki/Use_cases_for_API-level_access_to_timed_tracks.

Thanks, Tab, for these URLs; I will add them to the resources and 
requirements-gathering work that is currently happening at the W3C.

A few comments:

I am concerned that these "use-cases" are all just visualizations (screen 
captures) of how captioning/sub-titling is being dabbled with today. They 
are images only (inaccessible to non-sighted members of the community, as 
they lack any alt text values), and I do not think they encompass all 
use-cases. I see no indication or application of descriptive audio/text 
associated with any video (yet I am aware of experimental work being done by 
IBM Japan), I see no distinction being made between captioning and 
sub-titling (yet the distinction is significant), and I see no reference or 
allusion to meeting the needs of the deaf-blind community, of users with low 
vision who might need to enlarge the caption text or change 
foreground/background colors*, or of users on very small screens (such as 
many hand-held devices) - all use-cases that are valid and worth considering 
today. (* In fact, there is one screen shot of a 'caption' with dark blue 
text on a black background - I would suggest that it is an extremely poor 
example of meeting accessibility needs.)

As well, I see no factual data (studies, surveys, written feedback) 
associated with these pictures (not even explanations), nor any feedback 
from commercial producers or other large-volume creators. In fact, based on 
the pictures alone I am left with the impression that Anime might have the 
largest need for captioning - a first impression I am sure is not correct.

Also, is there any documented discussion about internationalization issues, 
of bidi (bidirectional text) support, or of Ruby annotation support?

Finally, was there any analysis of legislative requirements (existing or 
pending) that might impact deliverables? It would be a shame to invest a lot 
of effort into something that could not be taken up by public institutions 
or commercial vendors simply because it missed a simple check of what the 
law requires. (And when it comes to accessibility issues, it is often about 
'the law', rightly or wrongly.)

So while this is a good visual start - collected from the wild - of 
individuals seeking to do the right thing, to suggest that it is "All of the 
use-cases of actual caption use on the web" is perhaps overstating what you 
have there just a bit.

>
> Indeed, plain SRT is pretty minimal, and doesn't address many of the
> documented use-cases.

Nor, I might add, any of the use-cases we are aware of that are *not* 
documented in the WHATWG wiki.


> But it's very simple to both author and parse,
> and the extensions needed to make it handle all the aforementioned
> use-cases are pretty minimal.  It's also pretty common, apparently
> especially so amongst amateur subbers, which implies that it probably
> addresses the needs and desires of average authors pretty well.

This presumes that caption files (complete with time-stamping) will always 
be hand-rolled - a false presumption, based upon my very real work 
experience: I worked to create an auto-time-stamping system on campus 
precisely because most authors don't, or can't, create time-stamped caption 
files by hand. Hand-crafted caption files simply do not scale. Besides, the 
hard part is not applying the time-stamps; it is generating the text 
transcript. So 'simplicity of authoring' is a hollow justification.
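For readers who have not worked with the format under discussion, the sketch 
below shows what a single SRT cue looks like and how little machinery is 
needed to parse one. It is illustrative only: it assumes a well-formed cue 
with HH:MM:SS,mmm timestamps and ignores the messiness of real files (BOMs, 
CRLF line endings, styling tags), which is exactly the gap between "simple 
to parse" and "meets all the use-cases".

```python
import re

# One well-formed SRT cue: index, timing line, then caption text.
CUE = """1
00:00:01,000 --> 00:00:04,500
Hello, world.
This is a caption."""

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_seconds(stamp):
    """Convert an SRT timestamp (HH:MM:SS,mmm) to seconds."""
    h, m, s, ms = map(int, TIMESTAMP.match(stamp).groups())
    return h * 3600 + m * 60 + s + ms / 1000.0

def parse_cue(block):
    """Parse one SRT cue into (index, start, end, text). Sketch only."""
    lines = block.strip().splitlines()
    index = int(lines[0])
    start, _, end = lines[1].partition(" --> ")
    text = "\n".join(lines[2:])
    return index, to_seconds(start), to_seconds(end.strip()), text

print(parse_cue(CUE))
# (1, 1.0, 4.5, 'Hello, world.\nThis is a caption.')
```

Note that nothing in this cue structure carries speaker identification, 
positioning, language direction, or styling - the very things the 
accessibility use-cases call for.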

>
> It may be that we end up needing to support multiple formats, such as
> perhaps a profile of TTML.  But I'd like to avoid that if at all
> possible, and from what I understand implementors would too.

In the first paragraph you speak of authors, and in the second you speak of 
implementers. However, the HTML5 design principles have always been: end 
users over authors, authors over implementers, implementers over technical 
purity. What is currently missing in your argument is the users. The 
'documentation' of user needs is incomplete, and finding a solution that is 
author- and implementer-friendly but does not address user needs is the 
wrong path to be taking - yet by all appearances it is the path being taken 
today.

(Re: "subbers" - I would suggest that they are not, nor ever will be, the 
primary source of caption files. Candidly, they are often 'pirates' grabbing 
Hollywood movies via BitTorrent and creating sub-titles for themselves and 
friends in their native language. In professional circles there is a 
significant difference between captions and sub-titles.)


>
> Indeed, we may end up needing to support TTML or DFXP.  But it's much
> more complex in both generation and parsing than we need, and requires
> work to map its formatting into CSS terms.  That effort may be better
> spent elsewhere.

Like inventing a new file format? Or developing a workable profile based 
upon existing work?
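To make the trade-off concrete, here is what the same cue might look like in 
TTML/DFXP - a sketch only, using the TTML 1.0 namespaces and a couple of 
tts:* styling properties, not a profile proposal:

```xml
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:lang="en">
  <body>
    <div>
      <!-- Timing is expressed as attributes; presentation is carried by
           tts:* properties that an implementation must map to CSS. -->
      <p begin="00:00:01.000" end="00:00:04.500"
         tts:color="white" tts:backgroundColor="black">
        Hello, world.
      </p>
    </div>
  </body>
</tt>
```

More verbose than SRT, yes - but the styling, language, and structural hooks 
that the accessibility use-cases require are already defined here, which is 
the point of profiling existing work rather than starting over.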

>
> Seeing as the subtitling ecosystem is pretty diverse,

Respectfully, there is a world of difference between sub-titling and 
providing captions and descriptive audio to persons with disabilities. I 
appreciate that for many this may be very nuanced, but please be mindful 
that there is a difference: the tolerance for inaccuracies in sub-titling 
(where there is often paraphrasing, or language-specific idioms are 
'translated') is significantly higher than for captioning, where the 
expected accuracy rate (especially for off-line captioning) is generally in 
the 98%-100% range. Thanks (in advance) for appreciating the difference, 
Tab. (I also appreciate that this really has nothing to do with the larger 
discussion, but using correct and appropriate terms is useful.)


> and we can't
> possibly support all the formats out there, a lot of people are going
> to have to do transcoding to another format *anyway* to get their
> stuff on the web.  Making a few more people do it may be a worthwhile
> cost for the benefit of having a single simple format for captioning
> on the web.

OK, so then step up (not down or sideways) to time-stamp formats that have 
been created to address specific needs. If we need to specify a profile of 
one of the available existing W3C formats, then that is building on prior 
effort, not starting out again from scratch.

>
> No, the use-cases have been collected for a while, in hand with
> significant effort from Silvia Pfeiffer.  No need to invent a fiction
> of Hixie creating these things out of whole cloth.

I have never said or implied this. However, as a member of the W3C directly 
involved with this subject within the Accessibility Task Force, this new 
format - inserted into the working draft - came as a complete surprise to me 
as well as to numerous others. Given that Ian is also a member of the W3C 
a11y TF, it is doubly frustrating that he consulted with us neither prior 
to, nor after, authoring this new bit of business. Once again, it smacks of 
5 or 6 people on the WHATWG IRC channel scheming up something and running 
with it (whether or not this was indeed the case). This unilateral decision 
process is not how it works at the W3C - something that everybody should be 
aware of by now, no matter which side you feel more allegiance to. The fact 
that this subject is currently under active discussion at the W3C is 
something Ian should have been aware of but did not factor in (IMHO) - a 
fact that *somebody* has to own.

JF

Received on Friday, 7 May 2010 04:57:22 UTC