RE: Media--Technical Implications of Our User Requirements from John Foliot on 2010-07-19 (public-html-a11y@w3.org from July 2010)

From: John Foliot <jfoliot@stanford.edu>
Date: Mon, 19 Jul 2010 16:21:30 -0700 (PDT)
To: "'Philippe Le Hegaret'" <plh@w3.org>, 'Philip Jägenstedt' <philipj@opera.com>
Cc: <public-html-a11y@w3.org>
Message-ID: <056001cb2799$1e2ebc50$5a8c34f0$@edu>
Philippe Le Hegaret wrote:
> > >
> > >
> > > QUESTION: Are subtitles separate documents? Or are they combined with
> > > captions
> > > in a single document, in which case multiple documents may be
> > > present to
> > > support subtitles and captions in various languages, e.g. EN, FR,
> > > DE, JP, etc.

The difference between captions and subtitles is nuanced to a certain 
extent. Captions traditionally are the same language as that spoken 'on 
screen', whilst sub-titles traditionally are alternative languages (for 
example, an Italian language movie would have captions with lang="IT", but 
subtitles in lang="EN") - of course these are generalizations but hopefully 
illustrative.  As well, sub-titles are normally considered to only capture 
the spoken part of the audio stream, whilst captions will include other 
important sound artifacts (e.g. [applause], [singing], [thunder], etc.). A useful 
resource for more details can be found at: 
http://www.dcmp.org/captioningkey/

In practice, however, the difference is mostly one of editing and content: in 
all cases the files will be near identical in structure.
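
To illustrate the "near identical in structure" point, here is a minimal sketch in TypeScript. The type names, field names and the stripSoundCues helper are all hypothetical and not taken from any timed-text format. It assumes sound descriptions are written in square brackets, as in the examples above, and it only shows that a captions document and a subtitles document can share one shape; translating into another language remains a separate editorial step.

```typescript
// Hypothetical shapes for illustration only; not taken from any spec.
type TrackKind = "captions" | "subtitles";

interface Cue {
  start: number; // seconds from the start of the media
  end: number;   // seconds from the start of the media
  text: string;
}

interface TextTrackDoc {
  kind: TrackKind;
  lang: string; // language tag, e.g. "it" or "en"
  cues: Cue[];
}

// Derive a subtitles-style document from a captions document by removing
// bracketed sound descriptions such as "[applause]". Cues that contain
// nothing else are dropped. The structure of the document is unchanged.
function stripSoundCues(doc: TextTrackDoc): TextTrackDoc {
  const cues = doc.cues
    .map((c) => ({
      start: c.start,
      end: c.end,
      text: c.text.replace(/\[[^\]]*\]/g, "").replace(/\s+/g, " ").trim(),
    }))
    .filter((c) => c.text.length > 0);
  return { kind: "subtitles", lang: doc.lang, cues };
}
```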

> >
> > Given that hyperlinks don't exist in any mainstream captioning
> > software
> > (that I know of), it can hardly be a requirement unless virtually all
> > existing software is insufficient.
>
> But youtube, for example, does have annotations with hyperlinks in
> them.
> They're not captions, but they're still timed text content that contain
> hyperlinks.

+1
The desire to repurpose caption files, or to enhance them to support as much 
of HTML as we can is strong, and a legitimate user requirement (for example, 
using the <strong> or <em> elements on caption files could render those 
semantics to deaf/blind users accessing the transcript via a Braille-refresh 
device). However, to meet this requirement, it seems to me that whatever 
timed-text format we look at should be parseable with an HTML parser 
(right?) - so this user-requirement will help us continue to shape the 
technical requirements.
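
As a rough sketch of what carrying those semantics through could look like, the TypeScript below splits cue text into spans flagged as strong and/or emphasised, which a transcript renderer could then map to whatever a Braille device supports. This is an assumption-laden illustration: parseCueMarkup and the Span shape are hypothetical, and the small tokenizer is only a stand-in that recognises <strong> and <em> in well-formed input. A real implementation would hand the cue text to an actual HTML parser.

```typescript
// Hypothetical: a run of cue text plus the emphasis elements wrapping it.
interface Span {
  text: string;
  strong: boolean;
  em: boolean;
}

// Stand-in tokenizer, NOT a real HTML parser. It recognises only
// <strong>, <em> and their closing tags, and assumes well-formed input.
function parseCueMarkup(cueText: string): Span[] {
  const spans: Span[] = [];
  let strongDepth = 0;
  let emDepth = 0;
  // The capturing group keeps the tags in the resulting token list.
  const tokens = cueText.split(/(<\/?(?:strong|em)>)/i);
  for (const tok of tokens) {
    switch (tok.toLowerCase()) {
      case "":
        break; // empty strings appear between adjacent tags
      case "<strong>":
        strongDepth++;
        break;
      case "</strong>":
        strongDepth = Math.max(0, strongDepth - 1);
        break;
      case "<em>":
        emDepth++;
        break;
      case "</em>":
        emDepth = Math.max(0, emDepth - 1);
        break;
      default:
        spans.push({ text: tok, strong: strongDepth > 0, em: emDepth > 0 });
    }
  }
  return spans;
}
```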


> >
> > > volume (for each available audio track)
> >
> > .volume
>
> Is there a need to have simultaneous playing of audio tracks? If
> not, .volume is good enough indeed.
>
There may indeed be a need for multiple volume controls.

One thing that has continued to surface in our discussions is the need for 
clear author-practice guidance moving forward, as any one question may have 
more than one answer, depending...

So here's a scenario: We have identified a need to be able to adjust 
various pieces of a 'complete' sound-track: for example, the spoken roles 
of a movie or television broadcast. In the UK, television users have the 
option of Clear Audio tracks, where supporting 'sounds' are filtered out 
(sound effects, "dramatic" music, etc.) - thus serving users with limited 
hearing by reducing audio clutter.

We likely can achieve this requirement in a couple of different ways, and in 
doing so we will need to convey the "how" to content authors so that they 
deliver on what we need, so we can deliver on what the users need: this 
absolutely follows the model of users over authors over implementers over 
tech purity.

So for example (and this is just John spit-balling) we could state that 
media assets should declare 2 audio streams (inside the media wrapper or 
declared as children of the <video> element, possibly using the suggested 
<track> element): one the fully mixed audio feed, the second the spoken-word 
only track (using a role-type demarcation). UAs would then need to expose a 
'switch' so that the end user can choose (and perhaps also store the user 
preference in the UA as a user setting). Alternatively, or in addition, we 
could state that UAs provide a filtering mechanism (similar to an 
"equalizer" interface) as part of the UI of the browser (for example, 
WinAmp - a desktop MP3 player that also renders HTML documents in its UI - 
has an equalizer that can be manually adjusted or loads one of a number of 
presets). While both suggestions here could likely address the user 
requirement, I think we can all safely agree that we should likely choose 
one as a 'default' and not expect both. We've not yet reached that point in 
the discussion, but I think that it is a discussion we will be having - 
soon. However, in either scenario, true user support must also come from 
the content provider, in that they produce the media asset to meet the 
mechanism we have set up.
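
For the first option (two declared audio streams plus a user-facing switch), the selection logic might be sketched as below in TypeScript. Everything here is an assumption for illustration: the role names "full-mix" and "speech-only", the AudioTrackInfo shape and the chooseAudioTrack function are hypothetical, since no role vocabulary or API has been agreed in this thread.

```typescript
// Hypothetical role names; no vocabulary has been agreed.
type AudioRole = "full-mix" | "speech-only";

interface AudioTrackInfo {
  id: string;
  role: AudioRole;
}

// Pick the track matching the stored user preference. If the content
// provider did not supply that track, fall back to the full mix, and
// finally to the first declared track (undefined if there are none).
function chooseAudioTrack(
  tracks: AudioTrackInfo[],
  preferred: AudioRole
): AudioTrackInfo | undefined {
  return (
    tracks.find((t) => t.role === preferred) ||
    tracks.find((t) => t.role === "full-mix") ||
    tracks[0]
  );
}
```

The fallback order reflects the last point above: the switch only helps when the content provider has actually produced the spoken-word track, so the UA needs a sensible default when it is missing.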

JF
Received on Monday, 19 July 2010 23:22:04 UTC