Re: Media--Technical Implications of Our User Requirements

Philip Jägenstedt writes:
> Comments inline below, snipped the rest:
> 
> On Wed, 14 Jul 2010 05:51:55 +0200, Janina Sajka
> <janina@rednote.net> wrote:
> 
> >          + 2.2 Texted Audio Description
> >
> >Text content with the ability to contain semantic and style
> >instructions.
> >
> >Multiple documents may be present to support texted audio description in
> >various languages, e.g. EN, FR, DE, JP, etc, or to support multiple
> >levels of description.
> 
> What semantics and style are required for texted audio descriptions,
> specifically? What does "levels of description" mean here?
> 
Texted audio descriptions also pertain to users with low vision or with
various learning disabilities. Thus, font overrides, foreground/background
color, etc., matter.

And, while I'm not seeing it in the user reqs at the moment, multiple
levels were previously mentioned with respect to different levels of
complexity for different users, e.g. descriptions aimed at different
school grades (in K-12 content). I don't recall that we've decided
anything specific regarding levels. I would expect they'd be handled as
separate documents.

> >          + 2.5 Content Navigation by Content Structure
> >
> >A structured data file.
> >
> >NOTE: Data in this file is used to synchronize all media representations
> >available for a given content publication, i.e. whatever audio,
> >video, and
> >text document--default and alternative--versions may be provided.
> 
> Couldn't the structure be given as chapters of the media resource
> itself, or simply as a table of contents in the HTML markup itself,
> with links using Media Fragment URIs to link to different time
> offsets?

Perhaps. That's one approach we can discuss.

However, I believe we're going to need to agree on how the various
representations of a media resource are kept synchronized before
resolving this particular issue.
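
To make the discussion concrete, here is a rough sketch of the kind of
structured data I have in mind. It's expressed as a JavaScript object
purely for illustration; the property names and shape are my own
invention, not anything the group has agreed on:

    // Illustrative only: one possible shape for a structure document that
    // maps the same chapter/section points onto every representation of a
    // publication (main video, described audio, transcript, captions).
    var structure = {
      points: [
        { id: "ch1",   label: "Chapter 1",   start: 0,     level: 1 },
        { id: "ch1s1", label: "Section 1.1", start: 95.2,  level: 2 },
        { id: "ch2",   label: "Chapter 2",   start: 610.0, level: 1 }
      ],
      // Each representation declares how a point is located within it,
      // so that all of them can be kept in step.
      representations: {
        mainVideo:  { type: "video", sync: "time" },
        transcript: { type: "text",  sync: "anchor" }
      }
    };

A table of contents with Media Fragment URIs, as you suggest, could well
turn out to be one serialization of the same information.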

> 
> >          + 2.6 Captioning
> >
> >Text content with the ability to contain hyperlinks, and semantic
> >and style
> >instructions.
> >
> >QUESTION: Are subtitles separate documents? Or are they combined
> >with captions
> >in a single document, in which case multiple documents may be present to
> >support subtitles and captions in various languages, e.g. EN, FR,
> >DE, JP, etc.
> 
> Given that hyperlinks don't exist in any mainstream captioning
> software (that I know of), it can hardly be a requirement unless
> virtually all existing software is insufficient. Personally, I'm not
> thrilled by the potential user experience: seeing a link in the
> captions, moving the mouse towards it, only to have it disappear
> before clicking, possibly accidentally clicking a link from the
> following caption. I think links to related content would be better
> presented alongside the video, not as part of the captions.


I would expect a user to pause media resource playback before
activating a hyperlink.

The user requirements example is to link to glossaries.

The fact that existing captioning authoring tools do, or do not, support
some feature is, imho, beside the point. We're talking about
accessibility to media in the context of hypertext. Of course we would
want to avail ourselves of useful functionality provided by hypertext
technology. Conversely, we would not artificially impose limitations
inherent in the analog broadcast media environment on the hypertext
environment. That would just be silly. The authoring tools will simply
need to catch up; they'd no longer be about captions in broadcast alone.
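
As a purely illustrative sketch of the interaction I have in mind (the
element ids, and the assumption that caption text is rendered as HTML in
the page, are mine alone, not agreed requirements), a script or user
agent could pause playback as soon as a link inside a rendered caption
receives focus:

    // Illustrative only: pause the media resource when the user moves
    // focus to a hyperlink rendered inside the caption area, so the
    // caption cannot disappear before the link is activated.
    var video = document.getElementById("video");          // assumed id
    var captionArea = document.getElementById("captions"); // assumed id

    captionArea.addEventListener("focusin", function (event) {
      if (event.target.tagName === "A" && !video.paused) {
        video.pause();
      }
    });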

> 
> >          + 2.8 Sign Translation
> >
> >A video "track."
> >
> >Multiple video tracks may be present to support sign translation in
> >various signing languages, e.g. ASL, BSL, NZSL, etc. Note that the
> >example signing languages given here are all translations of English.
> 
> Isn't it also the case that a sign translation track must be decoded
> and rendered on top of the main video track? That makes quite a big
> difference in terms of implementation.


Yes, and we've already agreed that not all user agents will support all
accessibility requirements.

On the other hand, you might want to undertake this development anyway
if you ever intend to support picture-in-picture (PIP), for instance.
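
For what it's worth, here is a rough sketch of the kind of
synchronization a PIP-style rendering would involve. The element ids are
assumptions of mine, and a real implementation would need far more
careful drift handling; this is only to suggest the shape of the work:

    // Illustrative only: keep a sign-translation video roughly in step
    // with the main video, rendered as a second, smaller video element.
    var main = document.getElementById("main-video"); // assumed id
    var sign = document.getElementById("sign-video"); // assumed id

    main.addEventListener("play",  function () { sign.play();  });
    main.addEventListener("pause", function () { sign.pause(); });

    main.addEventListener("timeupdate", function () {
      // Re-align whenever the two tracks drift more than half a second.
      if (Math.abs(sign.currentTime - main.currentTime) > 0.5) {
        sign.currentTime = main.currentTime;
      }
    });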

> 
> >          + 2.9 Transcripts
> >
> >Text content with the ability to contain semantic and style
> >instructions.
> 
> I.e. an HTML document? Transcripts are possible with today's
> technology, right?
> 
We're intentionally format- and technology-neutral at this point. But,
yes, you're correct, except that we also need data to synchronize this
document with playback of the media resource.
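
As a minimal, purely illustrative sketch (the cue shape, ids, and
fragment names are mine, not a proposal), that sync data could be as
simple as a list of time ranges mapped to fragments of the transcript,
so the relevant passage can be brought into view during playback:

    // Illustrative only: map time ranges in the media resource to
    // fragment identifiers in the transcript document.
    var transcriptCues = [
      { start: 0.0,  end: 12.5, fragment: "#para-1" },
      { start: 12.5, end: 30.0, fragment: "#para-2" }
    ];

    var video = document.getElementById("video"); // assumed id
    var currentCue = null;

    video.addEventListener("timeupdate", function () {
      var t = video.currentTime;
      transcriptCues.forEach(function (cue) {
        if (t >= cue.start && t < cue.end && cue !== currentCue) {
          currentCue = cue;
          var el = document.querySelector(cue.fragment);
          if (el) { el.scrollIntoView(); }
        }
      });
    });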

> >         + 3.1 Access to interactive controls / menus
> >
> >An API providing access to:
> >
> >Stop/Start
> >Pause
> 
> We have play() and pause(), but no stop() because it's almost the
> same thing as pause().
> 
> >Fast Forward and Rewind (time based)
> >time-scale modification control
> 
> .playbackRate
> 
> >volume (for each available audio track)
> 
> .volume
> 
> >pan location (for each available audio track)
> >pitch-shift control
> 
> There's no API for these yet.
> 
> >audio filters
> 
> Like Eric, I'm a bit skeptical to this. Why do we need it?

Because it works for people. Specifically, people losing their hearing
can often continue to participate if they can make these kinds of
adjustments. This is known from supporting hard-of-hearing users in the
telephone world.
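
To illustrate the kind of adjustment I mean, here is one possible shape
such a control could take. To be clear, everything below is hypothetical
and invented purely for discussion; no such API exists today:

    // Hypothetical API, for illustration only: boost the frequencies a
    // hard-of-hearing listener has trouble with on the spoken-word track.
    var video = document.getElementById("video"); // assumed id
    var speech = video.audioTracks[0];            // hypothetical track indexing
    speech.filters = [                            // hypothetical property
      { type: "highshelf", frequency: 2000, gain: 9 } // +9 dB above 2 kHz
    ];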

> 
> >Next and Previous (structural navigation)
> >Granularity Adjustment Control (Structural Navigation)
> 
> I don't really understand what this is. Would the API be something
> like nextChapter()?


OK. Let me try again.

Let chapters be represented by x. Let sections within chapters be
represented by y. Let subparts of sections be represented by z.

So now we have three levels, and content addressed by the schema x.y.z.

If the granularity is set at level 2, next and previous would access any
x or y, but would ignore z.

At level 1 they'd ignore y and z, and access only x.

At level 3 they'd access any x, y, or z--whichever was next (or
previous).

The granularity control is the control that allows users to shift among
levels one, two, and three. The consequences of next and previous are
defined, as above, by the granularity level the user selects.
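
To make that concrete, here is a rough sketch. It is purely
illustrative; the function name, the structure list, and the idea of
driving video.currentTime from it are mine, not a proposed API:

    // Illustrative only: structure points carry a level (1 = chapter x,
    // 2 = section x.y, 3 = subpart x.y.z) and a start time in seconds.
    var points = [
      { label: "1",     level: 1, start: 0 },
      { label: "1.1",   level: 2, start: 40 },
      { label: "1.1.1", level: 3, start: 55 },
      { label: "2",     level: 1, start: 120 }
    ];

    var granularity = 2; // user-selected: 1, 2 or 3

    function next(currentTime) {
      // Return the first point after the current position whose level is
      // within the selected granularity; coarser points always qualify.
      for (var i = 0; i < points.length; i++) {
        if (points[i].start > currentTime && points[i].level <= granularity) {
          return points[i];
        }
      }
      return null; // nothing further at this granularity
    }

    // e.g. var p = next(video.currentTime); if (p) video.currentTime = p.start;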

Does this help? Please reconsider the Dante example in the user reqs.

> 
> >Viewport content selection, on screen location and sizing control
> 
> Layout is controlled by CSS, other than fullscreen mode we can't
> have a user changing that.
> 
> >Font selection, foreground/background color, bold, etc
> 
> Agree, but as a part of User CSS (no UI).
> 
> >configuration/selection
> >Extended descriptions and extended captions configuration/control
> >Ancillary content configuration/control
> 
I expect we'll discuss this on this week's call. Perhaps you could join?

Janina

> I don't know what these last 3 really mean in practice.
> 
> I don't think we should document requirements that are already
> fulfilled (those at the top).
> 
> >          + 3.5 Discovery and activation/deactivation of available
> >alternative content by the user
> >
> >A discovery mechanism and presentation of available media options
> >for user selection.
> 
> Such as a context menu for selecting the audio track? Stating this
> in less cryptic terms would help :)
> 
> >          + 3.8 Requirements on the parallel use of alternate
> >content on potentially multiple devices in parallel
> >
> >A discovery mechanism of available OS provided output device
> >options for user selection.
> 
> I'm not sure what this is about.
> 
> -- 
> Philip Jägenstedt
> Core Developer
> Opera Software

-- 

Janina Sajka,	Phone:	+1.443.300.2200
		sip:janina@asterisk.rednote.net

Chair, Open Accessibility	janina@a11y.org	
Linux Foundation		http://a11y.org

Chair, Protocols & Formats
Web Accessibility Initiative	http://www.w3.org/wai/pf
World Wide Web Consortium (W3C)
