Re: Media--Technical Implications of Our User Requirements

On Mon, 19 Jul 2010 19:47:27 +0200, Janina Sajka <janina@rednote.net>  
wrote:

> Philip Jägenstedt writes:
>> Comments inline below, snipped the rest:
>>
>> On Wed, 14 Jul 2010 05:51:55 +0200, Janina Sajka
>> <janina@rednote.net> wrote:
>>
>> >          + 2.2 Texted Audio Description
>> >
>> >Text content with the ability to contain semantic and style
>> >instructions.
>> >
>> >Multiple documents may be present to support texted audio description
>> >in various languages, e.g. EN, FR, DE, JP, etc., or to support multiple
>> >levels of description.
>>
>> What semantics and style are required for texted audio descriptions,
>> specifically? What does "levels of description" mean here?
>>
> Texted audio descriptions also pertain to users with low vision or with
> various learning disabilities. Thus, font overrides,
> foreground/background color, etc., matter.
>
> And, while I'm not seeing it in the user reqs at the moment, multiple
> levels were previously mentioned with respect to different levels of
> complexity for different users, e.g. descriptions aimed at different
> school grades (in K-12 content). I don't recall we've decided anything
> specific regarding levels. I would expect they'd be handled as a
> separate document.

In other words, multiple levels of description isn't a separate technical
requirement; it's implied by being able to switch between multiple text
tracks.
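
For illustration, the markup could be as simple as this (the element and
attribute names here are placeholders for whatever the timed track
proposals settle on, and the file names are invented):

  <video src="lecture.ogv" controls>
    <track kind="descriptions" srclang="en" src="desc-en.srt">
    <track kind="descriptions" srclang="fr" src="desc-fr.srt">
    <track kind="descriptions" srclang="en" label="grade school"
           src="desc-en-simple.srt">
  </video>

Switching "level" is then no different from switching language.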

>> >          + 2.5 Content Navigation by Content Structure
>> >
>> >A structured data file.
>> >
>> >NOTE: Data in this file is used to synchronize all media representations
>> >available for a given content publication, i.e. whatever audio,
>> >video, and
>> >text document--default and alternative--versions may be provided.
>>
>> Couldn't the structure be given as chapters of the media resource
>> itself, or simply as a table of contents in the HTML markup itself,
>> with links using Media Fragment URIs to link to different time
>> offsets?
>
> Perhaps. That's one approach we can discuss.
>
> However, I believe we're going to need to agree about how various
> representations of a media resource are kept synchronized before
> resolving this particular issue.

I sloppily interpreted this as being about the structure in time, i.e.  
chapters and such. Reading it again, it looks like this is about a  
manifest file to treat multiple resources as one. Where does such a  
requirement come from?
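
For the record, for the chapters interpretation I had in mind nothing
fancier than this (file name and times are invented; the #t= syntax is from
the Media Fragments working draft), possibly with a script intercepting the
clicks and seeking the embedded video instead of navigating:

  <video src="divinecomedy.ogv" controls></video>
  <ol>
    <li><a href="divinecomedy.ogv#t=0">Canto I</a></li>
    <li><a href="divinecomedy.ogv#t=312">Canto II</a></li>
  </ol>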

>> >          + 2.6 Captioning
>> >
>> >Text content with the ability to contain hyperlinks, and semantic
>> >and style
>> >instructions.
>> >
>> >QUESTION: Are subtitles separate documents? Or are they combined
>> >with captions
>> >in a single document, in which case multiple documents may be present to
>> >support subtitles and captions in various languages, e.g. EN, FR,
>> >DE, JP, etc.
>>
>> Given that hyperlinks don't exist in any mainstream captioning
>> software (that I know of), it can hardly be a requirement unless
>> virtually all existing software is insufficient. Personally, I'm not
>> thrilled by the potential user experience: seeing a link in the
>> captions, moving the mouse towards it, only to have it disappear
>> before clicking, possibly accidentally clicking a link from the
>> following caption. I think links to related content would be better
>> presented alongside the video, not as part of the captions.
>
>
> I would expect a user would pause media resource playback before
> activating a hyperlink.
>
> The user requirements example is to link to glossaries.
>
> The fact that existing captioning authoring tools do, or do not, support
> some feature is, imho, beside the point. We're talking about
> accessibility to media in the context of hypertext.  Of course we would
> want to avail ourselves of useful functionality provided by hypertext
> technology.  Conversely, we would not artificially impose limitations
> inherent in the analog broadcast media environment to the hypertext
> environment. That would just be silly. The authoring tools will simply
> need to catch up. They'd no longer be about captions in broadcast alone.

I should have said "media players", not "captioning software". When I  
tested the YouTube example that Philippe provided, what happened was that  
I was too slow to click and the video just ended. While I'm sure that  
there are cases where the user experience could be made less frustrating,  
I don't think this needs to be built into the captioning format itself;
let people build it with the tools at hand. (It's possible with <video> +  
scripts today.)
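
As a sketch of what I mean (times, file names and the glossary URL are all
made up), showing a link alongside the video while the relevant cue is
active:

  <video id="v" src="lecture.ogv" controls></video>
  <div id="related"></div>
  <script>
    var video = document.getElementById('v');
    // show a glossary link next to the video between 10 s and 20 s
    var cues = [{start: 10, end: 20,
                 html: '<a href="glossary.html#term">glossary: term</a>'}];
    video.addEventListener('timeupdate', function() {
      var html = '';
      for (var i = 0; i < cues.length; i++) {
        if (video.currentTime >= cues[i].start &&
            video.currentTime < cues[i].end) {
          html = cues[i].html;
        }
      }
      document.getElementById('related').innerHTML = html;
    }, false);
  </script>

The link stays clickable for the whole cue and can't be clicked by accident
in place of the next one.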

>> >          + 2.8 Sign Translation
>> >
>> >A video "track."
>> >
>> >Multiple video tracks may be present to support sign translation in
>> >various signing languages, e.g. ASL, BSL, NZSL, etc. Note that the
>> >example signing languages given here are all translations of English.
>>
>> Isn't it also the case that a sign translation track must be decoded
>> and rendered on top of the main video track? That makes quite a big
>> difference in terms of implementation.
>
>
> Yes, and we've already agreed not all user agents will support all
> accessibility requirements.
>
> On the other hand, you might want to go through this development if you
> ever intend to support PIP, for instance.

I would probably simply overlay the sign translation video on top of the  
main video. What's missing in this is syncing two video elements. Given
that sign language sync probably doesn't need to be tighter than a few
hundred milliseconds, this could be done with scripts today.
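
Something along these lines (file names invented, the 0.3 s threshold
arbitrary; positioning the overlay is plain CSS):

  <video id="main" src="talk.ogv" controls></video>
  <video id="signing" src="talk-asl.ogv"></video>
  <script>
    var main = document.getElementById('main');
    var signing = document.getElementById('signing');
    main.addEventListener('play', function() { signing.play(); }, false);
    main.addEventListener('pause', function() { signing.pause(); }, false);
    main.addEventListener('timeupdate', function() {
      // nudge the signing video back into sync once it drifts too far
      if (Math.abs(signing.currentTime - main.currentTime) > 0.3) {
        signing.currentTime = main.currentTime;
      }
    }, false);
  </script>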

>> >          + 2.9 Transcripts
>> >
>> >Text content with the ability to contain semantic and style
>> >instructions.
>>
>> I.e. an HTML document? Transcripts are possible with today's
>> technology, right?
>>
> We're intentionally format and technology neutral at this point. But,
> yes, you're correct except that we also need data to sync this document
> to playback of the media resource.

Transcripts aren't timed, are they? If they are, what's the difference
between captions and transcripts?

>> >         + 3.1 Access to interactive controls / menus
>> >
>> >audio filters
>>
>> Like Eric, I'm a bit skeptical to this. Why do we need it?
>
> Because it works for people. Specifically, people losing their hearing
> can often continue to participate if they can make these kinds of
> adjustments. This is known from supporting hard of hearing users in the
> telephone world.

Is there software that does this today? Does it need to be built into the
browser, or could it equally well work at the operating system level, as an
audio output device that applies some filters and passes the result on to
another audio device?

>> >Next and Previous (structural navigation)
>> >Granularity Adjustment Control (Structural Navigation)
>>
>> I don't really understand what this is. Would the API be something
>> like nextChapter()?
>
>
> OK. Let me try again.
>
> Let chapters be represented by x. Let sections within chapters be
> represented by y. Let subparts of sections be represented by z.
>
> So, now we have three levels, and content of schema x.y.z .
>
> If set at level 2, next and previous would access any x or y, but would
> ignore z.
>
> At level 1 they'd ignore y and z, and access only x.
>
> At level 3 they'd access any x, y or z--whichever was next (or
> previous).
>
> The granularity control is the control that allows users to shift among
> levels one, two, and three. The consequences of next and previous are
> defined, as above, by what granularity level the user selects.
>
> Does this help? Please reconsider the Dante example in the user reqs.

I think I understand. The structure is a tree, much like the nested
sections of a TOC. Is there any existing software that supports navigation
based on such a TOC, so that we can have a look at how it's supposed to
work? If screen readers are already clever about navigating nested HTML
lists, perhaps something like this is best realized as a bunch of links in
a list?
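
In markup, the tree would just be nested lists of links (using the Dante
example, with invented times):

  <ol>
    <li><a href="inferno.ogv#t=0">Canto I</a>
      <ol>
        <li><a href="inferno.ogv#t=0">The dark wood</a></li>
        <li><a href="inferno.ogv#t=95">The three beasts</a></li>
      </ol>
    </li>
    <li><a href="inferno.ogv#t=430">Canto II</a></li>
  </ol>

Setting the granularity to level 1 would mean next/previous only visits the
outer list; level 2 would visit the inner items as well.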

>> >Viewport content selection, on screen location and sizing control
>>
>> Layout is controlled by CSS; other than fullscreen mode, we can't
>> have a user changing that.
>>
>> >Font selection, foreground/background color, bold, etc
>>
>> Agree, but as a part of User CSS (no UI).
>>
>> >configuration/selection
>> >Extended descriptions and extended captions configuration/control
>> >Ancillary content configuration/control
>>
> I expect we'll discuss this on this week's call. Perhaps you could join?

I always prefer email, but maybe. I can't see an announcement of the date  
and time yet, though.

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Tuesday, 20 July 2010 11:59:47 UTC