Re: Media--Technical Implications of Our User Requirements

Philip Jägenstedt writes:
> On Mon, 19 Jul 2010 19:47:27 +0200, Janina Sajka
> <janina@rednote.net> wrote:
> 
> >Philip Jägenstedt writes:
> >>Comments inline below, snipped the rest:
> >>
> >>On Wed, 14 Jul 2010 05:51:55 +0200, Janina Sajka
> >><janina@rednote.net> wrote:
> >>
> >>>          + 2.2 Texted Audio Description
> >>>
> >>>Text content with the ability to contain semantic and style
> >>>instructions.
> >>>
> >>>Multiple documents may be present to support texted audio
> >>description in
> >>>various languages, e.g. EN, FR, DE, JP, etc, or to support multiple
> >>>levels of description.
> >>
> >>What semantics and style are required for texted audio descriptions,
> >>specifically? What does "levels of description" mean here?
> >>
> >Texted audio descriptions also pertain to users with low vision or with
> >various learning disabilities. Thus, font overides,
> >foreground/background color, etc., matter.
> >
> >And, while I'm not seeing it in the user reqs at the moment, multiple
> >levels were previous mentioned with respect to different levels of
> >complexity for different users, e.g. descriptions aimed at different
> >school grades (in K-12 content). I don't recall we've decided anything
> >specific regarding levels. I would expect they'd be handled as a
> >separate document.
> 
> In other words, multiple levels of description isn't a technical
> requirement, it's implied by being able to switch between multiple
> text tracks.
> 
Yes. Please remember these are user requirements as distinct from any
technical requirements on user agents, authoring tools, or the HTML 5
specifications themselves.

> >>>          + 2.5 Content Navigation by Content Structure
> >>>
> >>>A structured data file.
> >>>
> >>>NOTE: Data in this file is used to synchronize all media
> >>representations
> >>>available for a given content publication, i.e. whatever audio,
> >>>video, and
> >>>text document--default and alternative--versions may be provided.
> >>
> >>Couldn't the structure be given as chapters of the media resource
> >>itself, or simply as a table of contents in the HTML markup itself,
> >>with links using Media Fragment URIs to link to different time
> >>offsets?
> >
> >Perhaps. That's one approach we can discuss.
> >
> >However, I believe we're going to need to agree about how various
> >representations of a media resource are kept syncronized before
> >resolving this particular issue.
> 
> I sloppily interpreted this as being about the structure in time,
> i.e. chapters and such. Reading it again, it looks like this is
> about a manifest file to treat multiple resources as one. Where does
> such a requirement come from?
> 

I understand it to come from our understanding of the implications of
our user requirements.

We need a term -- I'll say 'primary media resource' -- to distinguish
the video/audio from any of its alternative representations.

So, we have a primary media resource and a complex of possible
alternative resources. The alternatives are either a text document
(with markup of some kind), audio, and/or video.

For any given primary media resource, we may (or may not) have any of
the alternatives. But we may also have several of the alternatives.
Whatever the set of primary and alternative representations, whether
two or more, they must stay fairly well synchronized during play. They
must also stay fairly well synchronized during fast forward, rewind,
and when users navigate to structural navigation points.

I say "fairly well." That's a term we'll need to define technically. I
suspect lag times of 100 ms or so will not be problematic. I'm pretty
sure lag times in excess of 200 ms will be problematic. But these are
my views; we haven't discussed this yet.


The point is that we do need to keep multiple representations of media
synchronized. That may indeed require a manifest-type controller. I
believe that would make for a more robust and reliable synchronization
mechanism.
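
To make the idea concrete, here's a minimal script-level sketch of that
kind of synchronization, assuming a primary <video> element and one
alternative track element. The element ids and the 0.2-second threshold
are mine, purely for illustration, not anything we've agreed on; a
manifest-type controller would presumably generalize this across the
whole set of representations.

// Minimal sketch: keep one alternative representation in step with
// the primary media resource. The element ids and the 0.2 s drift
// threshold are illustrative assumptions only.
const primary =
  document.querySelector<HTMLVideoElement>("#primary")!;
const alternative =
  document.querySelector<HTMLMediaElement>("#description")!;

const MAX_DRIFT = 0.2; // seconds; the "fairly well" figure to be defined

function resync(): void {
  const drift = Math.abs(alternative.currentTime - primary.currentTime);
  if (drift > MAX_DRIFT) {
    // Snap the alternative back onto the primary's clock.
    alternative.currentTime = primary.currentTime;
  }
}

// Re-check during play, and whenever the user seeks or navigates to a
// structural point in the primary resource.
primary.addEventListener("timeupdate", resync);
primary.addEventListener("seeked", resync);
primary.addEventListener("play", () => alternative.play());
primary.addEventListener("pause", () => alternative.pause());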

FYI: We began discussing this on last Wednesday's Media call, and are
likely to touch on it again this week.

> >>>          + 2.6 Captioning
> >>>
> >>>Text content with the ability to contain hyperlinks, and semantic
> >>>and style
> >>>instructions.
> >>>
> >>>QUESTION: Are subtitles separate documents? Or are they combined
> >>>with captions
> >>>in a single document, in which case multiple documents may be
> >>present to
> >>>support subtitles and captions in various languages, e.g. EN, FR,
> >>>DE, JP, etc.
> >>
> >>Given that hyperlinks don't exist in any mainstream captioning
> >>software (that I know of), it can hardly be a requirement unless
> >>virtually all existing software is insufficient. Personally, I'm not
> >>thrilled by the potential user experience: seeing a link in the
> >>captions, moving the mouse towards it, only to have it disappear
> >>before clicking, possibly accidentally clicking a link from the
> >>following caption. I think links to related content would be better
> >>presented alongside the video, not as part of the captions.
> >
> >
> >I would expect a user would pause media resource playback before
> >activating a hyperlink.
> >
> >The user requirements example is to link to glossaries.
> >
> >The fact that existing captioning authoring tools do, or do not, support
> >some feature is, imho, beside the point. We're talking about
> >accessibility to media in the context of hypertext.  Of course we would
> >want to avail ourselves of useful functionality provided by hypertext
> >technology.  Conversely, we would not artificially impose limitations
> >inherent in the analog broadcast media environment to the hypertext
> >environment. That would just be silly. The authoring tools will simply
> >need to catch up. They'd no longer be about captions in broadcast alone.
> 
> I should have said "media players", not "captioning software". When
> I tested the YouTube example that Philippe provided, what happened
> was that I was too slow to click and the video just ended. While I'm
> sure that there are cases where the user experience could be made
> less frustrating, I don't think this needs to be built in to the
> captioning format itself, let people build it with the tools at
> hand. (It's possible with <video> + scripts today.)
> 
I agree.
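
For what it's worth, a rough sketch of the "build it with the tools at
hand" approach might pause playback as soon as a caption link gets
keyboard focus or pointer hover, so the cue can't vanish before the
user activates the glossary link. The ids and event choices below are
assumptions for illustration only.

// Sketch only: pause the video while the user is interacting with a
// link rendered in the caption area, so the glossary link does not
// disappear before it can be activated. "#main-video" and
// "#captions" are made-up ids for this example.
const video =
  document.querySelector<HTMLVideoElement>("#main-video")!;
const captionArea =
  document.querySelector<HTMLElement>("#captions")!;

function pauseForLink(event: Event): void {
  const target = event.target as HTMLElement;
  if (target.closest("a")) {
    video.pause(); // let the user read and follow the link
  }
}

captionArea.addEventListener("focusin", pauseForLink);
captionArea.addEventListener("mouseover", pauseForLink);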

> >>>          + 2.8 Sign Translation
> >>>
> >>>A video "track."
> >>>
> >>>Multiple video tracks may be present to support sign translation in
> >>>various signing languages, e.g. ASL, BSL, NZSL, etc. Note that the
> >>>example signing languages given here are all translations of English.
> >>
> >>Isn't it also the case that a sign translation track must be decoded
> >>and rendered on top of the main video track? That makes quite a big
> >>difference in terms of implementation.
> >
> >
> >Yes, and we've already agreed not all user agents will support all
> >accessibility requirements.
> >
> >On the other hand, you might want to go through this development if you
> >ever intend to support PIP, for instance.
> 
> I would probably simply overlay the sign translation video on top of
> the main video. What's missing in this is syncing two video
> elements. Given that sign language probably doesn't need to be
> accurate within more than a few hundred milliseconds, this could be
> done with scripts today.
> 
I'll take an action item to find out whether there's any good research
data on how well synchronized the two should be. You're likely correct,
though, especially as this really is translation between two languages,
so word order isn't a 1:1 correspondence.

I don't know about putting signing on top of the main video; it may
matter what you hide. We do have a user requirement to allow users to
control where on screen the alternative media appears, and how much of
the screen it uses.
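
As a purely illustrative sketch of that positioning requirement (the
element id and the corner/size vocabulary are invented for this
example), a script could let the user choose a corner and a size for
the overlaid signing video:

// Sketch: let the user decide where the overlaid signing video sits
// and how much of the viewport it takes up. The element id and the
// corner/size vocabulary here are invented for illustration.
type Corner = "top-left" | "top-right" | "bottom-left" | "bottom-right";

const signVideo =
  document.querySelector<HTMLVideoElement>("#sign-track")!;

function placeSignVideo(corner: Corner, widthPercent: number): void {
  const s = signVideo.style;
  s.position = "absolute";
  s.width = `${widthPercent}%`;
  s.top = corner.startsWith("top") ? "0" : "auto";
  s.bottom = corner.startsWith("bottom") ? "0" : "auto";
  s.left = corner.endsWith("left") ? "0" : "auto";
  s.right = corner.endsWith("right") ? "0" : "auto";
}

// e.g. the signer in the bottom-right corner at 30% of the width, so
// less of the primary video is hidden.
placeSignVideo("bottom-right", 30);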

> >>>          + 2.9 Transcripts
> >>>
> >>>Text content with the ability to contain semantic and style
> >>>instructions.
> >>
> >>I.e. an HTML document? Transcripts are possible with today's
> >>technology, right?
> >>
> >We're intentionally format and technology neutral at this point. But,
> >yes, you're correct except that we also need data to sync this document
> >to playback of the media resource.
> 
> Transcripts aren't timed, are they? If they are, what's the
> difference with between captions and transcripts?
> 
The target audience for transcripts is people who can hear, but who
comprehend much better when they can also see the text of what they're
hearing at the same time as they hear it.

So, the most sophisticated user agent might highlight the words being
spoken, as they're spoken.
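
A rough sketch of what that highlighting could look like, assuming
per-word start times are available in whatever format we eventually
choose (the TimedWord structure below is hypothetical):

// Sketch: highlight transcript words as they are spoken. The idea of
// a start time per word, and the TimedWord shape, are hypothetical.
interface TimedWord {
  start: number;        // when the word begins, in seconds
  element: HTMLElement; // the span wrapping the word in the transcript
}

function highlightTranscript(
  video: HTMLVideoElement,
  words: TimedWord[]
): void {
  let current = -1;
  video.addEventListener("timeupdate", () => {
    // Find the last word whose start time has already passed.
    let index = -1;
    for (let i = 0; i < words.length; i++) {
      if (words[i].start <= video.currentTime) index = i;
      else break;
    }
    if (index !== current) {
      if (current >= 0) words[current].element.classList.remove("spoken");
      if (index >= 0) words[index].element.classList.add("spoken");
      current = index;
    }
  });
}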

Captions are aimed at people who don't hear, or don't hear very well
anymore. So, they contain info about other audio events (laughter,
slamming doors, revving engines, ambulance sirens, etc.) in addition to
the words being spoken.

If I may again interject personal opinion, I wonder whether there's
gain in combining subtitles, captions, and transcript into a single
file. All of these are based on what is being spoken. Subtitles and
transcript may not differ much beyond that, except in how the text is
fed onto the screen--a look-and-feel matter. We'll need to discuss
that. Captions definitely contain more data, but that data, the sirens
and slamming doors, might perhaps be stored in text strings identified
by an attribute descriptor which the user agent can use or ignore,
depending on whether captioning is the alternative media the user has
requested. Just a thought.
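
Just to illustrate that thought, and with entirely made-up field names,
the filtering might amount to something like this:

// Sketch of the "single file" thought: every cue carries a descriptor
// saying whether it is spoken dialogue or a non-speech sound, and the
// user agent filters on it. All field names here are hypothetical.
interface Cue {
  start: number;            // seconds
  end: number;              // seconds
  kind: "speech" | "sound"; // the descriptor attribute
  text: string;
}

type Selection = "caption" | "subtitle" | "transcript";

function cuesFor(selection: Selection, cues: Cue[]): Cue[] {
  // Captions keep everything, sirens and slamming doors included;
  // subtitles and transcripts keep only what is spoken.
  return selection === "caption"
    ? cues
    : cues.filter((c) => c.kind === "speech");
}

const cues: Cue[] = [
  { start: 1.0, end: 3.5, kind: "speech", text: "Abandon all hope." },
  { start: 3.5, end: 5.0, kind: "sound", text: "[door slams]" },
];

console.log(cuesFor("transcript", cues)); // only the spoken line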

> >>>         + 3.1 Access to interactive controls / menus
> >>>
> >>>audio filters
> >>
> >>Like Eric, I'm a bit skeptical to this. Why do we need it?
> >
> >Because it works for people. Specifically, people losing their hearing
> >can often continue to participate if they can make these kinds of
> >adjustments. This is known from supporting hard of hearing users in the
> >telephone world.
> 
> Is there software that does this today? Does it need to be built in
> to the browser, can't it equally well work on the operating system
> level, acting as an audio output device that applies some filters
> and passes it on to another audio device?
> 
I haven't found any software expressly for this. I do see hardware
devices.

However, I'm increasingly of the opinion that this may be better handled
in the OS--or even in hardware controls on external audio devices. It's
something to discuss, certainly.

> >>>Next and Previous (structural navigation)
> >>>Granularity Adjustment Control (Structural Navigation)
> >>
> >>I don't really understand what this is. Would the API be something
> >>like nextChapter()?
> >
> >
> >OK. Let me try again.
> >
> >Let chapters be represented by x. Let sections within chapters be
> >represented by y. Let subparts of sections be represented by z.
> >
> >So, now we have three levels, and content of schema x.y.z .
> >
> >If set at level 2, next and previous would access any x or y, but would
> >ignore z.
> >
> >At level 1 they'd ignore y and z, and access only x.
> >
> >At level 3 they'd access any x, y or z--whichever was next (or
> >previous).
> >
> >The granularity control is the control that allows users to shift among
> >levels one, two, and three. The consequences of next and previous are
> >defined, as above, by what granularity level the user selects.
> >
> >Does this help? Please reconsider the Dante example in the user reqs.
> 
> I think I understand. The structure is a tree, much like a nested
> sections of a TOC. Is there any existing software that supports
> navigation based on such a TOC, so that we can have a look at how
> it's supposed to work? If screen readers are already clever about
> navigated nested HTML lists, perhaps something like this is best
> realized as a bunch of links in a list?


You're correct. And there are software implementations. Because this
kind of navigation was pioneered (and perfected) for DAISY books, the
software implements SMIL. A good starting point is the DAISY web site,
daisy.org.

PS: I know SMIL is a problematic technology in an HTML 5 context. Let
me hasten to add that I'm simply pointing out that SMIL is what DAISY
used to accomplish the structural navigation user requirement. I expect
there are other ways to achieve it as well, and we will certainly
discuss the merits of various approaches.
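
To show the shape of the idea independently of SMIL, here's a small
sketch of next/previous with a granularity setting over the x.y.z
levels I described above; the data structure is mine, not DAISY's, and
it assumes the navigation points are listed in time order.

// Sketch of the granularity idea from the x.y.z example: navigation
// points carry a depth (1 = chapter, 2 = section, 3 = subsection),
// and "next" skips anything deeper than the chosen granularity.
// Points are assumed to be sorted by time.
interface NavPoint {
  label: string;
  depth: number; // 1, 2, or 3 in the x.y.z example
  time: number;  // seek target in the media resource, in seconds
}

function nextPoint(
  points: NavPoint[],
  now: number,
  granularity: number
): NavPoint | undefined {
  return points.find((p) => p.depth <= granularity && p.time > now);
}

function previousPoint(
  points: NavPoint[],
  now: number,
  granularity: number
): NavPoint | undefined {
  const earlier = points.filter(
    (p) => p.depth <= granularity && p.time < now
  );
  return earlier[earlier.length - 1];
}

// At granularity 1 the user moves chapter to chapter; at 3 they stop
// at every x, y, and z point, whichever comes next or previous.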

> 
> >>>Viewport content selection, on screen location and sizing control
> >>
> >>Layout is controlled by CSS, other than fullscreen mode we can't
> >>have a user changing that.
> >>
> >>>Font selection, foreground/background color, bold, etc
> >>
> >>Agree, but as a part of User CSS (no UI).
> >>
> >>>configuration/selection
> >>>Extended descriptions and extended captions configuration/control
> >>>Ancillary content configuration/control
> >>
> >I expect we'll discuss this on this weeks' call. Perhaps you could join?
> 
> I always prefer email, but maybe. I can't see an announcement of the
> date and time yet, though.

Understood. I think our telecons have been productive, though, if
sometimes frustrating. But then frustration is possible from any
communication mechanism, isn't it?

John is often late getting agendas out. Basically, I expect we'll talk
through the controls of my tech implications draft this week.

Call is Wednesdays at 22:00 UTC for 90 minutes at the usual W3C numbers.
IRC is #html-a11y and the Zakim code is 2119# (which spells 'a11y').

Janina

> 
> -- 
> Philip Jägenstedt
> Core Developer
> Opera Software

-- 

Janina Sajka,	Phone:	+1.443.300.2200
		sip:janina@asterisk.rednote.net

Chair, Open Accessibility	janina@a11y.org	
Linux Foundation		http://a11y.org

Chair, Protocols & Formats
Web Accessibility Initiative	http://www.w3.org/wai/pf
World Wide Web Consortium (W3C)
