Re: timing model of the media resource in HTML5 from Silvia Pfeiffer on 2009-11-25 (public-html-a11y@w3.org from November 2009)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Thu, 26 Nov 2009 00:29:37 +1100
To: Philip Jägenstedt <philipj@opera.com>
Cc: HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <2c0e02830911250529m31415be8l81ae28bb77b88224@mail.gmail.com>
Hi Philip, all,

See comments below inline.

On Wed, Nov 25, 2009 at 11:24 PM, Philip Jägenstedt <philipj@opera.com> wrote:
>
> I agree that syncing separate video and audio files is a big challenge. I'd
> prefer leaving this kind of complexity either to scripting or an external
> manifest like SMIL.

At a minimum, we have to deal with multi-track video and audio files
inside HTML, since they can potentially expose accessibility data:
audio descriptions (read by a human), sign language (signed by a
person), and captions are the particular tracks I am concerned about.

There is also always the need for different recording angles, but
let's leave that to javascript, where the whole media resource is
exchanged. Similarly, when we deal with different devices, we can also
exchange the complete media resource markup.

So, focusing on a file with audio + video + audio description + sign
language track + caption track, we still need to expose these tracks to
the Web browser so that it can decide, based on user preference
settings, whether to display them or not. This goes beyond the <itext>
proposals I have previously discussed.

The Google accessibility experts wanted at least the in-line caption
tracks exposed in declarative language. This is because otherwise you
cannot build a menu of all available tracks without having to start
downloading and decoding the file.  With this in mind, I think we have
to expose all of the tracks available in a file in declarative
language.


> Below I focus on the HTML-specific parts:
>
> Captions/subtitles... The main problem of reusing <source> is that it
> doesn't work with the resource selection algorithm.[1]

Yes, I have noticed that problem, too. The resource selection
algorithm regards all of the <source> elements as alternatives to each
other.

> However, that
> algorithm only considers direct children of the media element, so adding a
> wrapping element would solve this problem and allow us to spec different
> rules for selecting timed-text sources. Example:
>
> <video>
>  <source src="video.ogg" type="video/ogg">
>  <source src="video.mp4" type="video/mp4">
>  <overlay>
>    <source src="en.srt" lang="en-US">
>    <source src="hans.srt" lang="zh-CN">
>  </overlay>
> </video>

Yes, this works for external additional tracks. Maybe then we can add
the internal tracks inside the source elements, something like this:

 <video>
  <source src="video.ogg" type="video/ogg">
    <track id='v' role='video' ref='serialno:1505760010'>
    <track id='a' role='audio' lang='en' ref='serialno:0821695999'>
    <track id='ad' role='auddesc' lang='en' ref='serialno:1421614520'>
    <track id='s' role='sign' lang='ase' ref='serialno:1413244634'>
    <track id='cc' role='caption' lang='en' ref='serialno:1421849818'>
  </source>
  <source src="video.mp4" type="video/mp4">
    <track id='v' role='video' ref='trackid:1'>
    <track id='a' role='audio' lang='en' ref='trackid:2'>
  </source>
  <overlay>
    <source src="en.srt" lang="en-US">
    <source src="hans.srt" lang="zh-CN">
  </overlay>
 </video>

Note I have made the track reference explicit through introducing a
new "ref" attribute which uses encapsulation format specific
references to track identifiers.
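
To illustrate how a browser or a script might act on such markup, here
is a minimal JavaScript sketch. The function names parseTrackRef and
selectTracks, the shape of the prefs object, and the rule that the main
video and audio tracks are always enabled are all assumptions made for
illustration; none of them is part of any proposal.

```javascript
// Hypothetical helper: split a "scheme:value" track reference such as
// "serialno:1505760010" (Ogg) or "trackid:1" (MP4) into its two parts.
function parseTrackRef(ref) {
  const match = /^([a-z]+):(.+)$/.exec(ref);
  if (!match) {
    throw new Error('Unrecognised track reference: ' + ref);
  }
  return { scheme: match[1], value: match[2] };
}

// Hypothetical selection rule: main video/audio tracks are always on;
// accessibility tracks are on only if the user asked for that role.
function selectTracks(tracks, prefs) {
  const alwaysOn = ['video', 'audio'];
  return tracks
    .filter((t) => alwaysOn.includes(t.role) || prefs.roles.includes(t.role))
    .map((t) => ({ id: t.id, role: t.role, ...parseTrackRef(t.ref) }));
}

// The tracks declared for video.ogg in the markup above.
const oggTracks = [
  { id: 'v', role: 'video', ref: 'serialno:1505760010' },
  { id: 'a', role: 'audio', ref: 'serialno:0821695999' },
  { id: 'ad', role: 'auddesc', ref: 'serialno:1421614520' },
  { id: 's', role: 'sign', ref: 'serialno:1413244634' },
  { id: 'cc', role: 'caption', ref: 'serialno:1421849818' },
];

// A user who has asked for captions only.
const captionUser = { roles: ['caption'] };
console.log(selectTracks(oggTracks, captionUser).map((t) => t.id));
// prints [ 'v', 'a', 'cc' ]
```

The point is only that the track list and the user preferences are both
available before any media data has been downloaded or decoded.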


> We could possibly allow <overlay src="english.srt"></overlay> as a shorthand
> when there is only one captions file, just like the video <video
> src=""></video> shorthand.
>
> I'm suggesting <overlay> instead of e.g. <itext> because I have some special
> behavior in mind: when no (usable) source is found in <overlay>, the content
> of the element should be displayed overlaid on top of the video element as
> if it were inside a CSS box of the same size as the video. This gives
> authors a simple way to display overlay content such as custom controls and
> complex "subtitles" like animated karaoke to work the same both in normal
> rendering and in fullscreen mode. (I don't know what kind of CSS spec magic
> would be needed to allow such rendering, but I don't believe overlaying the
> content is very difficult implementation-wise.)
>
> Naturally, CSS is used to style the captions:
>
> <video src="video.ogg">
>  <overlay src="en.srt"
> style="font-size:2em;padding:1em;text-align:center"></overlay>
> </video>
>
> If there is a use case, displaying several captions/subtitles at once could
> be allowed as such:
>
> <video src="video.ogg">
>  <overlay src="en.srt" class="centerTop"></overlay>
>  <overlay src="hans.srt" class="centerBottom"></overlay>
> </video>

Ah yes, that is replicating the hierarchical approach I took with
itextlist / itext.[2] They could also be more generic text than just
subtitles and captions - in particular textual audio descriptions have
been confirmed at TPAC to be very useful indeed.


> centerTop/centerBottom are appropriately defined in CSS.

Those are almost like the default styling approaches I suggested for
itextlist / itext.[2] There, I also assumed there was a display area
as large as the video or actually just a little larger available to
render the time-aligned text into. It's larger since sometimes it is
better not to overlay stuff but to place it right next to the video,
e.g. just above it (title-like) or just below it but visually part of
the video window.


> For what it's worth, it's easy to get this behavior (sans fullscreen) using
> scripting today, simply by cloning/moving the overlay elements outside of
> <video> and positioning them on top using CSS. Even SRT retrieval (XHR),
> decoding (RegExp) and syncing (timeupdate event) is easy enough to do.

It's indeed how I implemented the demos [3]. E.g.
http://www.annodex.net/~silvia/itext/elephant_no_skin_v2.html has divs
defined just outside the video element, but styled to sit directly
over the video. Is this something that we would need to declare
explicitly in the DOM, or is it something that the browser can
introduce at that position and expose to the DOM? Without the DOM
exposure, there is no adaptive styling.
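
For reference, the decoding and syncing steps of the scripted approach
can be sketched roughly as follows. This is a simplified illustration
and not the code used in the demos; the function names srtTimeToSeconds,
parseSrt and activeText are made up here, and the parser assumes plain
SRT timing lines without any positioning extensions.

```javascript
// Convert an SRT timestamp "HH:MM:SS,mmm" to seconds.
function srtTimeToSeconds(stamp) {
  const m = /^(\d+):(\d{2}):(\d{2}),(\d{3})$/.exec(stamp.trim());
  if (!m) {
    throw new Error('Bad SRT timestamp: ' + stamp);
  }
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

// Parse SRT text into cues of the form { start, end, text }.
// Cue blocks are separated by one or more blank lines.
function parseSrt(srt) {
  const cues = [];
  const blocks = srt.replace(/\r\n?/g, '\n').trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split('\n');
    // The timing line is the first one containing "-->";
    // a numeric index line may precede it.
    const timingIndex = lines.findIndex((l) => l.includes('-->'));
    if (timingIndex === -1) continue; // skip malformed blocks
    const [start, end] = lines[timingIndex].split('-->');
    cues.push({
      start: srtTimeToSeconds(start),
      end: srtTimeToSeconds(end),
      text: lines.slice(timingIndex + 1).join('\n'),
    });
  }
  return cues;
}

// Return the text of every cue active at time t (in seconds).
function activeText(cues, t) {
  return cues
    .filter((c) => t >= c.start && t < c.end)
    .map((c) => c.text)
    .join('\n');
}

const sampleSrt = '1\n00:00:01,000 --> 00:00:04,000\nHello\n\n' +
  '2\n00:00:04,500 --> 00:00:06,000\nSecond\nline';
const sampleCues = parseSrt(sampleSrt);
console.log(activeText(sampleCues, 2)); // prints "Hello"

// In a page, the syncing step would hook into the timeupdate event, e.g.
// (video and overlayDiv being elements looked up by the author):
//   video.addEventListener('timeupdate', () => {
//     overlayDiv.textContent = activeText(sampleCues, video.currentTime);
//   });
```

The div that receives the text is an ordinary DOM element here, which is
why it can be styled freely with CSS.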

> Comments?

I think your ideas re CSS are great! I am as yet unsure how that can
be solved in the browser, so any ideas are very much welcome.

Cheers,
Silvia.

[2] https://wiki.mozilla.org/Accessibility/HTML5_captions_v2
[3] http://www.annodex.net/~silvia/itext/

> [1]
> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-algorithm
Received on Wednesday, 25 November 2009 13:30:31 UTC