Re: timing model of the media resource in HTML5

On Wed, 25 Nov 2009 14:29:37 +0100, Silvia Pfeiffer  
<silviapfeiffer1@gmail.com> wrote:

> Hi Philip, all,
>
> See comments below inline.
>
> On Wed, Nov 25, 2009 at 11:24 PM, Philip Jägenstedt <philipj@opera.com>  
> wrote:
>>
>> I agree that syncing separate video and audio files is a big challenge.  
>> I'd
>> prefer leaving this kind of complexity either to scripting or an  
>> external
>> manifest like SMIL.
>
> We have to at minimum deal with multi-track video and audio files
> inside HTML, since they can potentially expose accessibility data:
> audio descriptions (read by a human), sign language (signed by a
> person), and captions are the particular tracks I am concerned about.

I agree and think that the tracks of the resource should be exposed via a  
DOM API. From a script's point of view it should look the same whether the  
resource is Ogg, MPEG-4 or SMIL linking several tracks together.

> There is also always the need for different recording angles, but
> let's leave that to javascript, where the whole media resource is
> exchanged. Similarly, when we deal with different devices, we can also
> exchange the complete media resource markup.
>
> So, focusing on a video with a + v + audio description + sign language
> track + caption track, we still need to expose these tracks to the Web
> browser to decide based on user preference settings whether to display
> them or not. This is on top of and beyond the <itext> proposals I have
> previously discussed.
>
> The Google accessibility experts wanted at least the in-line caption
> tracks exposed in declarative language. This is because otherwise you
> cannot build a menu of all available tracks without having to start
> downloading and decoding the file.  With this in mind, I think we have
> to expose all of the tracks available in a file in declarative
> language.
>

Who is building the menu? I really don't see a problem with waiting until  
loadedmetadata for the menu to be available. Picking a language in the < 1  
sec before that seems like a fringe use case which can be solved by  
sending the information in a site-specific format using data-* attributes  
or similar.
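
As a rough sketch of the data-* approach: the attribute name "data-tracks"
and its "lang=label;lang=label" format below are made up for illustration
and are not part of any spec. A page could decode such a list and build
its menu before any media data has arrived:

```javascript
// Hypothetical site-specific format, e.g.
//   <video data-tracks="en=English; zh-CN=Chinese">
// Neither the attribute name nor the format comes from any spec.
function parseTrackList(value) {
  return value
    .split(";")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0) // ignore empty entries
    .map((entry) => {
      const eq = entry.indexOf("=");
      // An entry without "=" uses the language tag as its own label.
      if (eq === -1) return { lang: entry, label: entry };
      return {
        lang: entry.slice(0, eq).trim(),
        label: entry.slice(eq + 1).trim(),
      };
    });
}
```

In a page the input would come from something like
video.getAttribute("data-tracks"), and each returned { lang, label } pair
would become one menu item.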

>> Below I focus on the HTML-specific parts:
>>
>> Captions/subtitles... The main problem of reusing <source> is that it
>> doesn't work with the resource selection algorithm.[1]
>
> Yes, I have noticed that problem, too. The resource selection
> algorithm regards all of the <source> elements as alternatives to each
> other.
>
>> However, that
>> algorithm only considers direct children of the media element, so  
>> adding a
>> wrapping element would solve this problem and allow us to spec different
>> rules for selecting timed-text sources. Example:
>>
>> <video>
>>  <source src="video.ogg" type="video/ogg">
>>  <source src="video.mp4" type="video/mp4">
>>  <overlay>
>>    <source src="en.srt" lang="en-US">
>>    <source src="hans.srt" lang="zh-CN">
>>  </overlay>
>> </video>
>
> Yes, this works for external additional tracks. Maybe then we can add
> the internal tracks inside the source elements, something like this:
>
>  <video>
>   <source src="video.ogg" type="video/ogg">
>     <track id='v' role='video' ref='serialno:1505760010'>
>     <track id='a' role='audio' lang='en' ref='serialno:0821695999'>
>     <track id='ad' role='auddesc' lang='en' ref='serialno:1421614520'>
>     <track id='s' role='sign' lang='ase' ref='serialno:1413244634'>
>     <track id='cc' role='caption' lang='en' ref='serialno:1421849818'>
>   </source>
>   <source src="video.mp4" type="video/mp4">
>     <track id='v' role='video' ref='trackid:1'>
>     <track id='a' role='audio' lang='en' ref='trackid:2'>
>   </source>
>   <overlay>
>     <source src="en.srt" lang="en-US">
>     <source src="hans.srt" lang="zh-CN">
>   </overlay>
>  </video>
>
> Note I have made the track reference explicit through introducing a
> new "ref" attribute which uses encapsulation format specific
> references to track identifiers.
>

<source> is a void element, so this markup does not degrade nicely in any  
shipped <video>-capable browser. Try  
<http://software.hixie.ch/utilities/js/live-dom-viewer/saved/318>. Firefox  
puts the second <source> element inside nested <track> elements and Safari  
just drops it.

That aside, I'm not convinced this is actually needed, as per above, and I  
agree with what Eric Carlson said.

>> We could possibly allow <overlay src="english.srt"></overlay> as a  
>> shorthand
>> when there is only one captions file, just like the video <video
>> src=""></video> shorthand.
>>
>> I'm suggesting <overlay> instead of e.g. <itext> because I have some  
>> special
>> behavior in mind: when no (usable) source is found in <overlay>, the  
>> content
>> of the element should be displayed overlaid on top of the video  
>> element as
>> if it were inside a CSS box of the same size as the video. This gives
>> authors a simple way to display overlay content such as custom controls  
>> and
>> complex "subtitles" like animated karaoke to work the same both in  
>> normal
>> rendering and in fullscreen mode. (I don't know what kind of CSS spec  
>> magic
>> would be needed to allow such rendering, but I don't believe overlaying  
>> the
>> content is very difficult implementation-wise.)
>>
>> Naturally, CSS is used to style the captions:
>>
>> <video src="video.ogg">
>>  <overlay src="en.srt"
>> style="font-size:2em;padding:1em;text-align:center"></overlay>
>> </video>
>>
>> If there is a use case, displaying several captions/subtitles at once  
>> could
>> be allowed as such:
>>
>> <video src="video.ogg">
>>  <overlay src="en.srt" class="centerTop"></overlay>
>>  <overlay src="hans.srt" class="centerBottom"></overlay>
>> </video>
>
> Ah yes, that is replicating the hierarchical approach I took with
> itextlist / itext.[2] They could also be more generic text than just
> subtitles and captions - in particular textual audio descriptions have
> been confirmed at TPAC to be very useful indeed.
>

Sibling <overlay>s with <source> children make a hierarchy of at most 2  
levels, but sure. Anything that can be displayed graphically is suitable  
for <overlay>, although the natively supported formats will probably be  
limited to timed text via SRT and maybe something more complex like DFXP.

>> centerTop/centerBottom are appropriately defined in CSS.
>
> Those are almost like the default styling approaches I suggested for
> itextlist / itext.[2] There, I also assumed there was a display area
> as large as the video or actually just a little larger available to
> render the time-aligned text into. It's larger since sometimes it is
> better not to overlay stuff but to place it right next to the video,
> e.g. just above it (title-like) or just below it but visually part of
> the video window.
>

Just to be clear, centerTop/centerBottom are user defined, nothing magic.  
As for the default stylesheet for <overlay> I'm not sure, maybe just  
"display:box" and the rest should be defined by the user.

>> For what it's worth, it's easy to get this behavior (sans fullscreen)  
>> using
>> scripting today, simply by cloning/moving the overlay elements outside  
>> of
>> <video> and positioning them on top using CSS. Even SRT retrieval (XHR),
>> decoding (RegExp) and syncing (timeupdate event) are easy enough to do.
>
> It's indeed how I implemented the demos [3]. E.g.
> http://www.annodex.net/~silvia/itext/elephant_no_skin_v2.html has divs
> defined just outside the video element, but styled to sit directly
> over the video. Is this something that we would need to declare
> explicitly in the DOM or would that be something that the browser
> can introduce at that position and expose to the DOM? Without the DOM
> exposure, there is no adaptive styling.
>

Yes, that's where I got the idea, sorry for not linking...
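
To illustrate the "decoding (RegExp)" and "syncing" steps mentioned above,
here is a minimal sketch. It assumes well-formed SRT input; the function
names are mine and do not come from any library or from the demos:

```javascript
// Convert the captured parts of "HH:MM:SS,mmm" to seconds.
function srtTimeToSeconds(h, m, s, ms) {
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Decode SRT text into an array of { start, end, text } cues.
function parseSrt(data) {
  const timing =
    /^(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})/;
  const cues = [];
  // Normalise line endings, then split on the blank lines between cues.
  const blocks = data.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    // The timing line normally follows the numeric counter line.
    const index = lines.findIndex((line) => timing.test(line));
    if (index === -1) continue; // skip blocks without a timing line
    const m = timing.exec(lines[index]);
    cues.push({
      start: srtTimeToSeconds(m[1], m[2], m[3], m[4]),
      end: srtTimeToSeconds(m[5], m[6], m[7], m[8]),
      text: lines.slice(index + 1).join("\n"),
    });
  }
  return cues;
}

// Text to show at a given playback time, or "" when no cue is active.
function activeText(cues, time) {
  return cues
    .filter((cue) => time >= cue.start && time < cue.end)
    .map((cue) => cue.text)
    .join("\n");
}
```

In a page, the SRT text would be fetched with XMLHttpRequest, and a
timeupdate listener on the video would set the overlay element's text to
activeText(cues, video.currentTime).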

>> Comments?
>
> I think your ideas re CSS are great! I am as yet unsure how that can
> be solved in the browser, so any ideas are very much welcome.
>

I think it's mainly a spec problem, not an implementation problem. For  
implementation one would simply render the CSS box for <overlay>  
separately and then blit that on top of the video, whether it is in  
fullscreen or not. When switching to fullscreen the <overlay> box would be  
resized to the size of the screen of course, and possibly  
media="projection" should then apply to make it possible to use different  
CSS in fullscreen and normal view.

> Cheers,
> Silvia.
>
> [2] https://wiki.mozilla.org/Accessibility/HTML5_captions_v2
> [3] http://www.annodex.net/~silvia/itext/
>
>> [1]
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-algorithm
>


-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Wednesday, 25 November 2009 16:27:24 UTC