Re: timing model of the media resource in HTML5

On Mon, Nov 23, 2009 at 1:02 PM, Silvia Pfeiffer wrote:
> 


> As a three-sentence summary:
> Basically, I believe that the 90% use case for the Web is that of a
> time-linear media resource. Any other, more complex needs that
> require multiple timelines can be realised using JavaScript and the
> APIs to audio and video that we still need to define, which will
> expose companion tracks to the Web page and therefore to JavaScript.
> I don't believe that there will be many use cases that such a
> combination cannot satisfy, but if there are, one can always fall
> back to the "object" tag and external plugins to render an Adobe
> Flash, Silverlight or SMIL experience.
> 
> BTW: talking about SMIL - I would be very curious to find out if
> somebody has tried implementing SMIL in HTML5 and JavaScript yet. I
> think much of what a SMIL file defines should now be presentable in
> a Web browser using existing HTML5 and JavaScript constructs. It
> would be an interesting exercise, and I'd be curious to hear if
> somebody has tried and where they found limitations.
> 

  While it is possible to implement some (many?) of the simplest SMIL constructs using HTML, CSS, and JavaScript, things fall apart as soon as you try to do anything that requires synchronization of media elements (e.g. a <par> containing <audio> and <video>). The latency between the JavaScript context and the media engine is just too high to synchronize starting and stopping multiple elements, never mind the clipping and sync-behavior attributes.
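
  To make the latency problem concrete, here is a minimal sketch of the obvious script-only approach to a <par> containing <audio> and <video>: start both elements, then poll and nudge currentTime to correct the drift. The element ids and thresholds are mine, purely illustrative:

  <script>
    // Naive script-only <par>: start two media elements "together"
    // and try to hold them in sync by polling.
    var a = document.getElementById('a');  // the <audio> element
    var v = document.getElementById('v');  // the <video> element

    function startPar() {
      a.play();
      v.play();  // already starts some milliseconds after a.play()
      setInterval(function () {
        var drift = a.currentTime - v.currentTime;
        // Seeking to correct drift stalls playback and glitches, and
        // drift smaller than the script/engine latency never even
        // shows up here.
        if (Math.abs(drift) > 0.1)
          v.currentTime += drift;
      }, 250);
    }
  </script>

  Even this much only approximates <par>; SMIL's clipBegin/clipEnd and syncBehavior have no real counterpart a script can reach.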


On Nov 25, 2009, at 5:29 AM, Silvia Pfeiffer wrote:

> 
> On Wed, Nov 25, 2009 at 11:24 PM, Philip Jägenstedt <philipj@opera.com> wrote:
>> 
>> I agree that syncing separate video and audio files is a big challenge. I'd
>> prefer leaving this kind of complexity either to scripting or an external
>> manifest like SMIL.
> 
> We have to at minimum deal with multi-track video and audio files
> inside HTML, since they can potentially expose accessibility data.
> Audio descriptions (read by a human), sign language (signed by a
> person), and captions are the particular tracks I am concerned
> about.
> 
  I agree that we must support multi-track audio and video files. If a container format permits references to external media files (as QuickTime does), it is the job of the media engine to keep them in sync, so we don't need to worry about it.


> The Google accessibility experts wanted at least the in-line caption
> tracks exposed in declarative language. This is because otherwise you
> cannot build a menu of all available tracks without having to start
> downloading and decoding the file.  With this in mind, I think we have
> to expose all of the tracks available in a file in declarative
> language.
> 

> 
> Yes, this works for external additional tracks. Maybe then we can add
> the internal tracks inside the source elements, something like this:
> 
> <video>
>  <source src="video.ogg" type="video/ogg">
>    <track id='v' role='video' ref='serialno:1505760010'>
>    <track id='a' role='audio' lang='en' ref='serialno:0821695999'>
>    <track id='ad' role='auddesc' lang='en' ref='serialno:1421614520'>
>    <track id='s' role='sign' lang='ase' ref='serialno:1413244634'>
>    <track id='cc' role='caption' lang='en' ref='serialno:1421849818'>
>  </source>
>  <source src="video.mp4" type="video/mp4">
>    <track id='v' role='video' ref='trackid:1'>
>    <track id='a' role='audio' lang='en' ref='trackid:2'>
>  </source>
>  <overlay>
>    <source src="en.srt" lang="en-US">
>    <source src="hans.srt" lang="zh-CN">
>  </overlay>
> </video>
> 
> Note I have made the track reference explicit by introducing a new
> "ref" attribute, which uses encapsulation-format-specific references
> to track identifiers.
> 
  I *really* don't like the idea of requiring page authors to declare the track structure in the markup. It seems to me that because it will require new specialized tools to get the information, and because it will be really difficult to do correctly (ten-digit serial numbers?), people are likely to just skip it completely. We need to create a specification that makes it as simple as possible for people to do the right thing.

  If we do allow this, what happens when the structure declared in the markup differs from the structure of the media file?
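
  That said, if the structure is declared, the menu use case Silvia mentions becomes trivial to script without touching the media data. A sketch against her proposed markup (the <track> element and its "role", "lang" and "ref" attributes are from her example, not from any spec):

  <script>
    // Enumerate the declared tracks to build a selection menu. A real
    // implementation would only look inside the <source> the browser
    // actually selected.
    var video = document.getElementsByTagName('video')[0];
    var tracks = video.getElementsByTagName('track');
    var menu = [];
    for (var i = 0; i < tracks.length; i++) {
      menu.push({
        role: tracks[i].getAttribute('role'),
        lang: tracks[i].getAttribute('lang'),
        ref:  tracks[i].getAttribute('ref')
      });
    }
    // "menu" now lists every declared track, and not a single byte of
    // media has been downloaded or decoded.
  </script>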


> On Wed, Nov 25, 2009 at 11:24 PM, Philip Jägenstedt <philipj@opera.com> wrote:
>> 
>> centerTop/centerBottom are appropriately defined in CSS.
> 
> Those are almost like the default styling approaches I suggested for
> itextlist / itext.[2] There, I also assumed a display area as large
> as the video, or actually just a little larger, was available to
> render the time-aligned text into. It's larger since sometimes it is
> better not to overlay stuff but to place it right next to the video,
> e.g. just above it (title-like) or just below it but visually part
> of the video window.
> 
  We shouldn't make assumptions about the size of an overlay; if someone wants to display something outside of the media element's bounds, they can use CSS.
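
  For example, a caption box that is a sibling of the video simply flows below it, and a title-like box can be placed just above the video's top edge with a couple of rules. The selectors and markup here are made up for illustration:

  <style>
    /* Assumes <div class='media'><video ...></video>
                 <div class='title'>...</div></div> */
    .media { position: relative; }
    .media .title {
      position: absolute;
      bottom: 100%;        /* sits immediately above the video */
      width: 100%;
      text-align: center;
    }
  </style>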

eric
