Re: Survey ready on Media Multitrack API proposal from Silvia Pfeiffer on 2010-03-16 (public-html-a11y@w3.org from March 2010)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Tue, 16 Mar 2010 21:44:06 +1100
To: Dick Bulterman <Dick.Bulterman@cwi.nl>
Cc: Philip Jägenstedt <philipj@opera.com>, "Michael(tm) Smith" <mike@w3.org>, HTML Accessibility Task Force <public-html-a11y@w3.org>
Message-ID: <2c0e02831003160344y58bd7f63i3426eb84aac98fb1@mail.gmail.com>

Hi Dick,

Thanks for the reply - and I am sorry if I am repeating myself.
However, what you are intending to do is not what the video and audio
elements were built for. It is not our place in the accessibility task
force to change how audio and video work - we only deal with making
these elements accessible.


On Tue, Mar 16, 2010 at 8:26 PM, Dick Bulterman <Dick.Bulterman@cwi.nl> wrote:
>
> First, I understand the urge to manage a complex problem with a simple
> solution. The complex problem is synchronizing media objects that have
> different internal time bases -- such as when the objects live in
> separate files.
>
> You write:
>>
>> There is no difference if these tracks come from within a file or from
>> external, except that with external tracks we have all the data
>> available at once, while with internal tracks we get them in chunks
>> together with the media data. In either case you have a pipeline for
>> decoding the video, a pipeline for decoding the audio and a "pipeline"
>> for decoding the text and your media system synchronises them. The
>> need for temporal and spatial composition is no different whether the
>> text comes from within the file or from an external source. In
>> fact, it is easier to deal with the external case because its data is
>> available fully from the start and temporal synchronisation, seeking
>> etc. is easier to solve.
>
> Thanks for the primer on basic synchronization,

Going into detail here seems important - not just so you and I can
pinpoint exactly where our understanding stands, but also so that others
are able to follow.


> but what you say is not
> really true: there is a fundamental difference between managing content
> within a single container and across independent objects. When all media
> is in one container, you can get away with making lots of simplifying
> assumptions on timing. For example, if you are streaming the content and
> there is a network delay, ALL of the media is delayed. This is not true
> if they are in separate containers, coming from separate sources (on
> separate servers, etc.) In the case of separate files, if there is a
> delay in one object (say the video), you need to know:
> - who is the sync master
> - how tight is the synchronization relationship

What I have tried to point out, and where exactly the basic
misunderstanding lies, is that these questions are already answered for
the audio and video elements. There is a determined sync master,
namely the video's source, and everything synchronises to it.
Therefore, from an HTML viewpoint, there is no difference between
managing content from a single container and across externally
associated objects, because the externally associated objects are not
independent objects.
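
To make the single-master model concrete, here is a minimal sketch. It
is not taken from any browser implementation; the Cue type and the
activeCues function are hypothetical names used only for illustration.
It shows that an externally associated text track needs no clock of its
own: its cues are simply looked up against the master's timeline.

```typescript
// Hypothetical type for illustration; not part of any HTML5 API.
interface Cue {
  start: number; // seconds on the master (video) timeline
  end: number;   // seconds on the master (video) timeline
  text: string;
}

// Return the cues that should be shown at the master's current time.
// The only clock consulted is the master's, so the text track cannot
// drift relative to the video.
function activeCues(cues: Cue[], masterTime: number): Cue[] {
  return cues.filter((c) => c.start <= masterTime && masterTime < c.end);
}

// Sample data, invented for this example.
const captions: Cue[] = [
  { start: 0, end: 2, text: "Hello" },
  { start: 2, end: 5, text: "Welcome to the talk" },
];

// If the video stalls at 1.5s, the captions stall with it.
console.log(activeCues(captions, 1.5).map((c) => c.text));
```

Whether the cues were loaded from inside the container or from a
separate file makes no difference to this lookup.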

What you are working towards really does need a new element in HTML5.


>> Applications such as VLC, mplayer and many
>> others that are able to synchronise external captions with media
>> resources have shown it for years. They do not need a sophisticated
>> framework for activating objects from multiple sources.
>
> One of the reasons that this works for VLC is that they are NOT also
> rendering a full page's content: the entire temporal scope is restricted
> to only the video -- that's all they do. An HTML5 browser does a whole
> lot more (such as perhaps managing multiple <video>/<audio> elements on
> a page). It is also the reason that you can only do timeline manipulation in
> VLC: there is no structure, so you can't do content-based navigation,
> selective content inclusion, or any kind of adaptability for people with
> different needs. A missed opportunity.

These features are also not what the HTML5 media elements are built
for. They in fact do not much more than a media player does, and are
not meant to do much more. There are advantages to being part of the
Web, in that e.g. direct addressing of content inside the elements is
possible and thus content-based navigation is enabled. However, this
functionality does not come from creating a flexible synchronisation
container around the element, but through the availability of URLs
inside a Web browser.

Anything that is about composing media resources together in a
flexible way, about selective inclusion and flexible timing in the way
that the SMIL elements provide it, is a canvas-type approach. There is
no explicit element that allows timeline synchronisation functionality
across multiple media objects - audio and video certainly aren't it.
If you see a requirement for such elements, please go ahead and make a
proposal to the HTML WG to have such an element built on top of the
existing media elements. It is not our place here to change the
functionality of the existing media elements or to introduce new
multimedia functionality into HTML5.


>>> > As an aside: One of the benefits of this approach is that it means
>>> > that you get broader selectivity for free on other objects -- this
>>> > increases accessibility options at no extra cost. (Note that whether
>>> > you view the controlling syntax in declarative or scripted terms is
>>> > an orthogonal concern.)
>>
>> Now, I think you are under-estimating the complexity. Marking this up
>> is the easy bit. But implementing it is the hard bit.
>
> You know, people have been building these things for 15 years in desktop
> players, in telephones, in set-top-boxes, in handheld media players. It
> isn't rocket science -- but it does require looking a bit further than the
> easiest possible path. (Have you ever used HTML+Time in IE, or
> integrated Petri's timesheets, or looked at the dozen or so JavaScript SMIL
> timing implementations? These provide examples of syntax and
> implementations.)
>
> There is a misperception that supporting time containers as top-level
> objects makes life harder. This is not true: localizing timing behavior
> to time containers actually makes your life easier! It separates content
> from control, and it provides a systematic model that works for all
> sorts of media. Combining timing and content containers -- although
> seemingly easier -- simply means that there is no growth path.


I have seen these things at work - many multimedia compositions
created in Adobe Flash also fall into this category. So, I can indeed
see a need for this functionality. But again, the existing media
elements are not the place to introduce this functionality.


>> Honestly, keeping the composing of several media elements separate
>> from dealing with basically a single media resource as we are right
>> now is the right way to go. It follows the divide and conquer
>> principle: once this is solved and solid, it will be easier to develop
>> an approach to solving your requirements.
>
> If you select special-purpose 'solutions' that don't scale, you are not
> using divide&conquer -- you are forcing the next generation of
> developers to invent work-arounds because you have several inconsistent
> timing models in a document. Without a consistent model, you are
> creating throw-away solutions. Now THAT's a waste of effort, since you
> have to implement the temporal semantics anyway -- the problems don't go
> away.

The timing model is very consistent - each media object has a single
timeline and that is the timeline of the main resource of that
element. All media objects on a page are in principle independent and
that is intended. The problem you are describing is one that occurs for
a use case that is not solved and not intended to be solved by the
media elements. It can be hacked in with JavaScript, but - and here I
agree with you - that is not an elegant solution to your use case, and
possibly a very imprecise one. This is why I keep suggesting that there
is a need to introduce a "media canvas" type approach.
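
As a minimal sketch of the kind of JavaScript workaround meant here:
the helper name correctDrift and the 0.25 second tolerance are
assumptions made for illustration, not part of any HTML5 API. A script
would poll the playback positions of two independent media elements and
force the follower back onto the master's position whenever they drift
apart.

```typescript
// Hypothetical helper for illustration; the default tolerance is an
// arbitrary assumption. Given the master's and a follower's playback
// positions in seconds, return the position the follower should be
// seeked to, or null if the drift is within tolerance.
function correctDrift(
  masterTime: number,
  followerTime: number,
  tolerance: number = 0.25
): number | null {
  const drift = followerTime - masterTime;
  return Math.abs(drift) > tolerance ? masterTime : null;
}

// In a page, this would be driven from a timer or a "timeupdate"
// listener on the master element, roughly:
//   const target = correctDrift(video.currentTime, audio.currentTime);
//   if (target !== null) audio.currentTime = target;
console.log(correctDrift(10, 10.1)); // within tolerance, no seek
console.log(correctDrift(10, 10.5)); // drifted, seek follower to master
```

The imprecision is visible in the sketch itself: the correction only
happens as often as the script is run, anything below the tolerance is
never corrected, and every correction is a seek on the follower.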

Just as many new functionalities in HTML5 have come from
demonstrating that a JavaScript-based approach is clunky, imperfect
and at best a poor approximation to solving a problem, I suggest you
do the same. Provide the description of a use case for the dynamic
media composition functionality that you are after, provide an
implementation using existing HTML5 elements and JavaScript,
demonstrate how imperfect and clunky it is, and make a suggestion of
how to fix it. This is what everyone else who wants to introduce a new
feature has to do and it is also what I would suggest you do. It
could even result in another EU project proposal ;-).


Best Regards,
Silvia.
Received on Tuesday, 16 March 2010 10:44:59 UTC