RE: [media] issue-152: documents for further discussion from Bob Lund on 2011-04-20 (public-html-a11y@w3.org from April 2011)

From: Bob Lund <B.Lund@CableLabs.com>
Date: Wed, 20 Apr 2011 08:51:49 -0600
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, Mark Watson <watsonm@netflix.com>
CC: Ian Hickson <ian@hixie.ch>, Philip Jägenstedt <philipj@opera.com>, "public-html-a11y@w3.org" <public-html-a11y@w3.org>
Message-ID: <114DAD31379DFA438C0A2E39B3B8AF5D01841D3117@srvxchg>
> -----Original Message-----
> From: public-html-a11y-request@w3.org [mailto:public-html-a11y-
> request@w3.org] On Behalf Of Silvia Pfeiffer
> Sent: Tuesday, April 19, 2011 10:39 PM
> To: Mark Watson
> Cc: Ian Hickson; Philip Jägenstedt; public-html-a11y@w3.org
> Subject: Re: [media] issue-152: documents for further discussion
>
> Mark,
>
> what is your list of track kinds for in-band tracks?
>
> I have thus far come up with the following:
>
> video:
> * sign language video (in different sign languages)
> * captions (as in: burnt-in video that may just be overlays)
> * different camera angle

* associated video track (which might be a generalization of different camera angle). One use case is video mosaic.

>
> audio:
> * audio descriptions
> * language dub
>
> Cheers,
> Silvia.
>
>
> On Wed, Apr 20, 2011 at 2:26 PM, Mark Watson <watsonm@netflix.com>
> wrote:
> > I'd like to second the requirement for an enumerated 'kind' for
> > in-band tracks. Considering adaptive streaming approaches, the
> > information about track kinds is much more likely to be available
> > in-band than from external metadata systems.
> >
> > Whilst the external metadata might contain information about what is
> > available, for presentation to the user (e.g. List of languages) it
> > doesn't make sense to require that layer to include information about
> > how this is mapped to any particular container or manifest format, when
> > multiple such versions of the content might be available (and might be
> > added or removed without changes to the user-visible metadata).
> >
> > A clean separation between UI and transport implies there is more than
> > just natural-languages at the media transport layer: in fact it would
> > make more sense to have *only* the enumerated kind at the media
> > transport layer and leave the natural language aspects to the
> > presentation layer.
> >
> > Furthermore, enumerated kinds are necessary to make initial choices
> > based on user preferences - you have to be able to understand *what*
> > the tracks are. Also presentation of such tracks to the user might not
> > be a single menu of choices: there is structure such as language
> > choices for main audio and audio descriptions which mirror the language
> > choices for the corresponding subtitle tracks and this structure needs
> > to be exposed for a sensible UI that properly considers accessibility.
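
A rough sketch of the point about initial choices, assuming each in-band track were exposed with a type, an enumerated kind and a language. The record shape, the kind values and the function names below are hypothetical and are not taken from any spec or container format.

```javascript
// Hypothetical track records; the kind values are illustrative only.
const tracks = [
  { id: "a1", type: "audio", kind: "main", language: "en" },
  { id: "a2", type: "audio", kind: "description", language: "en" },
  { id: "a3", type: "audio", kind: "main", language: "de" },
  { id: "v1", type: "video", kind: "sign", language: "ase" },
];

// Return the track of the wanted type and kind in the preferred language,
// falling back to the first track of that type and kind, or null if none.
function pickTrack(tracks, type, kind, language) {
  const sameKind = tracks.filter((t) => t.type === type && t.kind === kind);
  return sameKind.find((t) => t.language === language) || sameKind[0] || null;
}

// Group tracks by kind so that a UI can offer one menu per kind
// instead of a single flat list.
function groupByKind(tracks) {
  const groups = {};
  for (const t of tracks) {
    (groups[t.kind] = groups[t.kind] || []).push(t);
  }
  return groups;
}
```

Without a kind on each track, neither the preference match nor the per-kind menus could be built from script.
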
> >
> > ...Mark
> >
> > Sent from my iPhone
> >
> > On Apr 19, 2011, at 7:22 PM, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:
> >
> >> On Wed, Apr 20, 2011 at 10:51 AM, Ian Hickson <ian@hixie.ch> wrote:
> >>> On Tue, 12 Apr 2011, Silvia Pfeiffer wrote:
> >>>>>> the TrackList only includes name and language attributes - in
> >>>>>> analogy to TextTrack it should probably rather include (name,
> >>>>>> label, language, kind)
> >>>>>
> >>>>> I'm fine with exposing more data, but I don't know what data
> >>>>> in-band tracks typically have. What do in-band tracks in popular
> >>>>> video formats expose? Is there any documentation on this?
> >>>>
> >>>> There is a discussion on the main list about metadata right now and
> >>>> I have posted a link there about what the W3C Media Annotations
> >>>> WG's analysis of media formats found as typically used metadata on
> >>>> audio and video. If you want to understand what is generally
> >>>> available, that is a good starting point, see
> >>>> http://www.w3.org/TR/mediaont-10/ .
> >>>
> >>> Woah, that's a lot of data. I guess a better approach for this will
> >>> be to look at use cases and figure out what needs exposing.
> >>
> >> Yeah, I agree. And by no means am I suggesting to adopt all of them,
> >> or even to adopt the complex structure that the WG came up with. I
> >> look at it as an interesting analysis in what is available.
> >>
> >>
> >>>> I would, however, regard these two attributes that we discussed
> >>>> here as a separate issue, because if somebody wants to create
> >>>> custom controls and e.g. provide all the alternative video
> >>>> descriptions in one menu, they would want all the text descriptions
> >>>> and audio descriptions listed
> >>>> - similarly if they want all the alternative captions in one menu,
> >>>> they would want all the text track captions as well as all the
> >>>> videos that are created from bitmaps as overlay captions as well as
> >>>> all the alternative video tracks with burnt-in captions. So,
> >>>> providing a label (for use in the menu) and a kind (for
> >>>> classification) is very useful.
> >>>> These can all be mapped from fields from within video formats.
> >>>
> >>> I assume you're talking primarily about "kind" here. "name" and
> >>> "label" are the same thing (actually I've renamed "name" to "label" to
> >>> improve consistency with other parts of the platform).
> >>
> >> Yes, I don't mind if we call it "name" or "label" - I do prefer label
> >> to be consistent with the Text Track.
> >>
> >> I do think we need an additional "id" or similar, which is unique and
> >> can be used for fragment addressing. (See the other thread).
> >>
> >>
> >>> Looking at the metadata list cited above, I don't see anything in
> >>> either ogg, mp4, or webm that maps to "kind", so I don't see much
> >>> point exposing that on the audio/video track lists, though I agree
> >>> that in principle it would be a good idea.
> >>
> >> Let's be clear: the media annotations list is not complete for each
> >> one of the formats. It is trying to identify a subset that will work
> >> across many formats.
> >>
> >> Also, they actually have a "role" attribute on the "fragment" which
> >> they suggest using for identifying the "kind" of a "track":
> >> http://www.w3.org/TR/mediaont-10/#example4 . So, it is indeed there.
> >> Hmm.. given this, I should probably change what is written for "OGG"
> >> under "fragment", because certainly Ogg has fields that provide the
> >> kind of a track. WebM and MP4 have them, too.
> >>
> >>
> >>> Realistically though, for in-band tracks it's more likely that that
> >>> data will be provided to the script out-of-band so that it can
> >>> construct the UI before the movie loads, and for out-of-band tracks
> >>> the information can be made available in the markup (e.g. using
> >>> data-* attributes). For UA-driven menus, the title is probably
> >>> sufficient for most purposes, and that can already be made
> >>> available.
> >>
> >> The biggest issue with this approach is discoverability. A Web
> >> developer that has to deal with multiple resources for which he
> >> doesn't a-priori know what kinds of tracks they have available
> >> in-band has no chance to find this out through script if there is no
> >> interface that exposes this information. It would need to be done
> >> server-side.
> >>
> >>
> >>>>> Note that for media-element-level data, you can already use data-*
> >>>>> attributes to get anything you want, so the out-of-band case is
> >>>>> already fully handled as far as I can tell.
> >>>>
> >>>> Interesting. In the audio description case, would a label, kind,
> >>>> and language be added to the menu of the related video element?
> >>>
> >>> For scripted UIs, that's up to the script.
> >>>
> >>> For UA UIs, it depends if we are talking about multiple video tracks
> >>> or multiple audio tracks. Multiple video tracks aren't handled,
> >>> because there's no sane way to have the UA turn the video tracks on
> >>> and off. For the audio case, I don't really see much reason to
> >>> expose more than a title. A kind could be used but it's going to be
> >>> used so rarely that in practice the UA will want to handle the case
> >>> of only having a title anyway, and once you support that, it's not
> >>> clear what a kind would really do to make things better.
> >>>
> >>> It's something we can always provide in the future though, if it
> >>> turns out to be more common than one would guess from looking at
> >>> content today.
> >>
> >> I think it will be a problem with the first implementation of this,
> >> since we would want to add the information to the menu for audio
> >> tracks just like for text tracks and the text tracks have this
> >> information (kind, label, language).
> >>
> >> I guess we can wait till then, though, since it doesn't change
> >> anything substantial about the way in which things work.
> >>
> >>
> >>
> >>>>> | a group should be able to loop over the full multitrack rather
> >>>>> | than a single slave
> >>>>>
> >>>>> Not sure what this means.
> >>>>
> >>>> We discussed the looping behaviour. To make it symmetrical with
> >>>> in-band multitrack resources, it would make sense to be able to
> >>>> loop over composed multitrack resources, too. The expected looping
> >>>> behaviour is that a loop on the composed resource loops over the
> >>>> composite as a whole. So, the question is then how to turn such
> >>>> looping on.
> >>>>
> >>>> The proposal is that when one media element in the group has a
> >>>> @loop attribute, that would turn the looping on the composite
> >>>> resource on.
> >>>> This means that when the loop is set and the end of the composite
> >>>> resource is reached (its duration), the currentTime would be reset
> >>>> to its beginning and playback of the composite resource would start
> >>>> again.
> >>>> Looping on individual elements is turned off and only the composite
> >>>> resource can loop.
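
A minimal sketch of the looping proposal quoted above, assuming the composite loops whenever any element in the group has its loop attribute set. The function names are hypothetical, and plain objects stand in for media elements.

```javascript
// Assumption: the composite resource loops if any grouped element has loop set.
function compositeLoops(elements) {
  return elements.some((el) => el.loop === true);
}

// Advance the composite timeline by delta seconds. On reaching the composite
// duration, either wrap to the beginning (looping) or stop at the end.
function advance(currentTime, delta, duration, loops) {
  const next = currentTime + delta;
  if (next < duration) return next;
  return loops ? 0 : duration;
}
```
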
> >>>
> >>> What's the use case?
> >>
> >> The same as for the loop attribute on an audio or video element. It's
> >> a media resource and should work consistently to how other media
> >> resources are handled.
> >>
> >> E.g. if I have a plugin that likes to turn all media elements to
> >> looping for whatever reason (entertain the kids? ;-), I can do that
> >> for normal media elements and for in-band multitrack consistently
> >> with the loop attribute, but I have to make an exception for composed
> >> multitrack, because it doesn't allow for the handling of a loop
> >> attribute. (stupid example, I know: so pick something with music..)
> >>
> >>
> >>
> >>>>> | some attributes of HTMLMediaElement are missing in the
> >>>>> | MediaController that might make sense to collect state from the
> >>>>> | slaves: error,
> >>>>>
> >>>>> Errors only occur as part of loading, which is a per-media-element
> >>>>> issue, so I don't really know what it would mean for the
> >>>>> controller to have it.
> >>>>
> >>>> The MediaController is generally regarded as the state keeper for
> >>>> the composite resource.
> >>>
> >>> It is? That's certainly not how it's defined. It's just a central
> >>> controller, it doesn't keep any of the state for the resources.
> >>
> >> Not for the individual ones, but for the combined construct. E.g. you
> >> can ask it for what the currentTime of the combined construct is,
> >> etc.
> >>
> >>
> >>>> So, what happens when a single slave goes into error state? Does
> >>>> the full composite resource go into error state? Or does it ignore
> >>>> the slave
> >>>> - turn it off, and continue?
> >>>
> >>> Media elements don't really have an error state. They have a
> >>> networkState and a readyState, which affect the MediaController, but
> >>> the 'error' attribute is just for exposing the last error for events, it's not
> >>> part of the state machine.
> >>
> >> That still doesn't answer the question: what happens if one of the
> >> slaves happens to have a network error and cannot continue playing,
> >> because it runs out of data. Does the combined resource stall?
> >> Forever? Is there a way for script to identify this and remove the
> >> stalling slave from the group? Maybe we need an onerror event on the
> >> MediaController, which will be raised if one of the slaves has an
> >> error fetching the media data. Then the script developer can go
> >> through the list of slaves in the one callback and remove the
> >> contender.
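
What such a callback might do can be sketched as follows. This assumes a controller-level error event existed, which is only a proposal in this thread. The partitionSlaves helper is hypothetical; it relies only on each slave exposing an error attribute that is null when no error has occurred, as media elements do.

```javascript
// Split the slaves of a group into those that can keep playing and those
// that reported an error, so the script can drop the failed ones.
function partitionSlaves(slaves) {
  const healthy = [];
  const failed = [];
  for (const s of slaves) {
    (s.error ? failed : healthy).push(s);
  }
  return { healthy, failed };
}
```
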
> >>
> >>
> >>>>> | readyState
> >>>>>
> >>>>> I could expose a readyState that returns the lowest value of all
> >>>>> the readyState values of the slaved media elements, would that be
> >>>>> useful?
> >>>>> It would be helpful to see a sample script that would make use of
> >>>>> this; I don't really understand why someone would care about doing
> >>>>> this at the controller level rather than the individual track
> >>>>> level.
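
The "lowest value" rule described above could look roughly like this. The numeric constants match the readyState values defined for media elements; the function name and the choice to report HAVE_NOTHING for an empty group are assumptions.

```javascript
// readyState constants as defined for media elements.
const HAVE_NOTHING = 0;
const HAVE_METADATA = 1;
const HAVE_CURRENT_DATA = 2;
const HAVE_FUTURE_DATA = 3;
const HAVE_ENOUGH_DATA = 4;

// Combined readyState: the lowest readyState among all slaved elements.
// Assumption: an empty group reports HAVE_NOTHING.
function combinedReadyState(slaves) {
  if (slaves.length === 0) return HAVE_NOTHING;
  return Math.min(...slaves.map((s) => s.readyState));
}
```
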
> >>>>
> >>>> I think it makes sense, in particular when script is waiting for
> >>>> all elements to go to HAVE_METADATA state, which is often the case
> >>>> when you are trying to do something on the media resource, but have
> >>>> to wait until it's actually available.
> >>>>
> >>>> An example JS would be where you are running your own controls for
> >>>> the combined resource and want to determine the combined duration
> >>>> and volume for visual display, e.g.
> >>>>
> >>>>      video.controller.addEventListener("loadedmetadata", init, false);
> >>>>      function init(evt) {
> >>>>        duration.innerHTML = video.controller.duration.toFixed(2);
> >>>>        vol.innerHTML      = video.controller.volume.toFixed(2);
> >>>>      }
> >>>>
> >>>> So, I think a combined readyState makes sense in the way you
> >>>> described.
> >>>
> >>> That example doesn't use readyState at all. Is there a use case for
> >>> readyState specifically?
> >>
> >>
> >> We actually discussed in the last call whether we only need the
> >> events or also readyState. Eric had an example where you would raise
> >> an event, but only do something if the element is only in a
> >> particular readyState at the time of processing. I don't remember
> >> exactly what it was. My position is that we really need the events,
> >> but I could live without having a combined readyState.
> >>
> >>
> >>>>> | (this one is particularly important for onmetadatavailable
> >>>>> | events)
> >>>>>
> >>>>> The events are independent of the attributes. What events would
> >>>>> you want on a MediaController, and why? Again, sample code would
> >>>>> really help clarify the use cases you have in mind.
> >>>>
> >>>> Maybe an onmetadatavailable event is more useful than a readyState
> >>>> then?
> >>>
> >>> I've updated the spec to fire a number of events on MediaController,
> >>> including 'metadataavailable' and 'playing'/'waiting'.
> >>
> >> Ah excellent. That's great.
> >>
> >>
> >>>> I am not aware of many scripts that use the readyState values
> >>>> directly for anything, even on the media elements themselves.
> >>>
> >>> One example of readyState usage on a media element would be a
> >>> 'waiting' event handler that checks whether readyState is HAVE_CURRENT_DATA or
> >>> HAVE_METADATA and uses that information to decide whether to display
> >>> a poster frame overlay or not.
> >>>
> >>>
> >>>>> | TimeRanges played
> >>>>>
> >>>>> Would this return the union or the intersection of the slaves'?
> >>>>
> >>>> That would probably be the union, because those parts of the
> >>>> timeline are what the user has viewed, so he/she would expect them
> >>>> to be marked in manually created controls.
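
A sketch of taking the union of the slaves' played ranges, assuming each slave's ranges are given as an array of [start, end] pairs in seconds. A real TimeRanges object would need converting first, and unionRanges is a hypothetical helper.

```javascript
// Merge several lists of [start, end] ranges into one sorted list of
// non-overlapping ranges covering everything any slave has played.
function unionRanges(rangeLists) {
  const all = rangeLists
    .flat()
    .map(([start, end]) => [start, end])
    .sort((a, b) => a[0] - b[0]);
  const merged = [];
  for (const [start, end] of all) {
    const last = merged[merged.length - 1];
    if (last && start <= last[1]) {
      // Overlapping or touching: extend the previous range.
      last[1] = Math.max(last[1], end);
    } else {
      merged.push([start, end]);
    }
  }
  return merged;
}
```
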
> >>>
> >>> Ok, added.
> >>>
> >>>
> >>>>> | ended
> >>>>>
> >>>>> Since tracks can vary in length, this doesn't make much sense at
> >>>>> the media controller level. You can tell if you're at the end by
> >>>>> looking at currentTime and duration, but with infinite streams and
> >>>>> no buffering the underlying slaves might keep moving things along
> >>>>> (actively playing) with currentTime and duration both equal to
> >>>>> zero the whole time. So I'm not sure how to really expose 'ended'
> >>>>> on the media controller.
> >>>>
> >>>> "ended" on the individual elements (in the absence of loop) returns
> >>>> true when
> >>>>
> >>>> Either:
> >>>>
> >>>>    The current playback position is the end of the media resource, and
> >>>>    The direction of playback is forwards.
> >>>>
> >>>> Or:
> >>>>
> >>>>    The current playback position is the earliest possible position, and
> >>>>    The direction of playback is backwards.
> >>>
> >>> No, 'ended' only fires when going forwards.
> >>
> >>
> >> I quoted from the spec:
> >> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#ended-playback
> >>
> >>
> >>>> So, in analogy, for the composed resource: it would return the
> >>>> union of the ended result on all individual elements, namely
> >>>> "ended" only when all of them are in ended state.
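
Read literally, "only when all of them are in ended state" is a logical AND over the slaves rather than a union. A minimal sketch under that reading follows; the function name and the treatment of an empty group as not ended are assumptions.

```javascript
// Combined "ended": true only when every slaved element reports ended.
// Assumption: a group with no slaves is never considered ended.
function combinedEnded(slaves) {
  return slaves.length > 0 && slaves.every((s) => s.ended === true);
}
```
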
> >>>
> >>> But what's the use case?
> >>
> >> If I reach the end, I want to present something different, such as a
> >> post-roll ad or an overlay with links to other videos that are
> >> related. It is much easier to wait on an onended event on the combined
> >> resource than having to register an event handler with each slave and
> >> then try and combine the result.
> >>
> >>
> >>>>> | and autoplay.
> >>>>>
> >>>>> How would this work? Autoplay doesn't really make sense as an IDL
> >>>>> attribute, it's the content attribute that matters. And we already
> >>>>> have that set up to work with media controllers.
> >>>>
> >>>> As with @loop, it would be possible to say that when one media
> >>>> element in the union has @autoplay set, then the combined resource
> >>>> is in autoplay state.
> >>>
> >>> I don't understand the use case for exposing this as an IDL
> >>> attribute on the controller.
> >>
> >> Same use case as for any other media element - and then to have it
> >> consistent, so that we can use the same code to deal with grouped
> >> media elements as with in-band multitrack elements. For @autoplay it
> >> even has an accessibility use case: it's possible for a UA or plugin
> >> to provide settings to stop autoplay on media elements with the
> >> @autoplay IDL attribute. It's not easily possible to stop hand-coded
> >> play() calls, which would be the only way for grouped multitrack
> >> media in the way in which it is currently specified.
> >>
> >>
> >>
> >>>> One more question turned up today: is there any means in which we
> >>>> could possibly create @controls (with track menu and all) for the
> >>>> combined resource? Maybe they could be the same controls on all the
> >>>> elements that have a @controls active, but would actually be driven
> >>>> by the controller's state rather than the element's? Maybe the
> >>>> first video element that has a @controls attribute would get the
> >>>> full controller's state represented in the controls? Could there be
> >>>> any way to make @controls work?
> >>>
> >>> The UA is responsible for this, but the spec requires that the UI
> >>> displayed for a control that has a controller control the
> >>> controller.
> >>
> >> That's good. Question of clarification: Does that mean that for all
> >> elements in a group that display controls these controls actually
> >> control the controller? Or do only the controls of the media element
> >> that created the controller control the controller?
> >> (hope that makes sense to you ;-)
> >>
> >>
> >> Cheers,
> >> Silvia.
> >>
> >>
> >
Received on Wednesday, 20 April 2011 14:52:52 UTC