RE: [media] issue-152: documents for further discussion

> -----Original Message-----
> From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
> Sent: Thursday, May 19, 2011 10:12 PM
> To: Bob Lund
> Cc: HTML Accessibility Task Force; Mark Vickers @ Comcast; Eric
> Winkelman; David Agranoff
> Subject: Re: [media] issue-152: documents for further discussion
> 
> On Thu, May 19, 2011 at 1:44 AM, Bob Lund <B.Lund@cablelabs.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
> >> Sent: Tuesday, May 17, 2011 7:36 PM
> >> To: Bob Lund
> >> Cc: HTML Accessibility Task Force; Mark Vickers @ Comcast; Eric
> >> Winkelman; David Agranoff
> >> Subject: Re: [media] issue-152: documents for further discussion
> >>
> >> On Wed, May 18, 2011 at 12:44 AM, Bob Lund <B.Lund@cablelabs.com> wrote:
> >> > Hi Silvia,
> >> >
> >> > I had considered @data- attributes but was unsure of the
> >> > implications
> >> of this statement in section 3.2.3.8 of the current HTML5 spec
> >> (http://dev.w3.org/html5/spec/Overview.html#embedding-custom-non-visible-data-with-the-data-attributes):
> >> >
> >> > "User agents must not derive any implementation behavior from these
> >> attributes or values. Specifications intended for user agents must
> >> not define these attributes to have any meaningful values."
> >> >
> >> > In the case of in-band tracks, the user agent will have to create
> >> > the
> >> DOM equivalent of the @data- attribute for metadata tracks. This
> >> appeared to me as being in conflict with the second sentence of the
> >> above quote. Is this not the case?
> >>
> >>
> >> Where would a UA get the information about the special track type
> >> from in-band metadata tracks?
> >
> > MPEG-2 transport streams contain program map tables that identify each
> > program id with a type, e.g. video, audio,
> 
> video & audio don't mean anything really.
> 
> > EISS (http://www.cablelabs.com/specifications/OC-SP-ETV-AM1.0-I06-110128.pdf), etc.
> 
> I've tried to understand what EISS is. It seems to be short for
> "Enhanced TV integrated signaling stream" and used for synchronizing an
> application to a video program. The model behind Enhanced TV (ETV) is to
> embed various types of data into the video stream, including programs,
> images and triggers.
> 
> I can tell you now that this is not how the Web works. Video is regarded
> as a part of Web applications and not as the container for such. While I
> can see the reasoning behind putting everything into a video container
> and delivering it in this way to a TV, the Web works generally around
> HTML pages and links to resources inside this HTML page that are
> delivered in parallel and independently of the HTML page but presented
> together with it in the Web browser. I do not see that changing any time
> soon.

Agreed. The intent is ONLY to reuse EISS (the signals that invoke the application); in-band delivery of the ETV application itself is not used. URLs in the EISS messages will reference Web content that forms the application, so, indeed, HTML pages and links are used just as with existing Web content.

In fact, this is one of the attractions of moving to a Web model: existing Web content can be used as part of ETV applications. Whether TV or Web, there is a need to synchronize when the Web content forming the application is invoked with respect to the underlying media.
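
To illustrate (a hedged sketch only, not a proposal, using roughly the current draft text track API in TypeScript): if the UA surfaced EISS signals as cues on an in-band text track of kind "metadata", a page could invoke the referenced Web content in sync with the stream along these lines. The track label "eiss", the cue payload (a bare URL) and the iframe target are illustrative assumptions, not anything specified today.

    // Hedged sketch (TypeScript). Assumes the UA exposes EISS signals as
    // cues on an in-band text track of kind "metadata" labeled "eiss";
    // the label, the payload format and the iframe target are
    // illustrative assumptions only.
    const video = document.querySelector('video')!;
    const appFrame = document.querySelector('iframe#etv-app') as HTMLIFrameElement;

    for (const track of Array.from(video.textTracks)) {
      if (track.kind === 'metadata' && track.label === 'eiss') {
        track.mode = 'hidden'; // fire cue events without rendering anything
        track.oncuechange = () => {
          const cue = track.activeCues && track.activeCues[0];
          if (cue) {
            // The cue text is assumed to carry the URL of the Web content
            // forming the application at this point in the stream.
            appFrame.src = (cue as VTTCue).text;
          }
        };
      }
    }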

> 
> In fact, my approach to an ETV signal stream would be to extract all
> the information it contains and create separate Web-conformant
> packages from it, e.g. a common video (MP4/WebM/Ogg format, typically
> with one audio and one video track), separate image resources,
> separate Web pages, and separate caption, advertising, etc. tracks.
> Then we can go back to the known way of delivering content on the Web.

That is the model, see above. But there is still the need for a signal so the Web content can be synchronized with the media stream.

> 
>
> > MPEG-2 TS may be used directly over HTTP or as the fragment format in
> > HTTP live streaming and DASH.
> >
> >> Do you know fields in MP4, Ogg and WebM that provide such
> >> information?
> >
> > Fragmented MP4 will be carried in some adaptive bit-rate containers,
> > e.g. certain DASH profiles. In this case, the metadata tracks will be
> > identified in the manifest file. However, with respect to the HTML5
> > "timed text track" API these are still in-band, i.e. not sourced from
> > an external file. In this case, there is still the need to identify
> > the type of metadata. Discussions are taking place now in MPEG and
> > other places regarding requirements for identifying metadata tracks
> > in DASH.
> 
> 
> I agree that HTTP adaptive streaming may create a use case where we have
> to deal with a complex resource of potentially many tracks.
> However, we haven't even decided on how to solve HTTP adaptive streaming
> in the browser yet. 

While there may be aspects of HTTP adaptive streaming that make sense to expose in HTML, this does not need to be done to solve HTTP adaptive streaming in a browser. We've added browser support for HTTP Live Streaming by adding a module to the media pipeline that understands the HLS manifest file. The <video> src points to the manifest file URL and there is no impact on HTML.
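
For concreteness, a minimal sketch of the page side under that model (TypeScript; the manifest URL is a placeholder):

    // Minimal sketch: the page just points <video> at the HLS manifest;
    // the adaptive logic lives in the UA's media pipeline as described
    // above. The URL is a placeholder.
    const video = document.createElement('video');
    video.src = 'http://example.com/live/playlist.m3u8'; // manifest file URL
    video.controls = true;
    document.body.appendChild(video);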

> Many discussions are going on about this right now
> in several forums. Right now, I cannot see which of the existing
> solutions would become adopted by the browsers or whether it may even be
> a new one. My gut feeling is that the functionality may be a subpart of
> DASH, even though DASH itself may not be adoptable for IP
> reasons. So, until we know what will happen there, let's not create a
> solution for something that hasn't been decided on yet.

No matter how adaptive streaming is ultimately exposed in HTML, the issues I'm raising about how metadata gets exposed will need to be solved. The solution I'm proposing will work with MPEG-2 TS, DASH and Microsoft Smooth Streaming. It would also work with HLS if text tracks were supported in the manifest file.
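
As a hedged sketch of what that exposure could look like from script (TypeScript), assuming the UA maps container-level type signaling, e.g. PMT stream types, DASH manifest attributes or Smooth Streaming stream indexes, onto text track kind/label; the label values are purely illustrative:

    // Hedged sketch: discovering in-band metadata tracks, assuming the UA
    // maps container-level type signaling onto TextTrack objects. The
    // labels "scte35" and "contentadvisory" are illustrative only.
    const video = document.querySelector('video')!;

    video.textTracks.addEventListener('addtrack', (e: TrackEvent) => {
      const track = e.track as TextTrack | null;
      if (!track || track.kind !== 'metadata') return;
      track.mode = 'hidden'; // receive cue events without any rendering
      switch (track.label) {
        case 'scte35':           // ad-insertion cue messages
          track.oncuechange = () => { /* splice in alternate ad content */ };
          break;
        case 'contentadvisory':  // parental-control advisories
          track.oncuechange = () => { /* apply viewing restrictions */ };
          break;
      }
    });
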
> 
> As for the inclusion of metadata tracks in HTTP adaptive streaming
> manifests: right now my thinking is that it makes no sense to include
> text tracks into a manifest, because the manifest's sole purpose is
> to switch between tracks of differing bitrate for video (and maybe for
> audio). Since text tracks are typically of a much smaller bandwidth than
> audio or video and contain very concise information that cannot be "bit
> peeled", they should not take part in the adaptive manifest.
> Instead, they should be delivered through the <track> element.

Support for text tracks is specified in the DASH spec. See http://lists.w3.org/Archives/Public/public-html-a11y/2011Feb/0151.html, example 1 in appendix G.
> 
> But let's wait for a specification and for some trial implementations on
> this. I think we are trying to solve a problem that doesn't even exist
> yet.
> 
> 
> >> If there is such a field that you need exposed on top of what is
> >> already there, then it would indeed make sense to include that.
> >
> > As described above, there is or will be such a field.
> >
> >> But I honestly doubt that you will find in-band tracks that will tell
> >> you that they contain ad-insertion information or sync-web-content data.
> >
> > See above.
> 
> 
> I'm still skeptical about application-specific metadata tracks inside an
> MP4/WebM/Ogg file. At this stage I'd say: show me an example file that
> has this, some software that exists now and that extracts it, and a use
> case where these need to be on the Web.

With regards to examples:

- The link above shows text tracks in DASH.
- CableLabs, on behalf of its cable company members, has submitted comments to the DASH DIS to add metadata support.

- To quote David Singer in his previous email on this thread:

" I may be getting into the middle of something I don't understand, but...

* we can easily add a metadata tag to MP4 tracks to declare their kind/role
* MP4 has metadata tracks, that could be used to carry any format (it just needs defining)
* we can use caption track events with a caption track that has no visible text (we may need a tag for the 'cookie')"

- My previous email identified three types of application metadata that are carried in MPEG-2 TS today.

With regards to use cases, North American cable operators have identified the requirement that applications based on this existing metadata continue to work with browser-based clients.

Perhaps this should be a topic for one of the weekly media group calls.

Bob

> 
> 
> >> This is all very application-specific
> >
> > You are right, these are application-specific, but in the broadcast
> > industry these applications are common: ETV
> > (http://www.cablelabs.com/advancedadvertising/etv/), ad insertion
> > (http://www.scte.org/documents/pdf/standards/ANSI_SCTE%2035%202007%20Digital%20Program%20Insertion%20Cueing%20Message%20for%20Cable.pdf) and
> > parental control content advisories
> > (http://www.ce.org/Standards/browseByCommittee_2524.asp)
> >
> >> and therefore can only be solved with external text tracks IMHO.
> >
> > Out-of-band timed text tracks work well for file-based content but I
> > don't think they will work for linear streams with no start or end.
> 
> It works to deliver timed tracks live. There are plenty of text
> streaming services available these days, such as
> http://www.realtimetranscription.com/, http://streamtext.net/. Any
> application data can be streamed in a similar way. It can be done
> with <track> elements or with MutableTextTrack.
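
[For illustration, a hedged sketch (TypeScript) of streaming application data into a script-created track, roughly per the draft MutableTextTrack/addTextTrack API; the WebSocket endpoint and message format are assumptions:]

    // Hedged sketch: pushing live application data into a script-created
    // text track via addTextTrack (per the draft MutableTextTrack idea;
    // exact names may differ). Endpoint and message format are assumed.
    const video = document.querySelector('video')!;
    const track = video.addTextTrack('metadata', 'live-app-data');

    const socket = new WebSocket('ws://example.com/timed-data');
    socket.onmessage = (msg: MessageEvent<string>) => {
      // Each message is assumed to be JSON: {"start": s, "end": s, "data": "..."}
      const { start, end, data } = JSON.parse(msg.data);
      track.addCue(new VTTCue(start, end, data));
    };
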
> 
> 
> Overall, I think it's just too early to make a call on this. After all,
> it all has to be implemented in browsers. If the existing methods turn
> out not to work with a large set of content from a large variety of
> sources that will typically hit the Web, then it will be time to implement
> a standard means of dealing with such content. I can't see that being
> the case right now.
> 
> 
> Cheers,
> Silvia.

Received on Friday, 20 May 2011 15:30:48 UTC