Re: [media] issue-152: documents for further discussion

On Thu, May 19, 2011 at 1:44 AM, Bob Lund <B.Lund@cablelabs.com> wrote:
>
>
>> -----Original Message-----
>> From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
>> Sent: Tuesday, May 17, 2011 7:36 PM
>> To: Bob Lund
>> Cc: HTML Accessibility Task Force; Mark Vickers @ Comcast; Eric
>> Winkelman; David Agranoff
>> Subject: Re: [media] issue-152: documents for further discussion
>>
>> On Wed, May 18, 2011 at 12:44 AM, Bob Lund <B.Lund@cablelabs.com> wrote:
>> > Hi Silvia,
>> >
>> > I had considered @data- attributes but was unsure of the implications
>> of this statement in section 3.2.3.8 of the current HTML5 spec
>> (http://dev.w3.org/html5/spec/Overview.html#embedding-custom-non-
>> visible-data-with-the-data-attributes):
>> >
>> > "User agents must not derive any implementation behavior from these
>> attributes or values. Specifications intended for user agents must not
>> define these attributes to have any meaningful values."
>> >
>> > In the case of in-band tracks, the user agent will have to create the
>> DOM equivalent of the @data- attribute for metadata tracks. This
>> appeared to me as being in conflict with the second sentence of the
>> above quote. Is this not the case?
>>
>>
>> Where would a UA get the information about the special track type
>> from in-band metadata tracks?
>
> MPEG-2 transport streams contain program map tables that identify each program id with a type, e.g. video, audio,

Video and audio as track types don't really tell us anything special here.

>EISS (http://www.cablelabs.com/specifications/OC-SP-ETV-AM1.0-I06-110128.pdf), etc.

I've tried to understand what EISS is. It seems to be short for
"Enhanced TV Integrated Signaling Stream" and is used for synchronizing
an application with a video program. The model behind Enhanced TV (ETV)
is to embed various types of data into the video stream, including
programs, images and triggers.

I can tell you now that this is not how the Web works. Video is
regarded as part of a Web application, not as the container for one.
While I can see the reasoning behind putting everything into a video
container and delivering it that way to a TV, the Web generally works
around HTML pages that link to resources which are delivered in
parallel with, and independently of, the HTML page but presented
together with it in the Web browser. I do not see that changing any
time soon.

In fact, my approach to an ETV signaling stream would be to extract
all the contained information and create separate Web-conformant
packages from it, e.g. a common video (MP4/WebM/Ogg format, typically
with one audio and one video track), separate image resources, separate
Web pages, and separate caption, advertising etc. tracks. Then we can
go back to the known way of delivering content on the Web.
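
Just to make that concrete, here is a rough (untested) sketch of what
the unpacked result could look like on the client side; all of the
file names below are invented for illustration:

  // Hypothetical resources produced by unpacking an ETV stream on
  // the server side; none of these names come from any spec.
  const video = document.createElement('video');
  video.src = 'programme.webm';             // one video + one audio track
  video.controls = true;

  const captions = document.createElement('track');
  captions.kind = 'captions';
  captions.src = 'programme-captions.vtt';  // extracted caption data
  captions.srclang = 'en';
  video.appendChild(captions);

  const triggers = document.createElement('track');
  triggers.kind = 'metadata';
  triggers.src = 'programme-triggers.vtt';  // extracted trigger/ad data
  video.appendChild(triggers);

  document.body.appendChild(video);

  // Images and related pages travel as ordinary Web resources,
  // fetched in parallel, not out of the video container.
  const still = document.createElement('img');
  still.src = 'programme-still.png';
  document.body.appendChild(still);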


>MPEG-2 TS may be used directly over HTTP or as the fragment format in HTTP live streaming and DASH.
>
>> Do you know fields in MP4, Ogg and WebM
>> that provide such information?
>
> Fragmented MP4 will be carried in some adaptive bit-rate containers, e.g. certain DASH profiles. In this case, the metadata tracks will be identified in the manifest file. However, with respect to the HTML5 "timed text track" API these are still in-band, i.e. not sourced from an external file. In this case, there is still the need to identify the type of metadata. Discussions are taking place now in MPEG and other places regarding requirements for identifying metadata tracks in DASH.


I agree that HTTP adaptive streaming may create a use case where we
have to deal with a complex resource of potentially many tracks.
However, we haven't even decided how to solve HTTP adaptive streaming
in the browser yet. Many discussions are going on about this right now
in several forums. Right now, I cannot see which of the existing
solutions will be adopted by browsers, or whether it may even be a new
one. My gut feeling is that the functionality may end up a subset of
DASH, even though DASH itself may not be adoptable for IP reasons. So,
until we know what will happen there, let's not create a solution for
something that hasn't been decided yet.

As for the inclusion of metadata tracks in HTTP adaptive streaming
manifests: right now my thinking is that it makes no sense to include
text tracks in a manifest, because the manifest's sole purpose is to
switch between tracks of differing bitrates for video (and maybe for
audio). Since text tracks typically need much less bandwidth than
audio or video and contain very concise information that cannot be
"bit peeled", they should not take part in the adaptive manifest.
Instead, they should be delivered through the <track> element.
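
To illustrate, a script could then consume such a metadata track's
cues without the manifest ever knowing about it. This is only a
sketch; the idea that the application data travels as cue text is my
assumption, not anything specified:

  // Sketch: read application data from an out-of-band metadata track.
  const video = document.querySelector('video');
  if (video) {
    for (let i = 0; i < video.textTracks.length; i++) {
      const track = video.textTracks[i];
      if (track.kind !== 'metadata') continue;
      track.mode = 'hidden';   // load the cues but don't render them
      track.oncuechange = () => {
        const active = track.activeCues;
        for (let j = 0; active && j < active.length; j++) {
          // e.g. an ad-insertion cue carried as cue text
          console.log('application cue:', (active[j] as VTTCue).text);
        }
      };
    }
  }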

But let's wait for a specification and for some trial implementations
on this. I think we are trying to solve a problem that doesn't even
exist yet.


>> If there is such a field that you need
>> exposed on top of what is already there, then it would indeed make sense
>> to include that.
>
> As described above, there is or will be such a field.
>
>> But I honestly doubt that you will find in-band tracks
>> that will tell you that they contain adinsertion information or
>> syncwebcontent data.
>
> See above.


I'm still skeptical about application-specific metadata tracks inside
an MP4/WebM/Ogg file. At this stage I'd say: show me an example file
that has this, some software that exists now and extracts it, and a
use case where these need to be on the Web.


>> This is all very application-specific
>
> You are right these are application specific but in the broadcast industry these applications are common: ETV (http://www.cablelabs.com/advancedadvertising/etv/), ad insertion (http://www.scte.org/documents/pdf/standards/ANSI_SCTE%2035%202007%20Digital%20Program%20Insertion%20Cueing%20Message%20for%20Cable.pdf) and parental control content advisories (http://www.ce.org/Standards/browseByCommittee_2524.asp)
>
>> and therefore can only be solved with external text tracks IMHO.
>
> Out-of-band timed text tracks work well for file-based content but I don't think they will work for linear streams with no start or end.

It works to deliver timed text tracks live. There are plenty of text
streaming services available these days, such as
http://www.realtimetranscription.com/ and http://streamtext.net/. Any
application data can be streamed in a similar way. It can be done
either with <track> elements or with MutableTextTrack.
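
As a rough sketch of the script path, assuming a hypothetical
WebSocket feed that delivers JSON cues, and using addTextTrack() to
create the mutable track (VTTCue standing in for whatever cue
constructor the spec ends up with):

  // Untested sketch: append live cues to a script-created text track.
  const video = document.querySelector('video');
  if (video) {
    const track = video.addTextTrack('captions', 'Live captions', 'en');
    track.mode = 'showing';

    const feed = new WebSocket('ws://example.com/captions'); // hypothetical
    feed.onmessage = (event) => {
      const c = JSON.parse(event.data) as
        { start: number; end: number; text: string };
      track.addCue(new VTTCue(c.start, c.end, c.text));
    };
  }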


Overall, I think it's just too early to make a call on this. After
all, it all has to be implemented in browsers. If the existing
methods turn out not to work with the large set of content from the
large variety of sources that will typically hit the Web, then it
will be time to implement a standard means of dealing with such
content. I can't see that being the case right now.


Cheers,
Silvia.

Received on Friday, 20 May 2011 04:12:42 UTC