
RE: [MSE] Updated MPEG-2 TS Byte Stream Requirements

From: Alex Giladi <alex.giladi@huawei.com>
Date: Thu, 27 Sep 2012 07:12:34 +0000
To: Bob Lund <B.Lund@CableLabs.com>, Aaron Colwell <acolwell@google.com>, "public-html-media@w3.org" <public-html-media@w3.org>
Message-ID: <76BB22813E02B04A882BEB9FFFF6BBA615EFBFF4@SZXEML507-MBS.china.huawei.com>
Aaron, Bob, all,
See inline.
Thanks!
Alex.

From: Bob Lund [mailto:B.Lund@CableLabs.com]
Sent: Tuesday, September 25, 2012 6:47 PM
To: Aaron Colwell; public-html-media@w3.org; Alex Giladi
Subject: Re: [MSE] Updated MPEG-2 TS Byte Stream Requirements

Hi Aaron,

I see that I did only answer your response on #1 and missed #2 - #8; here are the rest of my responses.

Alex, can you add your comments on #3 - #6?

Thanks,
Bob

From: Bob Lund <b.lund@cablelabs.com>
Date: Monday, September 24, 2012 2:13 PM
To: Aaron Colwell <acolwell@google.com>
Cc: "public-html-media@w3.org" <public-html-media@w3.org>, Alex Giladi <alex.giladi@huawei.com>
Subject: Re: [MSE] Updated MPEG-2 TS Byte Stream Requirements

Hi Aaron,

WRT specs, I feel your pain. See comment inline.

Bob

From: Aaron Colwell <acolwell@google.com>
Date: Monday, September 24, 2012 2:03 PM
To: Bob Lund <b.lund@cablelabs.com>
Cc: "public-html-media@w3.org" <public-html-media@w3.org>, Alex Giladi <alex.giladi@huawei.com>
Subject: Re: [MSE] Updated MPEG-2 TS Byte Stream Requirements

Hi Bob,

Thanks for the response. I've spent some quality(?) time with the MPEG2-TS & DASH specs over the last couple of days so hopefully my comments will be a little more informed this time around. :)

Comments inline...
On Mon, Sep 24, 2012 at 7:44 AM, Bob Lund <B.Lund@cablelabs.com> wrote:
Hi Aaron,

See in-line for responses to your questions from Alex and me.

Thanks,
Bob

From: Aaron Colwell <acolwell@google.com>
Date: Wednesday, September 19, 2012 7:33 PM
To: Bob Lund <b.lund@cablelabs.com>
Cc: "public-html-media@w3.org" <public-html-media@w3.org>
Subject: Re: [MSE] Updated MPEG-2 TS Byte Stream Requirements

Hi Bob,

Thanks for doing this. Sorry it has taken me a little while to respond. I've got a few questions and hopefully they won't seem too silly since I have no experience with MPEG-2 TS. I'll blame the MPEG-2 TS Wikipedia articles if I get something totally wrong. ;)

Many of these questions come from looking at the differences between the old proposal and your new version. My understanding of the old version mapped easily to my understanding of HLS. The new text signals to me that you are going for something more complex.

Here are the questions that came to mind:

1. Why is PSI included in both media segments and initialization segments? All these tables look like initialization segment only material.

PSI is required by MPEG-2 Systems to appear very frequently inband. So, every media segment longer than ~100ms will contain PSI.

[acolwell] So the problem here is that it would look like an initialization segment appeared in the middle of a media segment, because the PSI might appear in the middle of a GOP?

I think Alex has a deeper understanding here than I do, but I view the PSI as information that will be in an initialization segment but will also be in media segments. So, maybe in MPEG-2 TS there is no need for initialization segments, only self-initializing media segments?
[agiladi] MPEG-2 TS segments are self-initializing under slightly stricter conditions (all PSI comes before the first media packets). It seems to be common practice to do it this way, though people may still produce segments that don't comply with this in both HLS and DASH. An initialization segment will come in handy if we want to put conditional access information in it, e.g. the EMM (same concept as putting `pssh` into the IS).



2. Why was the 'transport_error_indicator set to 0' requirement removed?
 a. Why do we want to pass data we know is corrupt to the UA?
 b. Why can't the service the web application is talking to simply strip these bad packets?

The flag set means that the stream is broken. This should be taken care of at the fragmentor, before ever being segmented.


[acolwell] If the flag being set means that the stream is broken then we should restore the requirement that this always be set to 0. This makes it clear that the UA should signal an error if it encounters a TS packet with this bit set.

OK
[agiladi] +1
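
As an illustration of the check agreed on in item 2, here is a minimal Python sketch of how a UA-side parser might reject packets with transport_error_indicator set. The function names are hypothetical, and the sketch assumes 188-byte packets that are already aligned on the 0x47 sync byte; it is not taken from any proposed spec text.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> dict:
    """Decode the fixed 4-byte header of one MPEG-2 TS packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("TS packets must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    return {
        # Top bit of byte 1: the demodulator/sender flagged this packet as corrupt.
        "transport_error_indicator": bool(packet[1] & 0x80),
        "payload_unit_start_indicator": bool(packet[1] & 0x40),
        # 13-bit PID spans the low 5 bits of byte 1 and all of byte 2.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }


def first_corrupt_packet(data: bytes):
    """Return the index of the first packet with the error bit set, else None."""
    for offset in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        header = parse_ts_header(data[offset:offset + TS_PACKET_SIZE])
        if header["transport_error_indicator"]:
            return offset // TS_PACKET_SIZE
    return None
```

A UA following the restored requirement could treat any non-None result as a decode error for the appended data.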


3. How is the "self-initializing" media segment different from an initialization segment before a standard media segment? I don't really understand this distinction.

It's the same, but "self-initializing" implies having PSI prior to media packets in the media segment.

[acolwell] I think in the Media Source context this doesn't need to be explicitly stated like it is in the DASH spec. In Media Source, the initialization segment must come before any media segments so these two scenarios don't look any different from the UA's point of view. The web application doesn't have to append full segments at a time so it is possible to append a self-initializing media segment in a way that looks like an init segment followed by a normal media segment. I don't think the text needs to treat these two situations separately.

I guess that the web app could examine the media segments and extract the PSI to present to MSE as an initialization segment. However, I don't know what MPEG-2 TS decoders will do if they don't see PSI at a certain rate in the media segments. If this PSI must be left in then I'm not sure what the value of the init segment is; why not just allow self-initializing media segments?
[agiladi] The normal behavior is:

(a)    drop everything until you see a complete PAT;

(b)   parse PAT, drop everything until you see PMT;
So, the decoder is expected to drop all media packets prior to the end of PAT.
A smarter decoder may do some amount of buffering and sort things out later, but I wouldn't count on UA's smartness.
I'd say self-initializing segments are recommended; if there is a specific use case for a separate IS, use it.
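
The decoder behavior in steps (a) and (b) can be sketched as a small state machine. This is a simplified Python illustration, not a real demuxer: it assumes the PAT section fits in a single packet with no adaptation field, it does not verify the CRC, and it treats the start of a PMT section as "PMT seen" without parsing it. The names `pmt_pids_from_pat` and `packets_after_psi` are hypothetical.

```python
from typing import Iterable, Iterator, List

PAT_PID = 0x0000


def pmt_pids_from_pat(payload: bytes) -> List[int]:
    """Extract PMT PIDs from a PAT section carried in one packet payload."""
    pointer = payload[0]                  # pointer_field precedes the section
    section = payload[1 + pointer:]
    if section[0] != 0x00:                # table_id 0x00 identifies the PAT
        raise ValueError("not a PAT section")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4          # stop before the 4-byte CRC
    pids = []
    for off in range(8, end, 4):          # program loop starts after 8 header bytes
        program_number = (section[off] << 8) | section[off + 1]
        if program_number != 0:           # program 0 points at the NIT, not a PMT
            pids.append(((section[off + 2] & 0x1F) << 8) | section[off + 3])
    return pids


def packets_after_psi(packets: Iterable[bytes]) -> Iterator[bytes]:
    """Drop everything until a PAT and then a PMT start have been seen."""
    state = "WAIT_PAT"
    pmt_pids = set()
    for pkt in packets:
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        starts_section = bool(pkt[1] & 0x40)          # payload_unit_start_indicator
        payload_only = ((pkt[3] >> 4) & 0x03) == 1    # no adaptation field
        if state == "WAIT_PAT":
            if pid == PAT_PID and starts_section and payload_only:
                pmt_pids = set(pmt_pids_from_pat(pkt[4:]))
                state = "WAIT_PMT"
            continue                                  # (a) drop until PAT
        if state == "WAIT_PMT":
            if pid in pmt_pids and starts_section:
                state = "READY"
            continue                                  # (b) drop until PMT
        yield pkt                                     # everything after PSI passes
```

Under these assumptions, any media packets that arrive before the PSI are lost, which is why placing all PSI ahead of the first media packet matters for a simple decoder.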



4. How is the boundary between media segments detected? Does the Random Access indicator in the Adaptation Field mark the beginning of a new media segment?

No. Segment boundary is unmarked; the random access indicator indicates the beginning of an AU that has SPS and PPS in front of it in the AVC case. So in our case the first packet carrying media data will have this bit on, but the opposite is not necessarily true.

[acolwell] So this could be a problem if there is no straightforward way to identify where the beginning and end of media segments are. I think we need to come up with detailed rules for how to detect the beginning of a media segment as defined in the Media Source spec. The DASH definitions have the benefit of using files & byte ranges to delineate where segment boundaries are. These signals aren't present in Media Source since segments can be appended in pieces.

I think Alex needs to address this and item 6.
[agiladi] Can you clarify the use case here?
In general, there is some work being done on explicitly demarcating segment boundaries; it should be public in November-December.
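
For reference, the random access indicator discussed in item 4 lives in the adaptation field flags. The Python sketch below shows one way to read it; the helper names are hypothetical and the video PID is assumed to be known already (for example from the PMT). As noted above, a set bit marks only a candidate random access point and says nothing about where a segment begins or ends.

```python
from typing import List, Sequence


def has_random_access_indicator(packet: bytes) -> bool:
    """True if the packet's adaptation field sets random_access_indicator."""
    adaptation_field_control = (packet[3] >> 4) & 0x03
    if adaptation_field_control not in (2, 3):
        return False                 # no adaptation field in this packet
    adaptation_field_length = packet[4]
    if adaptation_field_length == 0:
        return False                 # field present but carries no flags byte
    return bool(packet[5] & 0x40)    # flags byte: bit 6 is random_access_indicator


def random_access_points(packets: Sequence[bytes], video_pid: int) -> List[int]:
    """Indices of packets on video_pid that carry the random access indicator."""
    hits = []
    for index, pkt in enumerate(packets):
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        if pid == video_pid and has_random_access_indicator(pkt):
            hits.append(index)
    return hits
```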



5. What restrictions are put on the continuity counter & discontinuity indicator fields? It seems like these could confuse the UA if the web application tried to do "out of order" media segment appending. It seems to me that there should be no jumps in the counter or discontinuities inside a single media segment.

We put none. We can explicitly disallow random jumps, but explicitly signaled discontinuities can stay - they are perfectly legal, and will arise if two parts of the segment were "glued" together from different sources.
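
To make the distinction between random jumps and explicitly signaled discontinuities concrete, here is a hedged Python sketch of a per-PID continuity check. It is a simplification: any repeated counter value on a payload-bearing packet is accepted as a legal duplicate without comparing packet contents, and `find_unsignaled_jumps` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

NULL_PID = 0x1FFF


def find_unsignaled_jumps(packets: Sequence[bytes]) -> List[Tuple[int, int, int, int]]:
    """Return (index, pid, previous_cc, cc) for each jump lacking a discontinuity flag."""
    last_cc: Dict[int, int] = {}
    jumps = []
    for index, pkt in enumerate(packets):
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        if pid == NULL_PID:
            continue                          # counter is undefined on null packets
        afc = (pkt[3] >> 4) & 0x03
        cc = pkt[3] & 0x0F
        has_payload = afc in (1, 3)
        # discontinuity_indicator is bit 7 of the adaptation field flags byte.
        signaled = afc in (2, 3) and pkt[4] > 0 and bool(pkt[5] & 0x80)
        prev = last_cc.get(pid)
        if prev is not None and not signaled:
            # The 4-bit counter advances only on packets that carry payload.
            expected = (prev + 1) % 16 if has_payload else prev
            is_duplicate = has_payload and cc == prev
            if cc != expected and not is_duplicate:
                jumps.append((index, pid, prev, cc))
        last_cc[pid] = cc
    return jumps
```

A rule that disallows random jumps but keeps signaled discontinuities would amount to requiring this list to be empty for every media segment.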

[acolwell] So here is another case where I think it matters to be able to clearly identify where media segment boundaries are. I could maybe see this working out ok if the discontinuity happens within a single media segment, because we have a larger unit that we can use to resolve the relative jump. If we don't have a good definition for the boundaries then I think the discontinuity could get confused with an out-of-order media segment and the data would likely get placed in an unexpected part of the SourceBuffer. I think there is a disconnect here between the Media Source append model and MPEG2-TS's expected playout model. In Media Source, discontinuities in the timeline indicate out-of-order segment appending. In MPEG2-TS, discontinuities in the timeline are allowed, but don't imply out-of-order media. The expectation is that playback will continue forward, just with new PTS timestamp offsets applied to the new TS packets. This needs to be resolved so that the situation isn't ambiguous to the UA. Timestamp rollover is a similar situation.
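
On the timestamp rollover point: PTS values are 33-bit counts of a 90 kHz clock, so they wrap roughly every 26.5 hours. The Python sketch below shows one common way to unwrap them into a monotonic timeline. The half-range threshold is a heuristic assumed here, not something the thread or any spec text prescribes, and it cannot by itself tell a rollover from a signaled discontinuity or an out-of-order append.

```python
from typing import Iterable, List

PTS_MODULUS = 1 << 33      # PTS is a 33-bit counter
PTS_CLOCK_HZ = 90_000      # PTS ticks at 90 kHz


def unwrap_pts(raw_values: Iterable[int]) -> List[int]:
    """Map wrapping 33-bit PTS values onto a continuous timeline."""
    unwrapped = []
    offset = 0
    previous = None
    for raw in raw_values:
        if previous is not None:
            delta = raw - previous
            if delta < -(PTS_MODULUS // 2):
                offset += PTS_MODULUS      # large backward jump: forward rollover
            elif delta > PTS_MODULUS // 2:
                offset -= PTS_MODULUS      # large forward jump: stepped back across the wrap
        unwrapped.append(raw + offset)
        previous = raw
    return unwrapped


def pts_to_seconds(pts: int) -> float:
    """Convert a PTS tick count to seconds."""
    return pts / PTS_CLOCK_HZ
```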

6. Can you provide a little more background about why "The media segment will not rely on initialization information in another media segment." was added? I don't really understand. This implies to me that the bytestream would have to constantly alternate between initialization segments and media segments.

This implies that PSI appearing in the media segment describes the content of this segment entirely and completely. You don't need to alternate, as the frequency of PSI repetition within the segment is fairly high.

[acolwell] Sorry. I don't think I was clear. I meant that the repeated PSI could simply be interpreted by the UA as the initialization segment being repeatedly interleaved with the media segments. The one situation where this wouldn't make sense is when the PSI appears in the middle of a GOP. In that case this data couldn't be considered an initialization segment, because that would mean that the media data that followed would have to be the start of a media segment, which isn't possible because it wouldn't start with a random access point. We just need to establish rules that allow the UA to detect when the PSI is intended to be an init segment and when it is data that appears within a media segment.

[agiladi] Two cases:
1.) IV and/or key change may happen in the middle of the segment. In this case we want all information that is needed for decryption to be present in the segment, prior to the first byte of encrypted media.
2.) PSI can change with time. In this sentence we say "either don't allow PSI changes, or have self-initializing segments"

This relates to #3 I think. The PSI is in elementary streams which are interleaved with media packets in elementary streams. So the PSI doesn't appear in the middle of a GOP in a spatial sense but it can in a temporal sense. As stated in #3, the web app could create an init segment but if the decoder requires the PSI in the media segments then self-initializing media segments might be cleaner.




7. What do you expect a deployment that uses MPEG-2 TS as you describe w/ MSE to look like? I'm trying to get a sense of how the web application will fetch the media and what type of infrastructure you expect it to be talking to.

One use case is MPEG DASH distribution using one of the two DASH MPEG-2 TS profiles. Another use case is HLS.

[acolwell] So in all these scenarios, the TS will be split into pieces and hosted on web servers?

Yes
[agiladi] Can we use byte ranges? If not, there is still an option for the web server to translate a segment request into a byte range.


8. What are you trying to provide with MPEG-2 TS that can't be accomplished with the ISO BMFF?

A considerable amount of video content in use today uses MPEG-2 TS.

[acolwell] Are we talking about web accessible content here? I know it is popular in broadcast equipment, but I haven't seen much content on the web aside from the occasional HLS content here and there. Can I get more examples please? I'm just trying to understand what we are talking about here.

Near term, service providers will be transmuxing existing broadcast content into adaptive delivery formats, such as DASH, in a home gateway  for distribution over the home network to browser based video receivers. Even in the case of direct cloud web server to browser based video receivers there may be commercial/licensing requirements to not transmux existing content.


  a. What are the tradeoffs?

Only supporting ISO BMFF means transmuxing this MPEG-2 TS content.

[acolwell] Why is that problematic? Isn't the TS content already being transformed in some way to make it conform to these requirements?

The TS content is being segmented but not transmuxed. Commercial video has tracks such as closed captions, secondary audio, audio descriptions for the visually impaired and interactive TV signaling, that need to be delivered for regulatory and/or commercial reasons. In some cases, there may not even be a definition for representation of these tracks in ISO BMFF.

It seems like at a minimum you'd want to remove all the redundant PSI since the content is being transported over a lossless channel.

I don't think this is how existing MPEG-2 TS segments work. This also gets back to the question about whether existing MPEG-2 TS decoders require the PSI at the rate at which it is sent.
[agiladi] why do we need to transform compatible TS into incompatible TS? I'd rather adhere to DASH Main (which is the least restrictive of them all). Currently, the transformation needed consists of segmenting existing TS and taking care of audio/video overlap (it takes a couple of hours to code this given good libraries and a decent programmer)


  b. Why is it beneficial for UAs to take on the burden of supporting another format?

Some UAs support MPEG-2 TS. Some UAs will be required to support MPEG-2 TS. Defining the MPEG-2 TS byte stream requirements doesn't require a UA to support that format, does it?

[acolwell] Ok. I'm trying to understand whether these UAs will be typical browsers or special cable operator specific applications. If it is the latter it might be better to have this specified in a cable spec. I'm trying to understand what minimal set of MPEG-2 TS makes sense in a web context. If there are further requirements needed by operators, that is fine, but it isn't clear to me that the place for that is in a web spec.

Cable is trying hard to use W3C specs and not to have "cable specs". I think these arguments apply on the Web when existing commercial content in MPEG-2 TS is to be delivered to standard Web browsers.


You are correct that defining it doesn't mean UAs have to support it since the plan was to put this in a non-normative section like ISO BMFF & WebM are.


I'd also like to request that the Encrypted Media Segments section be removed and the discussion of that part be moved to the EME spec work. I understand it is important, but I'd like to keep encryption topics outside of MSE. Your definitions for initialization segments appear to be broad enough to include the necessary information just like the WebM & ISO BMFF sections are.

OK.

[acolwell] Thank you.

I appreciate your patience with all my questions. I'm really just trying to understand the scope and complexity of what is being proposed and why it is important to people.

Aaron
Received on Thursday, 27 September 2012 07:14:33 GMT
