Re: [MSE] transport stream constraints vs Apple's HLS from Gary Hughes on 2013-02-05 (public-html-media@w3.org from February 2013)

From: Gary Hughes <ghughes@motorola.com>
Date: Tue, 5 Feb 2013 11:49:01 -0500
To: Michael Thornburgh <mthornbu@adobe.com>
Cc: "public-html-media@w3.org" <public-html-media@w3.org>
Message-ID: <CANhEN5AT6DFL=1v76L0Cv+5SV-9mR4r86+3kidcjuZv3ERmZ2A@mail.gmail.com>
Michael, all

On Thu, Jan 31, 2013 at 6:54 PM, Michael Thornburgh <mthornbu@adobe.com> wrote:

> as of the current (31 Jan 2013) MSE editor's draft, there are a number of
> constraints on MPEG-2 transport streams that will make supporting Apple's
> HTTP Live Streaming (HLS) difficult.
>
> given the popularity of the iOS platform and HLS being the only
> allowed/supported streaming format there today and for the foreseeable
> future, there is a strong incentive for content publishers to use HLS.
>  once they're using HLS, it can be inconvenient (at best) or impractical
> and costly to support additional formats.  this reasoning has led folks to
> provide native HLS support in other platforms as well (such as Android),
> and i personally (and others independently) have written HLS clients in
> Flash.
>
> i think it would be beneficial to the Open Web if MSE could support this
> popular format with a minimum of impedance mismatch.
>
> (31 Jan 2013 MSE editor's draft) section 11's constraints on media
> segments and MPEG-2 TS segments specifically are problematic:
>
> * section 11 top: "Segments must start with a random access point to
> facilitate seamless splicing at the segment boundary" -- HLS does not have
> this constraint, and existing HLS content and encoders/packagers create
> segments that may not start with a RAP. this usually happens when
> encoders/packagers are configured to make each segment exactly the same
> duration, independent of where key frames fall.  i think this constraint
> should be relaxed such that each segment must contain at least one RAP, but
> not necessarily start with one.  the splice (or initial playback) could
> commence at the first RAP, which will often be the beginning of the segment
> but shouldn't have to be.
>

[GH] We faced similar concerns in the MPEG DASH ad hoc group, which led to
the two defined profiles for TS-based media segments. TS Main places no
constraints on access point alignment and is intended to be compatible with
existing HLS content. TS Simple requires that media segments be well
behaved and start with access points. There is a benefit to a client
implementation in knowing whether content conforms to this constraint.
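
As an illustration of the kind of check a client could run, here is a
minimal Python sketch that tests whether the first PES start on a given PID
is flagged as a random access point. It assumes 188-byte packets aligned to
the start of the segment, a PID that is already known (for example from the
PMT), and an encoder that actually sets random_access_indicator; the
function name is hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def first_pes_start_is_random_access(segment: bytes, pid: int) -> bool:
    """True if the first PES start on `pid` has random_access_indicator set."""
    for offset in range(0, len(segment) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = segment[offset:offset + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte offset {offset}")
        packet_pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        payload_unit_start = bool(pkt[1] & 0x40)
        if packet_pid != pid or not payload_unit_start:
            continue
        # adaptation_field_control: bit 0x20 means an adaptation field is present
        has_adaptation_field = bool(pkt[3] & 0x20)
        adaptation_field_length = pkt[4] if has_adaptation_field else 0
        if adaptation_field_length == 0:
            return False  # no flags byte, so the indicator cannot be set
        return bool(pkt[5] & 0x40)  # random_access_indicator
    return False  # no PES start found on this PID
```

A stricter check would parse the elementary stream itself (for example,
looking for an IDR picture), since the indicator is only a hint from the
multiplexer.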


> * section 11.3.3 constraint #4: "Each PES packet must be comprised of one
> or more complete access units" -- this is unnecessarily restrictive and
> will exclude a lot of existing HLS content (and existing content encoders,
> including transport stream encoders supplied by Apple and others).  one of
> the tricks used in some encoders to reduce transport stream overhead is to
> allow the end of an access unit to spill into the beginning of the next PES
> packet, to avoid having to pad a last transport stream packet in the PES
> packet with wasted bytes.  reading this constraint in the most permissive
> way possible (which is probably not its intended meaning) would still not
> allow PES packets containing the last few bytes of the previous access unit
> and most (but the last few) bytes of a new access unit, since there would
> never be one complete access unit in any PES packet.  i think this
> constraint should be removed.  access unit semantics as defined in ISO/IEC
> 13818-1 should be sufficient, and any conformant transport stream parser
> must already support this PES/access unit overlap.
>

[GH] While 13818-1 does not require any specific PES alignment other than
at signaled discontinuities or random access points, many derivative
specifications, such as those from SCTE, DVB, and DLNA, do require that AVC
AUs be aligned with PES packets. Transport stream encoders that support
cable or IPTV environments will have an option to produce PES-aligned
streams, with one AU per PES, at least for video.

It is more problematic for audio as the additional overhead can be much
greater. I have seen the bitrate demand nearly double when an encoder packs
one audio AU per PES packet. Encoders/multiplexers will typically pack
multiple audio AUs per PES to reduce this impact.
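
The arithmetic behind this overhead can be sketched in Python. The model
below is a simplification: it assumes every PES packet starts on a new TS
packet, carries a 14-byte header with a PTS, and has its last TS packet
stuffed out to 188 bytes, with no PCR or other adaptation field data. The
178-byte access unit size is an illustrative assumption (roughly one
64 kbit/s AAC frame at 48 kHz plus an ADTS header), not a figure from any
measured stream.

```python
import math

TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
PES_HEADER_WITH_PTS = 14  # 6 fixed bytes + 3 flag/length bytes + 5-byte PTS


def ts_bytes_per_au(au_size: int, aus_per_pes: int) -> float:
    """Average transport stream bytes spent per audio access unit."""
    pes_size = PES_HEADER_WITH_PTS + au_size * aus_per_pes
    # Each PES packet occupies a whole number of TS packets (last one stuffed).
    ts_packets = math.ceil(pes_size / (TS_PACKET_SIZE - TS_HEADER_SIZE))
    return ts_packets * TS_PACKET_SIZE / aus_per_pes


AU_SIZE = 178  # hypothetical access unit size in bytes
print(ts_bytes_per_au(AU_SIZE, 1))  # 376.0
print(ts_bytes_per_au(AU_SIZE, 4))  # 188.0
```

Under these assumptions, one AU per PES costs about 2.1 times the AU size
on the wire, while four AUs per PES cost about 1.06 times.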

Content that has this type of PES alignment is still compatible with HLS,
although I have certainly seen evil bitstreams that claim to be HLS
compatible that do not meet any of these constraints (and which often do
not play well with non-iOS clients).

Systems or devices that manipulate transport streams in the compressed
domain (servers, splicers, packagers, etc.) will often operate on PES
packet boundaries, relying upon the PES header timing (PTS/DTS) to maintain
pacing and audio/video sync. When handling hundreds or thousands of streams
it is a significant advantage to not have to open up the PES packet and parse the
contents. It also makes it possible to operate on sample-encrypted content
without having to decrypt/re-encrypt.
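
To show why the PES header alone is enough for this kind of processing,
here is a minimal Python sketch that reads the PTS without touching the
payload. It assumes the buffer starts at the PES start code and that the
stream_id is one that carries the optional PES header (audio and video
streams do); the function name is hypothetical.

```python
from typing import Optional


def read_pes_pts(pes: bytes) -> Optional[int]:
    """Return the 33-bit PTS in 90 kHz units, or None if the PES has no PTS."""
    if pes[0:3] != b"\x00\x00\x01":
        raise ValueError("missing PES start code prefix")
    pts_dts_flags = pes[7] >> 6
    if not pts_dts_flags & 0b10:
        return None
    p = pes[9:14]  # five PTS bytes, with marker bits interleaved
    return (
        ((p[0] >> 1) & 0x07) << 30
        | p[1] << 22
        | (p[2] >> 1) << 15
        | p[3] << 7
        | p[4] >> 1
    )
```

Nothing after the header is inspected, so the same code works whether or
not the payload is sample-encrypted.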

MPEG DASH also requires that each PES packet be comprised of one or more
complete access units. The nominal DASH client model has the DASH access
engine outputting a conformant stream to the media engine, and the thinking
was that we should not require the DASH access engine to deconstruct the
media segments in order to create its output stream. This is not the only
way to build DASH clients of course, but it does seem (to me) to be similar
to the model being used in constructing JavaScript clients using MSE.


> * section 11.3.5: "Timestamp Rollover & Discontinuities" -- in HLS,
> discontinuities are indicated in the index file, not in the transport
> stream files.  there should be a way to indicate a discontinuity via the
> API that would be interpreted identically to an in-stream discontinuity
> indication. along with this, indicating a discontinuity coincident with
> abort() must change the behavior of MPEG2TS_timestampOffset at the abort()
> (to make a contiguous splice rather than resetting MPEG2TS_timestampOffset
> to 0).
>

[GH] Signaled discontinuities may occur anywhere within a TS media segment,
so it would be useful to be able to indicate that to an application.

> * also in section 11.3.5: in HLS, there's no provision for (or need to)
> indicate the interior media time stamps.  the same "discontinuity" API
> indication for the above could allow this to work, by having the next
> appended media snap to the expected position (like "the beginning" or
> "contiguous with previous media" or "aligned to timestampOffset").  a
> "discontinuity" API would also allow other formats to not have to know
> about their interior media timestamps, which could simplify index file
> creation in the general case.
>

[GH] I agree, it would be useful for the application to have visibility
into current media timestamps. For TS media segments I assume this means
PTS/DTS? It may facilitate the migration of some existing applications into
the DASH environment.
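
If an application were given raw PTS/DTS values, it would also have to
deal with the 33-bit rollover. A minimal Python sketch of that, assuming
consecutive timestamps are never more than half the 33-bit range apart (the
function name is hypothetical and this is not part of any MSE API):

```python
PTS_MODULUS = 1 << 33   # PTS/DTS are 33-bit counters
PTS_CLOCK_HZ = 90_000   # PTS/DTS tick at 90 kHz


def unwrap_pts(prev_unwrapped: int, pts: int) -> int:
    """Extend a 33-bit `pts` to the unwrapped value nearest `prev_unwrapped`."""
    delta = (pts - prev_unwrapped) % PTS_MODULUS
    if delta >= PTS_MODULUS // 2:
        delta -= PTS_MODULUS  # small step backwards, e.g. reordered frames
    return prev_unwrapped + delta


# A timestamp just after rollover continues the timeline instead of restarting.
print(unwrap_pts((1 << 33) - 100, 50) / PTS_CLOCK_HZ)
```

A signaled discontinuity would still need separate handling, since the new
timestamps then bear no relation to the previous ones.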

regards,

gary

Gary Hughes
Advanced Technology Group
Motorola Mobility, MA35
900 Chelmsford Street
Lowell, MA 01851
Email: ghughes@motorola.com
Office: 978 614 3504
Mobile: 978 339 3615
Received on Tuesday, 5 February 2013 18:54:31 UTC