Re: [MSE] non fragmented MP4

Comments inline...


On Fri, Feb 15, 2013 at 6:20 AM, Cyril Concolato
<cyril.concolato@telecom-paristech.fr> wrote:

> Hi Aaron,
>
> On 14/02/2013 23:35, Aaron Colwell wrote:
>
> Hi Giuseppe,
>
>  There are no current plans to support non-fragmented MP4 files. One
> thing to remember is that MSE accepts byte streams and not files per se.
> For MP4 we use the fragmented format because it allows segments of the
> timeline to be appended easily and in any order.
>
> Appending segments in any order may not be so easy. The MP4 spec says in
> the Movie Fragment Header Box definition:
> "The movie fragment header contains a sequence number, as a safety check.
> The sequence number usually starts at 1 and must increase for each movie
> fragment in the file, in the order in which they occur. This allows readers
> to verify integrity of the sequence; it is an error to construct a file
> where the fragments are out of sequence."
>
> So if you implement MSE on top of a conformant MP4 reader, feeding segment
> data as if they were consecutive in a 'virtual' file, this won't work.
> Segments with a sequence number smaller than that of the first segment
> provided may be rejected. To make sure the append happens correctly with
> an unmodified MP4 parser, the MSE implementation will have to parse each
> segment, check the sequence number and, if needed, reinitialize the parser
> before feeding the out-of-order segment.
>

[acolwell] MSE ignores the sequence number. Again, it is important not to
think in terms of files, but in terms of a bytestream. MSE accepts a
bytestream that looks very close to a fragmented ISOBMFF file, but it
allows things that aren't compliant with the ISOBMFF spec. If one decides
to use a conformant MP4 reader for an MSE implementation, then they will
have to relax parts of the validation checks to avoid rejecting bytestream
constructs that are allowed by MSE. MSE accepts the ISOBMFF fragmented form
because it is a relatively simple way to encapsulate segments of a larger
presentation. The intent was never to support the fragmented file format
itself, but rather something close enough to that form to make it easy to
use segments of MP4 content to construct a presentation.
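
To make that concrete, here is a rough sketch (TypeScript, with purely
illustrative names, not taken from any real parser) of what relaxing that
particular check might look like:

// Sketch: relax the strict ISOBMFF mfhd sequence-number check when the
// reader is fed an MSE byte stream instead of an actual file.
interface MovieFragmentHeader {
  sequenceNumber: number;
}

function checkFragmentSequence(
  prev: MovieFragmentHeader | null,
  next: MovieFragmentHeader,
  mseByteStream: boolean
): void {
  const outOfOrder =
    prev !== null && next.sequenceNumber <= prev.sequenceNumber;
  if (outOfOrder && !mseByteStream) {
    // Reading a real file: out-of-sequence fragments are an error per ISOBMFF.
    throw new Error(
      `mfhd sequence_number ${next.sequenceNumber} is not increasing`
    );
  }
  // MSE byte stream: fragments may be appended in any order, so the
  // sequence number is ignored rather than enforced.
}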


>
>
>   Supporting non-fragmented files would require the UA to hold the whole
> file in memory which could be very problematic on memory constrained
> devices.
>
> Is there any requirement in MSE to keep the data in memory and not on disk?
>

[acolwell] That really is beside the point. Sure, disk could be used, but
in the mobile or TV case that isn't likely to be an option, and having disk
available only delays the problem a little longer. Eventually the UA may
need to evict part of the timeline covered by the file, and you'd have to
reappend the whole file again to get that region back.


>
>  If the UA decides to garbage collect part of the presentation timeline
> to free up space for new appends it is not clear how the web application
> could reappend the garbage collected regions without appending the whole
> file again.
>
> You could say the same thing about fragmented files if the fragments are
> very long. Sure, you will more likely find large non-fragmented files
> than fragmented files with large fragments, but the problem is the same.
> Small non-fragmented files (such as small clips or ads) should not be
> excluded.
>

[acolwell] I agree, but it is much more likely for people to create large
and long non-fragmented MP4 files than it is for people to create
fragmented files with large fragments. Requiring the conversion to
fragmented form forces this issue to be considered. Another issue is that
"small clips" is a very subjective thing, and once we say non-fragmented
MP4 is supported, people will expect it to work no matter how long the
files are.


>
>
>  The fragmented form allows the application to easily select the desired
> segment and reappend it.
>
> If possible! If your fragments are large, the application won't be able to
> do it (at least not easily).
>

[acolwell] True, but that's further incentive to keep the segments a
reasonable size. If the UA keeps evicting parts of the segments, that is an
indication that the fragment size you are using is too big. At least with
fragmented files you have this parameter to tune. With non-fragmented files
you are simply out of luck and have to convert to fragmented form to
resolve it anyway.
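
For example, a rough sketch (assuming the application keeps its own index
mapping each fragment's time range to its byte range, which MSE itself does
not provide) of re-fetching and re-appending only the evicted fragments:

interface SegmentEntry {
  start: number;     // presentation start time, in seconds
  end: number;       // presentation end time, in seconds
  byteStart: number; // offset of this moof+mdat pair in the fragmented file
  byteEnd: number;   // offset one past the end of the mdat
}

async function reappendEvictedRange(
  sourceBuffer: SourceBuffer,
  mediaUrl: string,
  segmentIndex: SegmentEntry[],
  start: number,
  end: number
): Promise<void> {
  for (const seg of segmentIndex) {
    if (seg.end <= start || seg.start >= end) continue; // untouched by eviction
    // Fetch just this fragment's bytes instead of the whole file.
    const resp = await fetch(mediaUrl, {
      headers: { Range: `bytes=${seg.byteStart}-${seg.byteEnd - 1}` },
    });
    sourceBuffer.appendBuffer(await resp.arrayBuffer());
    // Wait for the append to complete before issuing the next one.
    await new Promise<void>((resolve) =>
      sourceBuffer.addEventListener('updateend', () => resolve(), { once: true })
    );
  }
}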


>
>
>
>  Applications can control the level of duplicate appending by adjusting
> the fragment size appropriately.
>
> I think you mean 'Content provider can control ...'? Web applications
> should be able to use content from different sources, with no control over
> the content generation, so possibly using non-fragmented files.
>

[acolwell] If they intend to use the content with MSE, then they need to
be able to exert some sort of control. It may be that they have to convince
their partners to provide their assets in fragmented form. MSE already puts
constraints on what types of content can be spliced together, so the
application needs to have at least some idea about what it is passing to
MSE to ensure it doesn't violate any of the constraints, such as no codec
changes, consistent track counts, etc.
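
As a rough illustration (the MIME strings below are just examples), the
application can at least check up front that the assets it intends to
splice share a configuration the SourceBuffer can accept:

function canSpliceTogether(typeA: string, typeB: string): boolean {
  // A single SourceBuffer cannot accept codec changes or varying track
  // counts, so at minimum the declared types must match and be supported.
  return typeA === typeB && MediaSource.isTypeSupported(typeA);
}

const mainType = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"';
const adType = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"';

if (canSpliceTogether(mainType, adType)) {
  const mediaSource = new MediaSource();
  mediaSource.addEventListener('sourceopen', () => {
    const sourceBuffer = mediaSource.addSourceBuffer(mainType);
    // Segments from either asset can now be appended to sourceBuffer.
  });
}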


>  Non-fragmented files are so permissive about how they can store samples
> that there is no simple way to collect segments of the timeline w/o
> essentially exposing a random access file API.
>
> Which exact difference in the non-fragmented (vs the fragmented storage)
> is problematic? For which situation? I don't understand what you mean by
> 'collect segments of the timeline'. Which entity would need to do that? The
> web application? the MSE implementation? the MP4 parser? It is certainly
> easy for the MP4 parser.
>

[acolwell] Non-fragmented files have a lot of options when it comes to how
the file is formatted. Is the moov at the beginning or the end? Are the
samples stored in the mdat in order, or are they randomly distributed? The
list goes on. This adds a lot of complexity, and in the worst case requires
the whole file to be available to resolve. In fragmented files this
complexity is relatively bounded, and the content author actually has to be
proactive about making sure the content is in the right format. Sure,
people can create crazy fragmented files as well, but that is not nearly as
common.
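
A rough sketch of what a reader is up against: it cannot build any playback
state until it has located the moov, and with a non-fragmented file that
means scanning the top-level box layout, which may put the moov after a
very large mdat (illustrative code, not from any shipping parser):

function listTopLevelBoxes(file: Uint8Array): { type: string; size: number }[] {
  const boxes: { type: string; size: number }[] = [];
  const view = new DataView(file.buffer, file.byteOffset, file.byteLength);
  let offset = 0;
  while (offset + 8 <= file.byteLength) {
    let size = view.getUint32(offset);
    const type = String.fromCharCode(
      file[offset + 4], file[offset + 5], file[offset + 6], file[offset + 7]
    );
    if (size === 1) {
      // 64-bit largesize follows the compact size and type fields.
      size = Number(view.getBigUint64(offset + 8));
    } else if (size === 0) {
      size = file.byteLength - offset; // box extends to the end of the file
    }
    if (size < 8) break; // corrupt box; stop rather than loop forever
    boxes.push({ type, size });
    offset += size;
  }
  return boxes;
}
// e.g. ['ftyp', 'mdat', 'moov'] means the index only arrives at the very end.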


>
> In general, I think MSE can be viewed as an API to construct a playlist
> and have seamless playback of the elements of the playlist in an HTML5
> video element. There are complex playlist configurations with overlapping
> elements. I think the simple use case of seamlessly playing 2 MP4 files
> sequentially should be supported.
>

[acolwell] MSE is not an API to construct playlists, and I don't think it
is good to think about it this way. If that were my goal then I would have
designed things very differently. MSE is an API to construct a presentation
from a set of media segments. Media segments do not necessarily imply fully
formed files in existing formats. Certain forms of existing file formats
are interpreted as media segments by MSE, but fully formed files are not
required to add media segments to the presentation. For example, in the
WebM bytestream all you need to create is a Cluster element to add media to
the presentation. You don't need to create a fully formed WebM file. For
ISO, you only need a moof box followed by an mdat box to add media. I
explicitly wanted to break the constraint of requiring fully formed files
because I believe it would allow people to mash up content in ways that
would be difficult within the constraints of existing file formats.
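
A rough sketch of the ISO byte stream shape this implies; fetchInitSegment
and fetchMediaSegment are placeholders for however the application obtains
the bytes, and the codec string is just an example:

// Placeholder fetchers, e.g. byte-range requests or separate segment files.
declare function fetchInitSegment(): Promise<ArrayBuffer>;
declare function fetchMediaSegment(index: number): Promise<ArrayBuffer>;

async function buildPresentation(video: HTMLVideoElement): Promise<void> {
  const mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource);
  await new Promise((resolve) =>
    mediaSource.addEventListener('sourceopen', resolve, { once: true })
  );

  const sourceBuffer = mediaSource.addSourceBuffer(
    'video/mp4; codecs="avc1.42E01E"'
  );
  const appendAndWait = (data: ArrayBuffer) =>
    new Promise<void>((resolve) => {
      sourceBuffer.addEventListener('updateend', () => resolve(), { once: true });
      sourceBuffer.appendBuffer(data);
    });

  await appendAndWait(await fetchInitSegment());   // ftyp + moov, appended once
  await appendAndWait(await fetchMediaSegment(0)); // just a moof + mdat pair
  await appendAndWait(await fetchMediaSegment(1)); // segments can come from
                                                   // different source files
}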

[acolwell] Seamlessly playing 2 MP4 files sequentially is supported if you
first convert them to fragmented form. In my opinion it is better to have
the content author place the content in a specific form than to require all
UAs to deal with non-fragmented MP4 files.

Aaron

Received on Friday, 15 February 2013 17:59:37 UTC