- From: guy paskar <guypaskar@gmail.com>
- Date: Wed, 20 Feb 2013 15:30:43 +0200
- To: Aaron Colwell <acolwell@google.com>
- Cc: Cyril Concolato <cyril.concolato@telecom-paristech.fr>, "<public-html-media@w3.org>" <public-html-media@w3.org>
- Message-ID: <CALCEMzEGycw-yNteLXbzkEpvh2c4pazSPcAerMsUSR=TLVJsfg@mail.gmail.com>
I think this is a very interesting topic, and here are a couple of comments from my point of view:

1) I understand that fmp4 is needed for MSE as it is today. The main problem I see with it is that current browsers can't play fmp4 natively (they have to download the whole file, as opposed to a regular mp4), so anyone who uses MSE for streaming video still has to keep a regular mp4 around for browsers that do not support MSE (and can't pseudo-stream fmp4). Is there a way to overcome this? Will this be addressed? I think this is a serious issue.

2) In relation to point 1: is it possible to make a regular mp4 fragmented on the fly with some kind of parser, i.e. to still keep a regular mp4 on the server and, when needed, convert it (piece by piece) to fmp4 on the client? Because fragmenting a regular mp4 is an "easy" task, I thought it might be possible. I know the YouTube guys intended to do something similar in their demo. Any comments on that?

Guy

On Fri, Feb 15, 2013 at 7:59 PM, Aaron Colwell <acolwell@google.com> wrote:

> Comments inline...
>
> On Fri, Feb 15, 2013 at 6:20 AM, Cyril Concolato <cyril.concolato@telecom-paristech.fr> wrote:
>
>> Hi Aaron,
>>
>> On 14/02/2013 23:35, Aaron Colwell wrote:
>>
>> Hi Giuseppe,
>>
>> There are no current plans to support non-fragmented MP4 files. One thing to remember is that MSE accepts byte streams and not files per se. For MP4 we use the fragmented format because it allows segments of the timeline to be appended easily and in any order.
>>
>> Appending segments in any order may not be so easy. The MP4 spec says in the Movie Fragment Header Box definition:
>> "The movie fragment header contains a sequence number, as a safety check. The sequence number usually starts at 1 and must increase for each movie fragment in the file, in the order in which they occur. This allows readers to verify integrity of the sequence; it is an error to construct a file where the fragments are out of sequence."
>>
>> So if you implement MSE on top of a conformant MP4 reader, feeding segment data as if they were consecutive in a 'virtual' file, this won't work. Segments with a sequence number smaller than that of the first segment provided may be rejected. To make sure the append happens correctly with an unmodified MP4 parser, the MSE implementation will have to parse each segment, check the sequence number and, if needed, reinitialize the parser before feeding the out-of-order segment.
>
> [acolwell] MSE ignores the sequence number. Again, it is important not to think in terms of files, but in terms of a bytestream. MSE accepts a bytestream that looks very close to a fragmented ISOBMFF file, but it allows things that aren't compliant with the ISOBMFF spec. If one decides to use a conformant MP4 reader for an MSE implementation, they will have to relax parts of the validation checks to avoid rejecting bytestream constructs that are allowed by MSE. MSE accepts the ISOBMFF fragmented form because it is a relatively simple way to encapsulate segments of a larger presentation. The intent was never to support the fragmented file format, but rather something close enough to that form to make it easy to use segments of MP4 content to construct a presentation.
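To make the append model described above concrete, here is a minimal sketch of the bytestream-feeding side: an initialization segment (ftyp+moov) is appended first, then fragmented media segments (moof+mdat) in whatever order the application chooses. Whether those fragments come from a server-side packager or from an on-the-fly client-side remuxer (as in point 2 above) makes no difference to this part of the pipeline. The URLs and codec string are placeholders, not anything from this thread.

```ts
// Minimal MSE append sketch (assumed asset layout: one init segment plus
// independent moof+mdat media segments). All file names are illustrative.
const video = document.querySelector('video') as HTMLVideoElement;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');

  // Append one buffer and wait until the SourceBuffer has processed it.
  const append = (buf: ArrayBuffer) =>
    new Promise<void>((resolve) => {
      sb.addEventListener('updateend', () => resolve(), { once: true });
      sb.appendBuffer(buf);
    });

  // Init segment first, then media segments. MSE places media segments on
  // the timeline by their internal timestamps, not by mfhd sequence number.
  await append(await (await fetch('init.mp4')).arrayBuffer());
  for (const url of ['seg1.m4s', 'seg2.m4s', 'seg3.m4s']) {
    await append(await (await fetch(url)).arrayBuffer());
  }
  mediaSource.endOfStream();
});
```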
>> Supporting non-fragmented files would require the UA to hold the whole file in memory which could be very problematic on memory constrained devices.
>>
>> Is there any requirement in MSE to keep the data in memory and not on disk?
>
> [acolwell] That really is beside the point. Sure, disk could be used, but in the mobile or TV case that isn't likely to be an option. Having disk just delays the problem a little longer. Eventually the UA may need to evict part of the timeline covered by the file, and you'd have to reappend the whole file again to get that region back.
>
>> If the UA decides to garbage collect part of the presentation timeline to free up space for new appends, it is not clear how the web application could reappend the garbage-collected regions without appending the whole file again.
>>
>> You could say the same thing about fragmented files if the fragment is very long. Sure, you will more likely find large non-fragmented files than fragmented files with large fragments, but the problem is the same. Small non-fragmented files (such as small clips, ads) should not be excluded.
>
> [acolwell] I agree, but it is much more likely for people to create large & long non-fragmented MP4 files than it is for people to create fragmented files with large fragments. Forcing the conversion to fragmented form forces this issue to be considered. Another issue is that "small clips" is a very subjective thing, and once we say non-fragmented MP4 is supported, people will expect it to work no matter how long the files are.
>
>> The fragmented form allows the application to easily select the desired segment and reappend it.
>>
>> If possible! If your fragments are large, the application won't be able to do it (at least not easily).
>
> [acolwell] True, but that's further incentive to keep the segments a reasonable size. If the UA keeps evicting parts of the segments, that is an indication that the fragment size you are using is too big. At least with fragmented files you have this parameter to tune. With non-fragmented files you are simply out of luck and have to convert to fragmented form to resolve it anyway.
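The re-append workflow being debated here looks roughly like the following sketch: the page inspects the SourceBuffer's buffered ranges, and if the UA has evicted the part of the timeline it needs, it re-fetches only the fragment covering that gap. The segmentForTime() mapping is a hypothetical helper; a real player would derive it from a manifest or a segment index.

```ts
// Check whether a playback position is still buffered in the SourceBuffer.
function isBuffered(sb: SourceBuffer, t: number): boolean {
  for (let i = 0; i < sb.buffered.length; i++) {
    if (t >= sb.buffered.start(i) && t < sb.buffered.end(i)) return true;
  }
  return false;
}

// If the UA evicted the region around t, re-append only the fragment that
// covers it (possible because the content is in fragmented form).
async function ensureBuffered(
  sb: SourceBuffer,
  t: number,
  segmentForTime: (t: number) => string, // hypothetical time-to-URL mapping
) {
  if (isBuffered(sb, t)) return; // nothing was evicted here
  const buf = await (await fetch(segmentForTime(t))).arrayBuffer();
  await new Promise<void>((resolve) => {
    sb.addEventListener('updateend', () => resolve(), { once: true });
    sb.appendBuffer(buf); // re-append just the missing fragment
  });
}
```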
>> Applications can control the level of duplicate appending by adjusting the fragment size appropriately.
>>
>> I think you mean 'Content providers can control ...'? Web applications should be able to use content from different sources, with no control over the content generation, so possibly using non-fragmented files.
>
> [acolwell] If they intend to use the content with MSE then they need to be able to exert some sort of control. It may be that they have to convince their partners to provide their assets in fragmented form. MSE already puts constraints on what types of content can be spliced together, so the application needs to have at least some idea about what it is passing to MSE to ensure it doesn't violate any of the constraints, such as no codec changes, consistent track counts, etc.
>
>> Non-fragmented files are so permissive about how they can store samples that there is no simple way to collect segments of the timeline without essentially exposing a random access file API.
>>
>> Which exact difference in the non-fragmented (vs. the fragmented) storage is problematic? For which situation? I don't understand what you mean by 'collect segments of the timeline'. Which entity would need to do that? The web application? The MSE implementation? The MP4 parser? It is certainly easy for the MP4 parser.
>
> [acolwell] Non-fragmented files have a lot of options when it comes to how the file is formatted. Is the moov at the beginning or the end? Are the samples stored in the mdat in order, or are they randomly distributed? The list goes on. This adds a lot of complexity and, in the worst case, requires the whole file to be available to resolve. In fragmented files this complexity is relatively bounded, and the content author actually has to be proactive about making sure the content is in the right format. Sure, people can create crazy fragmented files as well, but that is not nearly as common.
>
>> In general, I think MSE can be viewed as an API to construct a playlist and have seamless playback of the elements of the playlist in an HTML5 video element. There are complex playlist configurations with overlapping elements. I think the simple use case of seamlessly playing 2 MP4 files sequentially should be supported.
>
> [acolwell] MSE is not an API to construct playlists, and I don't think it is good to think about it this way. If that had been my goal then I would have designed things very differently. MSE is an API to construct a presentation from a set of media segments. Media segments do not necessarily imply fully formed files in existing formats. Certain forms of existing file formats are interpreted as media segments by MSE, but fully formed files are not required to add media segments to the presentation. For example, in the WebM bytestream all you need to create is a Cluster element to add media to the presentation. You don't need to create a fully formed WebM file. For ISO, you only need a moof box followed by an mdat box to add media. I explicitly wanted to break the constraint of requiring fully formed files because I believe it would allow people to mash up content in ways that would be difficult within the constraints of existing file formats.
>
> [acolwell] Seamlessly playing 2 MP4 files sequentially is supported if you first convert them to fragmented form. In my opinion it is better to have the content author place the content in a specific form than to require all UAs to deal with non-fragmented MP4 files.
>
> Aaron
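The "two files back to back" case in that last reply can be sketched as follows, assuming both assets have already been converted to fragmented form and have compatible codecs and track layout: append the first asset's init segment and media segments, move timestampOffset past the end of the first asset, then append the second asset the same way. File names and the 20-second offset are illustrative only.

```ts
// Append one fragmented asset: its init segment (ftyp+moov), then its
// moof+mdat media segments. Asset layout and URLs are assumptions.
async function appendAsset(sb: SourceBuffer, initUrl: string, segUrls: string[]) {
  const append = async (url: string) => {
    const buf = await (await fetch(url)).arrayBuffer();
    await new Promise<void>((resolve) => {
      sb.addEventListener('updateend', () => resolve(), { once: true });
      sb.appendBuffer(buf);
    });
  };
  await append(initUrl);                         // a new init segment is allowed
  for (const url of segUrls) await append(url);  // then the moof+mdat pairs
}

// Splice two fragmented assets into one seamless presentation.
async function playSequentially(sb: SourceBuffer) {
  await appendAsset(sb, 'a/init.mp4', ['a/seg1.m4s', 'a/seg2.m4s']);
  // Shift the second asset so it starts where the first one ends.
  sb.timestampOffset = 20; // duration of asset A in seconds (illustrative)
  await appendAsset(sb, 'b/init.mp4', ['b/seg1.m4s', 'b/seg2.m4s']);
}
```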
Received on Wednesday, 20 February 2013 13:31:36 UTC