Re: [MSE] non fragmented MP4 from Aaron Colwell on 2013-02-20 (public-html-media@w3.org from February 2013)

From: Aaron Colwell <acolwell@google.com>
Date: Wed, 20 Feb 2013 09:41:33 -0800
To: guy paskar <guypaskar@gmail.com>
Cc: Cyril Concolato <cyril.concolato@telecom-paristech.fr>, "<public-html-media@w3.org>" <public-html-media@w3.org>
Message-ID: <CAA0c1bA3FF=7ZZ7_m-QOskC+xM+5a9eiPDMj-LHkM8wxiOV17Q@mail.gmail.com>
Comments inline...


On Wed, Feb 20, 2013 at 5:30 AM, guy paskar <guypaskar@gmail.com> wrote:

> I think this is a very interesting topic and here are a couple of comments
> from my point of view:
>
> 1) I understand that fmp4 is needed for MSE as it is today. The main
> problem I see is that current browsers can't play fmp4 natively
> (they have to download the whole file, unlike regular mp4). As a
> result, anyone who uses MSE for streaming video still has to keep a regular
> mp4 to support browsers that do not support MSE (and can't pseudo-stream
> fmp4). Is there a way to overcome this? Will this be addressed? I think
> this is a serious issue.
>

[acolwell] I don't view this as any different than having to support
multiple files because browsers support different codecs & resolutions. If
browsers want to ease this particular pain for content providers then they
can implement support for fragmented mp4 in the standard HTML5 video path.
I have a feeling this will happen naturally as fragmented mp4 files become
more common on the Internet because of MPEG-DASH and MSE.


>
> 2) In relation to point 1, is it possible to fragment a regular mp4
> on the fly with some kind of parser? That is, to still keep a regular
> mp4 on the server and, when needed, convert it (in parts) to fmp4 on the
> client. Because fragmenting a regular mp4 is an "easy" task, I thought it
> might be possible. I know that the YouTube guys intended to do something
> similar in their demo. Any comments on that?
>

I don't see any reason you couldn't do this in JavaScript on the client. I
think it depends on the application whether this path is the preferred
option or not.
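
A minimal sketch of the first step such a converter needs: telling a
fragmented file from a non-fragmented one. It walks the top-level ISOBMFF
boxes and reports a file as fragmented if it finds a moof box, or an mvex
box inside moov. It is written in Python rather than JavaScript only to keep
it short. The names iter_boxes, is_fragmented, and box are hypothetical
helpers for this example, and the sample bytes are synthetic, not real media.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (type, box_start, payload_start, box_end) for boxes in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing data
            size = end - pos
        if size < header or pos + size > end:
            break  # truncated or malformed box: stop walking
        yield raw_type.decode("ascii", "replace"), pos, pos + header, pos + size
        pos += size


def is_fragmented(data):
    """Return True if the data has a moof box, or moov contains mvex."""
    for kind, _, payload_start, box_end in iter_boxes(data):
        if kind == "moof":
            return True
        if kind == "moov":
            children = iter_boxes(data, payload_start, box_end)
            if any(child == "mvex" for child, _, _, _ in children):
                return True
    return False


def box(box_type, payload=b""):
    """Build a synthetic box (illustration only, no real media inside)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


ftyp = box(b"ftyp", b"isom" + b"\x00" * 4)
plain_mp4 = ftyp + box(b"moov", box(b"mvhd", b"\x00" * 4)) + box(b"mdat", b"data")
fragmented_mp4 = ftyp + box(b"moov", box(b"mvex")) + box(b"moof") + box(b"mdat", b"data")

print(is_fragmented(plain_mp4), is_fragmented(fragmented_mp4))
```

The actual conversion (rewriting the sample tables into moof boxes) is the
harder part and is not shown here.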

Aaron


>
> Guy
>
> On Fri, Feb 15, 2013 at 7:59 PM, Aaron Colwell <acolwell@google.com> wrote:
>
>> Comments inline...
>>
>>
>> On Fri, Feb 15, 2013 at 6:20 AM, Cyril Concolato <
>> cyril.concolato@telecom-paristech.fr> wrote:
>>
>>>  Hi Aaron,
>>>
>>> Le 14/02/2013 23:35, Aaron Colwell a écrit :
>>>
>>> Hi Giuseppe,
>>>
>>>  There are no current plans to support non-fragmented MP4 files. One
>>> thing to remember is that MSE  accepts byte streams and not files per se.
>>> For MP4 we use the fragmented format because it allows segments of the
>>> timeline to be appended easily and in any order.
>>>
>>> Appending segments in any order may not be so easy. The MP4 spec says in
>>> the Movie Fragment Header Box definition:
>>> "The movie fragment header contains a sequence number, as a safety
>>> check. The sequence number usually starts at 1 and must increase for each
>>> movie fragment in the file, in the order in which they occur. This allows
>>> readers to verify integrity of the sequence; it is an error to construct a
>>> file where the fragments are out of sequence."
>>>
>>> So if you implement MSE on top of a conformant MP4 reader, feeding
>>> segment data as if they were consecutive in a 'virtual' file, this won't
>>> work. Segments with a sequence number smaller than the one of the first
>>> segment provided may be rejected. To make sure the append happens
>>> correctly with an unmodified MP4 parser, the MSE implementation will have
>>> to parse each segment, check the sequence number and if needed reinitialize
>>> the parser before feeding the out-of-order segment.
>>>
>>
>> [acolwell] MSE ignores the sequence number. Again it is important to not
>> think in terms of files, but in terms of a bytestream. MSE accepts a
>> bytestream that looks very close to a fragmented ISOBMFF file, but it
>> allows things that aren't compliant with the ISOBMFF spec. If one decides
>> to use a conformant MP4 reader for an MSE implementation then they will have
>> to relax parts of the validation checks to avoid rejecting bytestream
>> constructs that are allowed by MSE. MSE accepts the ISOBMFF fragmented form
>> because it is a relatively simple way to encapsulate segments of a larger
>> presentation. The intent was never to support the fragmented file format,
>> but rather something close enough to that form to make it easy to use
>> segments of MP4 content to construct a presentation.
>>
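
To make the sequence-number point concrete, here is a minimal sketch that
reads the sequence number from the mfhd box of each movie fragment and checks
whether the numbers increase. A strict file reader could reject the second
sample stream below; per the reply above, an MSE implementation would skip
that check. The helper names (iter_boxes, fragment_sequence_numbers,
is_increasing, fragment) are hypothetical, and the fragments are synthetic
and carry no real media.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (type, box_start, payload_start, box_end) for boxes in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing data
            size = end - pos
        if size < header or pos + size > end:
            break  # truncated or malformed box: stop walking
        yield raw_type.decode("ascii", "replace"), pos, pos + header, pos + size
        pos += size


def fragment_sequence_numbers(data):
    """Return the mfhd sequence_number of every top-level moof, in stream order."""
    numbers = []
    for kind, _, payload_start, box_end in iter_boxes(data):
        if kind != "moof":
            continue
        for child, _, c_start, c_end in iter_boxes(data, payload_start, box_end):
            # mfhd payload: 1 byte version, 3 bytes flags, 4 bytes sequence_number
            if child == "mfhd" and c_end - c_start >= 8:
                numbers.append(struct.unpack_from(">I", data, c_start + 4)[0])
    return numbers


def is_increasing(numbers):
    """The check a strict ISOBMFF file reader might apply."""
    return all(a < b for a, b in zip(numbers, numbers[1:]))


def box(box_type, payload=b""):
    """Build a synthetic box (illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def fragment(sequence_number):
    """Build a synthetic moof+mdat pair with the given sequence number."""
    mfhd = box(b"mfhd", b"\x00" * 4 + struct.pack(">I", sequence_number))
    return box(b"moof", mfhd) + box(b"mdat", b"data")


in_order = fragment(1) + fragment(2) + fragment(3)
out_of_order = fragment(3) + fragment(1)

print(fragment_sequence_numbers(in_order), is_increasing(fragment_sequence_numbers(in_order)))
print(fragment_sequence_numbers(out_of_order), is_increasing(fragment_sequence_numbers(out_of_order)))
```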
>>
>>>
>>>
>>>   Supporting non-fragmented files would require the UA to hold the
>>> whole file in memory which could be very problematic on memory constrained
>>> devices.
>>>
>>> Is there any requirement in MSE to keep the data in memory and not on
>>> disk?
>>>
>>
>> [acolwell] That really is beside the point. Sure disk could be used, but
>> in the mobile or TV case that isn't likely to be an option. Having disk
>> just delays the problem a little longer. Eventually the UA may need to
>> evict part of the timeline covered by the file and you'd have to reappend
>> the whole file again to get that region back.
>>
>>
>>>
>>>  If the UA decides to garbage collect part of the presentation timeline
>>> to free up space for new appends it is not clear how the web application
>>> could reappend the garbage collected regions without appending the whole
>>> file again.
>>>
>>> You could say the same thing about fragmented files if the fragment
>>> is very long. Sure, you will more likely find large non-fragmented files
>>> than fragmented files with large fragments but the problem is the same.
>>> Small non-fragmented files (such as small clips, ads) should not be
>>> excluded.
>>>
>>
>> [acolwell] I agree, but it is much more likely for people to create large
>> & long non-fragmented MP4 files than it is for people to create fragmented
>> files with large fragments. Forcing the conversion to fragmented form
>> forces this issue to be considered. Another issue is that "small clips" is
>> a very subjective thing and once we say non-fragmented MP4 is supported
>> people will expect it to work no matter how long the files are.
>>
>>
>>>
>>>
>>>  The fragmented form allows the application to easily select the
>>> desired segment and reappend it.
>>>
>>> If possible! If your fragments are large, the application won't be able
>>> to do it (at least not easily).
>>>
>>
>> [acolwell] True, but that's further incentive to keep the segments a
>> reasonable size. If the UA keeps evicting parts of the segments that is an
>> indication that the fragment size you are using is too big. At least with
>> fragmented files you have this parameter to tune. With non-fragmented files
>> you are simply out of luck and have to convert to fragmented form to
>> resolve it anyways.
>>
>>
>>>
>>>
>>>
>>>  Applications can control the level of duplicate appending by adjusting
>>> the fragment size appropriately.
>>>
>>> I think you mean 'Content provider can control ...'? Web applications
>>> should be able to use content from different sources, with no control over
>>> the content generation, so possibly using non-fragmented files.
>>>
>>
>> [acolwell] If they intend to use the content with MSE then they need
>> to be able to exert some sort of control. It may be that they have to
>> convince their partners to provide their assets in fragmented form. MSE
>> already puts constraints on what types of content can be spliced together
>> so the application needs to have at least some idea about what it is
>> passing to MSE to ensure it doesn't violate any of the constraints such as
>> no codec changes, consistent track counts, etc.
>>
>>
>>>  Non-fragmented files are so permissive about how they can store
>>> samples, there is no simple way to collect segments of the timeline w/o
>>> essentially exposing a random access file API.
>>>
>>> Which exact difference in the non-fragmented (vs the fragmented storage)
>>> is problematic? For which situation? I don't understand what you mean by
>>> 'collect segments of the timeline'. Which entity would need to do that? The
>>> web application? the MSE implementation? the MP4 parser? It is certainly
>>> easy for the MP4 parser.
>>>
>>
>> [acolwell] Non-fragmented files have a lot of options when it comes to
>> how the file is formatted. Is the moov at the beginning or end? Are the
>> samples stored in the mdat in order or are they randomly distributed? The
>> list goes on. This adds a lot of complexity and in the worst case requires
>> the whole file to be available to resolve. In fragmented files this
>> complexity is relatively bounded and the content author actually has to be
>> proactive about making sure the content is in the right format. Sure people
>> can create crazy fragmented files as well, but that is not nearly as common.
>>
>>
>>>
>>> In general, I think MSE can be viewed as an API to construct a playlist
>>> and have seamless playback of the elements of the playlist in an HTML5
>>> video element. There are complex playlist configurations with overlapping
>>> elements. I think the simple use case of seamlessly playing 2 MP4 files
>>> sequentially should be supported.
>>>
>>
>> [acolwell] MSE is not an API to construct playlists and I don't think it
>> is good to think about it this way. If that was my goal then I would have
>> designed things very differently. MSE is an API to construct a presentation
>> from a set of media segments. Media segments do not necessarily imply fully
>> formed files in existing formats. Certain forms of existing file formats
>> are interpreted as media segments by MSE, but fully formed files are not
>> required to add media segments to the presentation. For example in the WebM
>> bytestream all you need to create is a Cluster element to add media to the
>> presentation. You don't need to create a fully formed WebM file. For ISO,
>> you only need a moof box followed by an mdat box to add media. I explicitly
>> wanted to break the constraints of requiring fully formed files because I
>> believe it would allow people to mashup content in ways that would be
>> difficult within the constraints of existing file formats.
>>
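
The "moof box followed by an mdat box" unit can be sketched as follows. This
is a simplified illustration, not the MSE byte stream parser: it treats ftyp
and moov as the initialization data, pairs each moof with the next mdat, and
ignores every other box type. The names iter_boxes, split_byte_stream, and
box are hypothetical helpers, and the sample stream is synthetic.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (type, box_start, payload_start, box_end) for boxes in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing data
            size = end - pos
        if size < header or pos + size > end:
            break  # truncated or malformed box: stop walking
        yield raw_type.decode("ascii", "replace"), pos, pos + header, pos + size
        pos += size


def split_byte_stream(data):
    """Return (init_bytes, [segment_bytes, ...]) where each segment is moof+mdat."""
    init_parts = []
    segments = []
    pending_moof = None
    for kind, box_start, _, box_end in iter_boxes(data):
        raw = data[box_start:box_end]
        if kind in ("ftyp", "moov"):
            init_parts.append(raw)
        elif kind == "moof":
            pending_moof = raw  # wait for the mdat that carries its samples
        elif kind == "mdat" and pending_moof is not None:
            segments.append(pending_moof + raw)
            pending_moof = None
        # any other box type is ignored in this simplified sketch
    return b"".join(init_parts), segments


def box(box_type, payload=b""):
    """Build a synthetic box (illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


init_data = box(b"ftyp", b"isom" + b"\x00" * 4) + box(b"moov", box(b"mvex"))
segment_a = box(b"moof", box(b"mfhd", b"\x00" * 8)) + box(b"mdat", b"aaaa")
segment_b = box(b"moof", box(b"mfhd", b"\x00" * 8)) + box(b"mdat", b"bbbb")

init, segments = split_byte_stream(init_data + segment_a + segment_b)
print(len(init), [len(s) for s in segments])
```

Each element of segments is a unit an application could append on its own,
in whatever order it needs.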
>> [acolwell] Seamlessly playing 2 MP4 files sequentially is supported if
>> you first convert them to fragmented form. In my opinion it is better to
>> have the content author place the content in a specific form than to
>> require all UAs to have to deal with non-fragmented MP4 files.
>>
>> Aaron
>>
>>
>
>
Received on Wednesday, 20 February 2013 17:42:01 UTC