Re: MSE byte-stream format initialization segment boxes

From: David Singer <singer@apple.com>
Date: Wed, 19 Feb 2014 11:41:41 -0800
Cc: "<public-html-media@w3.org>" <public-html-media@w3.org>
Message-id: <94E65FB8-1074-4B68-A10E-A1DD8699E679@apple.com>
To: Aaron Colwell <acolwell@google.com>

Hi, thanks; replies inline.

On Feb 18, 2014, at 13:13 , Aaron Colwell <acolwell@google.com> wrote:

> 
> 
> On Tue, Feb 18, 2014 at 9:54 AM, David Singer <singer@apple.com> wrote:
> Hi guys
> 
> there is a sentence in
> 
> http://www.w3.org/2013/12/byte-stream-format-registry/isobmff-byte-stream-format.html#iso-init-segments
> 
> which is causing us some problems, notably the 'ignore' here:
> 
> > Valid top-level boxes such as ftyp, styp, and sidx are allowed to appear before the moov box.
> > These boxes must be accepted and ignored by the user agent and are not considered part of the initialization segment in this specification.
> 
> This is causing some implementations to strip these boxes before they get to the media engine, and then we've lost important compatibility information (notably the claims of compatibility made by the ftyp and styp boxes) and, if we want to index, the indexing information.  I suppose they think we're going to conform to the apparent requirement (though it's expressed as a statement of fact, rather than as an option or requirement) to ignore.  But why?
> 
> How would the styp & ftyp boxes change how the byte streams are handled? They don't appear to provide much value in the MSE byte stream context.

They provide a lot of value to the media engine;  pre-flighting whether or how to play, for example.  In one example, we require ftyp boxes on both movie (QT) files and MP4 files, and the engine uses the ftyp box to handle a few places where these must be interpreted differently.  Even without this case, it really helps to get early warning of possible problems.  Yes, the DASH MPD manifest should have quoted the file-type brands in the profiles parameter of the MIME type, so playability could be determined then, but if it didn't, at least one gets a warning to the media engine before one encounters an unsupported feature.
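
As a rough illustration of the pre-flighting described above, here is a minimal sketch that walks the top-level boxes of an appended buffer and collects the brand claims of any ftyp/styp boxes found before the moov.  The function names are hypothetical and not taken from any implementation; the box layout (32-bit size, four-character type, optional 64-bit largesize, then major_brand, minor_version and a list of compatible_brands) follows the ISO base media file format.

```python
import struct

def iter_top_level_boxes(data):
    """Yield (box_type, payload) for each complete top-level box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit largesize follows the type field
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the data
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # truncated or malformed box; stop rather than guess
        yield box_type.decode("ascii", "replace"), data[offset + header:offset + size]
        offset += size

def parse_brands(payload):
    """Parse an ftyp/styp payload into (major_brand, minor_version, compatible_brands)."""
    if len(payload) < 8:
        raise ValueError("ftyp/styp payload too short")
    major, minor = struct.unpack_from(">4sI", payload, 0)
    compat = [payload[i:i + 4].decode("ascii", "replace")
              for i in range(8, len(payload) - 3, 4)]
    return major.decode("ascii", "replace"), minor, compat

def brands_before_moov(data):
    """Collect the brand claims of ftyp/styp boxes seen before the moov box."""
    claims = []
    for box_type, payload in iter_top_level_boxes(data):
        if box_type == "moov":
            break
        if box_type in ("ftyp", "styp"):
            claims.append((box_type, *parse_brands(payload)))
    return claims
```

A media engine could compare the returned brands against whatever it supports before committing to playback; what it does with a mismatch is a policy choice this sketch does not make.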

> I don't think they should be required as part of the initialization segment, especially if that introduced a restriction such as requiring all ftyp/styp boxes to have the same major_brand, minor_version, and compatible_brands.

They are required in both the ISO base media file format, and in the DASH specification, for good reasons.  I don't think it's the place of MSE to try to relax requirements in the specs it is building on.

> The sidx box contains file specific offset info. MSE does not require a full segment file to be appended so interpreting a sidx box is never guaranteed to be correct. MSE intentionally has no concept of file boundaries so, as far as I can tell, there is no way for an implementation to determine if the fragments that follow a sidx box are actually the ones the sidx box refers to. I intentionally wanted to break the "append a full file" requirement to provide maximum flexibility in presentation construction.

OK, the segment index is something I don't use.

But the segment type box is then important; it allows you to notice that the segment in your hand makes different compatibility claims than the initialization segment you are operating under.  Maybe something went wrong and you're not in Kansas any more.
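
To make that check concrete, here is a small sketch, assuming the brand lists have already been read from the ftyp of the initialization segment and the styp of a media segment.  The function name and the expected_extra allowlist are hypothetical; the allowlist is there because some brands may legitimately appear only on media segments, and which ones to tolerate is a policy decision.

```python
def new_brand_claims(init_brands, segment_brands, expected_extra=()):
    """Return the brands a segment claims that the init segment did not.

    expected_extra is a hypothetical allowlist of brands that are normal on
    media segments only, so they do not count as a surprise.
    """
    known = set(init_brands) | set(expected_extra)
    return sorted(set(segment_brands) - known)
```

For instance, new_brand_claims(["isom", "avc1"], ["isom", "hvc1"]) yields ["hvc1"], which the engine could treat as an early warning before it hits samples it cannot decode.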

> Should this be re-phrased?
> 
> These boxes *are* considered part of the initialization segment in this specification and must be accepted and passed by the user-agent to the media engine; they may be ignored or processed as desired.
> 
> I don't think we should use this wording for the reasons given above. In my opinion we should definitely ignore any top-level boxes that have file relative offsets in them or ones that make any assumptions about byte stream layout.

OK, the ftyp/styp boxes have neither of these issues.

> It is really nice that the moov contains all the configuration info and the moof+mdat pairs contain the media data. This minimizes the amount of constraints applied to what can be appended.
>  
> 
> 
> Can someone explain why it's written the way it is?
> 
> I hope my explanations above help.
> 
> Aaron
>  
> 
> 
> David Singer
> Multimedia and Software Standards, Apple Inc.
> 
> 
> 

David Singer
Multimedia and Software Standards, Apple Inc.
Received on Wednesday, 19 February 2014 19:42:32 UTC
