Re: MSE byte-stream format initialization segment boxes

On Tue, Feb 18, 2014 at 9:54 AM, David Singer <singer@apple.com> wrote:

> Hi guys
>
> there is a sentence in
>
>
> http://www.w3.org/2013/12/byte-stream-format-registry/isobmff-byte-stream-format.html#iso-init-segments
>
> which is causing us some problems, notably the ‘ignore’ here:
>
> > Valid top-level boxes such as ftyp, styp, and sidx are allowed to appear
> > before the moov box.
> > These boxes must be accepted and ignored by the user agent and are not
> > considered part of the initialization segment in this specification.
>
> This is causing some implementations to strip these boxes before they get
> to the media engine, and then we’ve lost important compatibility
> information (notably the claims of compatibility made by the ftyp and styp
> boxes) and, if we want to index, the indexing information.  I suppose they
> think we’re going to conform to the apparent requirement (though it’s
> expressed as a statement of fact, rather than as an option or requirement)
> to ignore.  But why?
>

How would the styp & ftyp boxes change how the byte streams are handled?
They don't appear to provide much value in the MSE byte stream context. I
don't think they should be required as part of the initialization segment,
especially if that introduced a restriction such as requiring all ftyp/styp
boxes to have the same major_brand, minor_version, and compatible_brands.

The sidx box contains file-specific offset info. MSE does not require a
full segment file to be appended, so interpreting a sidx box is never
guaranteed to be correct. MSE intentionally has no concept of file
boundaries so, as far as I can tell, there is no way for an implementation
to determine if the fragments that follow a sidx box are actually the ones
the sidx box refers to. I intentionally wanted to break the "append a full
file" requirement to provide maximum flexibility in presentation
construction.
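
To illustrate the offset problem, here is a minimal sketch in Python. It
assumes the usual ISOBMFF reading of sidx, where offsets are counted from
the first byte after the sidx box when the index and the media are in the
same file. The function and parameter names are hypothetical and are not
taken from any spec or implementation.

```python
def sidx_byte_ranges(sidx_end, first_offset, referenced_sizes):
    """Return (start, end) file offsets for each subsegment a sidx describes.

    sidx_end is the file offset of the first byte after the sidx box,
    assumed here to be the anchor point for the index.
    """
    ranges = []
    start = sidx_end + first_offset
    for size in referenced_sizes:
        # Each reference covers the next `size` bytes of the original file.
        ranges.append((start, start + size))
        start += size
    return ranges


# A sidx ending at byte 100 that indexes two fragments of 500 and 300 bytes.
ranges = sidx_byte_ranges(100, 0, [500, 300])
```

The resulting ranges are positions in the original file. If the application
appends only the second fragment after the sidx, or fragments taken from a
different file, the bytes at those positions are no longer the ones the
index describes, and nothing in the appended data reveals that.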


>
> Should this be re-phrased?
>
> These boxes *are* considered part of the initialization segment in this
> specification and must be accepted and passed by the user-agent to the
> media engine; they may be ignored or processed as desired.
>

I don't think we should use this wording for the reasons given above. In my
opinion we should definitely ignore any top-level boxes that have
file-relative offsets in them or that make any assumptions about byte
stream layout. It is really nice that the moov contains all the
configuration info and the moof+mdat pairs contain the media data. This
minimizes the number of constraints on what can be appended.
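
The split described above can be sketched as a walk over top-level boxes.
This is only an illustration in Python, not how any user agent actually
implements it; the helper names (iter_top_level_boxes, classify, make_box)
and the "other" label are made up for the example.

```python
import struct

# Box types the quoted spec text allows before moov and says are ignored.
IGNORED_TYPES = {b"ftyp", b"styp", b"sidx"}


def iter_top_level_boxes(data):
    """Yield (box_type, start, end) for each complete top-level box."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit largesize follows the type field.
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - pos
        if size < header or pos + size > len(data):
            break  # malformed or incomplete box; stop scanning
        yield box_type, pos, pos + size
        pos += size


def classify(data):
    """Label each top-level box by how the byte stream format treats it."""
    labels = []
    for box_type, _start, _end in iter_top_level_boxes(data):
        if box_type in IGNORED_TYPES:
            labels.append(("ignored", box_type))
        elif box_type == b"moov":
            labels.append(("init", box_type))
        elif box_type in (b"moof", b"mdat"):
            labels.append(("media", box_type))
        else:
            labels.append(("other", box_type))
    return labels


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size header (test helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload
```

Nothing in this walk depends on where the bytes sat in an original file,
which is the property I want to preserve: only the moov and the moof+mdat
pairs carry information the append logic needs.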


>
>
> Can someone explain why it’s written the way it is?
>

I hope my explanations above help.

Aaron


>
>
> David Singer
> Multimedia and Software Standards, Apple Inc.
>
>
>

Received on Tuesday, 18 February 2014 21:14:10 UTC