Re: ACTION-84: Summarize the info from Jean Pierre, and mail it to the group [SMPTE MXF editUnit Spec]

On Thu, Aug 20, 2009 at 7:28 PM, Conrad Parker <conrad@metadecks.org> wrote:
> 2009/8/20 Davy Van Deursen <davy.vandeursen@ugent.be>:
>>>-----Original Message-----
>>>From: public-media-fragment-request@w3.org [mailto:public-media-
>>>fragment-request@w3.org] On Behalf Of Silvia Pfeiffer
>>>Sent: Thursday, August 20, 2009 5:24 AM
>>>To: erik mannens
>>>Cc: public-media-fragment@w3.org
>>>Subject: Re: ACTION-84: Summarize the info from Jean Pierre, and mail it
>>>to the group [SMPTE MXF editUnit Spec]
>>>
>>>Hi all,
>>>
>>>In yesterday's teleconference, Raphael pointed to this email with the
>>>question whether we should add a generic addressing scheme for
>>>"editUnit"s.
>>>
>>>I can see how editUnits are very important to container formats. In
>>>fact, in Ogg we have a thing called granules, which represent the
>>>same concept. MXF has evidently chosen editUnits as its term for
>>>this.
>>>
>>>editUnits are the finest granularity at which an encoded media stream
>>>can be dealt with without having to decode it. So, for a particular
>>>codec in a particular container format and with particular encoding
>>>settings, editUnits give us the lowest temporal resolution that the
>>>resource is capable of delivering.
>
> is that really the definition? You can't deal with compressed audio
> data at a resolution smaller than the duration of an Ogg Vorbis packet
> or MP3 frame, without decoding.
>
> You could of course deliver the packet containing the required audio
> sample and note the presentation time in Skeleton or similar. So it's
> feasible but somewhat complex and requires container-format support.
>
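
For concreteness, the mapping Conrad describes might look something
like this. A rough sketch in Python, assuming a fixed frame duration
(MPEG-1 Layer III uses 1152 samples per frame; Vorbis packets vary,
so the numbers are illustrative only):

    # Map a requested time to the compressed frame that contains it,
    # plus the presentation offset into that frame.
    SAMPLES_PER_FRAME = 1152   # MPEG-1 Layer III
    SAMPLE_RATE = 44100        # Hz

    def locate_frame(t_seconds):
        frame_duration = SAMPLES_PER_FRAME / SAMPLE_RATE  # ~0.02612 s
        index = int(t_seconds // frame_duration)     # frame to deliver
        offset = t_seconds - index * frame_duration  # note this in Skeleton etc.
        return index, offset

    locate_frame(1.0)   # -> (38, 0.0073...), i.e. frame 38 plus ~7 ms
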
>>>For example: a wav file generally encodes audio without compressing,
>>>so you can access each audio sample individually. If a particular file
>>>was encoded at 44100 Hz, then your editUnit resolution is in theory
>>>0.000023 sec. A video stream encoded as MJPEG (just for simplicity)
>>>at 25 fps would have an editUnit resolution of 0.04 sec. This
>>>already poses another problem: what is the combined editUnit of a
>>>container that has both an audio and a video track?
>>>
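
To make those numbers concrete, and to show why combining tracks is
awkward, here is a quick sketch. The "combined" unit at the end is
only one conceivable definition (the smallest duration that is a
whole number of editUnits on every track); it is not anything defined
by MXF or Ogg:

    from math import gcd

    # Per-track editUnit durations, from the rates quoted above.
    audio_rate = 44100   # Hz  -> editUnit of 1/44100 s (~0.000023 s)
    video_rate = 25      # fps -> editUnit of 1/25 s (0.04 s)

    # Smallest t that is a whole number of editUnits on both tracks:
    combined = 1 / gcd(audio_rate, video_rate)
    print(combined)      # 0.04 s - here simply the video editUnit
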
>>>So, if we are considering addressing something by "t=editUnit:34,500"
>>>I cannot see how that will ever work.
>>>
>>>Firstly, the editUnit is dependent on the resource format and the
>>>particular encoding parameters used for that resource. If you
>>>re-encode the resource with, e.g., a higher temporal resolution, your
>>>editUnit changes and your URL is not correct any more.
>
> agree
>
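
Indeed: the same instant maps to a different editUnit index after
re-encoding. A two-line illustration with made-up numbers:

    t = 1.36             # seconds into the resource
    print(int(t * 25))   # editUnit 34 at 25 fps
    print(int(t * 30))   # editUnit 40 at 30 fps - same URL, different frame
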
>>>Then, I don't see how we can reconcile an editUnit across multiple
>>>tracks. For Ogg, we are keeping the editUnit information in the
>>>skeleton track for each audio, video, and annotation track separately.
>
> you would have to specify <track-id, editUnits>, at which point the
> server would translate that into a timestamp and deliver the
> corresponding offset for all tracks.
>
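
Server-side, that translation could be as simple as dividing by the
track's edit rate. A minimal sketch, with a made-up fragment syntax
and a made-up track table (neither is part of any draft):

    # Hypothetical: resolve "track=video&t=editUnit:34" to a timestamp
    # that can then be used to cut every track at the same point.
    edit_rates = {"video": 25, "audio": 44100}   # editUnits per second

    def edit_unit_to_time(track_id, unit_index):
        return unit_index / edit_rates[track_id]

    t = edit_unit_to_time("video", 34)   # 1.36 s
    # ...then seek all tracks to time t and deliver from there.
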
>>>Finally, what would such an addressing unit give a user or even a
>>>program? editUnit has no semantic meaning at all. It is deeply
>>>codec-specific information that has no meaning outside the realm of
>>>codecs.
>
> I agree that it is more complex than just specifying a time.
>
> I agree that it would not be useful for the general use-cases of a
> person publishing a video and providing URIs for people to watch parts
> of it.
>
> However, it could be useful for a video editing application to ask a
> remote server for a particular set of frames from a given track. If
> that use-case is part of our goals we should consider editUnits. Even
> then it may only make sense if used in combination with a track=
> selector, i.e. only retrieving data from one track.

I would suggest waiting for the world to create a need for this. :-)
We can still add it then.

I think for now we have more than enough things to do - demo
implementations are really required next.

Cheers,
Silvia.

Received on Thursday, 20 August 2009 15:00:32 UTC