Re: ACTION-84: Summarize the info from Jean Pierre, and mail it to the group [SMPTE MXF editUnit Spec]

2009/8/20 Davy Van Deursen <davy.vandeursen@ugent.be>:
>>-----Original Message-----
>>From: public-media-fragment-request@w3.org [mailto:public-media-
>>fragment-request@w3.org] On Behalf Of Silvia Pfeiffer
>>Sent: Thursday, August 20, 2009 5:24 AM
>>To: erik mannens
>>Cc: public-media-fragment@w3.org
>>Subject: Re: ACTION-84: Summarize the info from Jean Pierre, and mail it
>>to the group [SMPTE MXF editUnit Spec]
>>
>>Hi all,
>>
>>In yesterday's teleconference, Raphael pointed to this email with the
>>question whether we should add a generic addressing scheme for
>>"editUnit"s.
>>
>>I can see how editUnits are very important to container formats. In
>>fact, in Ogg we have a thing called granules which represents the same
>>concept. MXF has obviously defined editUnits as their word to use for
>>this.
>>
>>editUnits are the finest granularity at which an encoded media stream
>>can be dealt with without having to decode it. So, editUnits give us
>>for a particular codec in a particular container format and with a
>>particular encoding setting the lowest resolution that this resource
>>is capable of delivering.

Is that really the definition? You can't deal with compressed audio
data at a resolution smaller than the duration of an Ogg Vorbis packet
or MP3 frame without decoding it.

You could of course deliver the packet containing the required audio
sample and note the presentation time in skeleton or similar. So it's
feasible but somewhat complex and requires container-format support.
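
A rough sketch of that idea, assuming a codec with a fixed number of
samples per packet (1152, as in an MPEG-1 Layer III frame; Vorbis
packets vary in size, so this is a simplification). The function name
and parameters are hypothetical, not part of any container format:

```python
from fractions import Fraction

def locate_sample(sample_index, samples_per_packet=1152, sample_rate=44100):
    """Find the packet holding a sample, and the times the container must record.

    Assumes fixed-size packets (hypothetical simplification).
    Returns (packet_index, packet_start_time, presentation_time) in seconds.
    """
    # Which packet holds the sample; the in-packet offset is not needed here.
    packet_index, _ = divmod(sample_index, samples_per_packet)
    # The server can only deliver whole packets, starting here.
    packet_start_time = Fraction(packet_index * samples_per_packet, sample_rate)
    # The client must be told where playback really begins.
    presentation_time = Fraction(sample_index, sample_rate)
    return packet_index, packet_start_time, presentation_time
```

The gap between packet_start_time and presentation_time is what the
container (skeleton or similar) would have to carry, so that the
client knows how much decoded audio to discard.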

>>For example: a wav file generally encodes audio without compressing,
>>so you can access each audio sample individually. If a particular file
>>was encoded at 44100 Hz, then your editUnit resolution is in theory
>>0.000023sec. A video stream encoded as mjpeg (just for simplicity)
>>and at 25 fps would have an editUnit resolution of 0.04sec. This
>>already poses another problem: what is the combined editUnit of a
>>container that has both an audio and a video track?
>>
>>So, if we are considering addressing something by "t=editUnit:34,500"
>>I cannot see how that will ever work.
>>
>>Firstly, the editUnit is dependent on the resource format and the
>>particular encoding parameters used for that resource. If you
>>re-encode the resource with, e.g. a higher temporal resolution, your
>>editUnit changes and your URL is not correct any more.

agree

>>Then, I don't see how we can reconcile an editUnit across multiple
>>tracks. For Ogg, we are keeping the editUnit information in the
>>skeleton track for each audio, video, and annotation track separately.

you would have to specify <track-id, editUnits>, at which point the
server would translate that into a timestamp and deliver the
corresponding offset for all tracks.
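
A minimal sketch of that translation, assuming each track has a fixed
edit rate (edit units per second). The track names, the rate table,
and both function names are hypothetical; the rates are just the
44100 Hz and 25 fps examples from this thread:

```python
import math
from fractions import Fraction

# Hypothetical per-track edit rates, in edit units per second.
EDIT_RATES = {"audio": Fraction(44100), "video": Fraction(25)}

def edit_unit_to_time(track_id, edit_unit, rates=EDIT_RATES):
    """Translate <track-id, editUnit> into an exact timestamp in seconds."""
    return Fraction(edit_unit) / rates[track_id]

def offsets_for_all_tracks(track_id, edit_unit, rates=EDIT_RATES):
    """Return the timestamp and, per track, the last edit unit starting at or before it."""
    t = edit_unit_to_time(track_id, edit_unit, rates)
    return t, {name: math.floor(t * rate) for name, rate in rates.items()}
```

For example, offsets_for_all_tracks("video", 34) yields a timestamp of
34/25 s (1.36 s), which corresponds to video edit unit 34 and audio
edit unit 59976.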

>>Finally, what would such an addressing unit give a user or even a
>>program? editUnit has no semantic meaning at all. It is deeply
>>codec-specific information that has no meaning outside the realm of
>>codecs.

I agree that it is more complex than just specifying a time.

I agree that it would not be useful for the general use-cases of a
person publishing a video and providing URIs for people to watch parts
of it.

However it could be useful for a video editing application, to ask a
remote server for a particular set of frames from a given track. If
that use-case is part of our goals we should consider editUnits. Even
then it may only make sense if used in combination with a track=
selector, i.e. only retrieving data from one track.

Conrad.

Received on Thursday, 20 August 2009 09:28:57 UTC