
RE: Squid experts

From: Davy Van Deursen <davy.vandeursen@ugent.be>
Date: Sat, 1 Nov 2008 22:52:01 +0100
To: "'Media Fragment'" <public-media-fragment@w3.org>
Message-ID: <00a301c93c6c$1259a850$370cf8f0$@vandeursen@ugent.be>

Hi Silvia, all,

>-----Original Message-----
>From: public-media-fragment-request@w3.org [mailto:public-media-
>fragment-request@w3.org] On Behalf Of Silvia Pfeiffer
>Sent: Saturday, November 01, 2008 12:00 PM
>To: Media Fragment
>Subject: Re: Squid experts
>Hi Davy,
>That's a very clear statement on the possibilities of an abstract model
>of the structure-to-binary relationships of compressed media
>resources. I think you may be right and it's not easily possible, if
>not impossible, to do with all media types - even though
>multi-byte-ranges may help for some of them. Whether that's a killer
>for this approach, or whether we could still suggest this approach as
>an optimisation in certain cases, I don't know.

If we want to support caching of media fragments without modifying the
existing Web caches and proxies, then this will only work under the
following circumstances (assuming that multi-byte-range requests are possible):
1) the media fragments can be extracted in the compressed domain
2) no syntax element modifications in the bitstream are needed to perform
the extraction

Note that one workaround for point 2 is that the server sends the headers
with the modified syntax elements to the client (as you did with the Ogg
format). However, I don't think this workaround will work in general for
every format. For example, extracting a spatial fragment from a Motion
JPEG2000 stream implies that syntax element modifications are necessary in
each JPEG2000 frame. The workaround only works when syntax element
modifications are needed in headers that apply to the whole bitstream.
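To make the multi-byte-range idea concrete, here is a minimal sketch of how a fragment whose bytes are scattered over the resource could be fetched in a single HTTP Range request (the byte offsets are invented for illustration):

```python
# Hypothetical sketch: a media fragment made up of scattered byte ranges
# could be fetched with one multi-byte Range request; an unmodified cache
# can then store and revalidate it like any other Range request.
# The byte offsets below are invented for illustration.

def build_range_header(ranges):
    """Turn a list of (start, end) byte offsets into an HTTP Range header value."""
    return "bytes=" + ",".join(f"{start}-{end}" for start, end in ranges)

fragment_ranges = [(0, 1023), (50000, 69999), (120000, 159999)]
print(build_range_header(fragment_ranges))
# bytes=0-1023,50000-69999,120000-159999
# A server supporting multiple ranges answers with 206 Partial Content
# and a multipart/byteranges body.
```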

>Which formats did you find so far it was possible to gain a structure
>about? I can certainly say for Ogg that time fragments, tracks and
>named fragments when using CMML are all possible. For spatial
>fragments, I am not so sure - I'd think rather not...

Tracks
Whether tracks are supported depends on the container format. Since a
container format only defines a syntax and does not introduce any
compression, it is always possible to describe the structures of a container
format. Hence, if a container format allows the encapsulation of multiple
tracks, then it is possible to describe the tracks in terms of bytes.
Examples of such container formats are Ogg, MP4, ... Note that it is
possible that the tracks are multiplexed, implying that a description of one
track consists of a list of byte ranges. Also note that the extraction of
tracks (and fragments in general) from container formats often requires
syntax element modifications in the headers.
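The "list of byte ranges" point can be sketched as follows; the track index and offsets are invented for illustration:

```python
# Hypothetical sketch: in a multiplexed container, the chunks of one track
# are interleaved with the other tracks, so a byte-level description of a
# track is a *list* of byte ranges rather than one contiguous span.

def extract_track(data, ranges):
    """Concatenate the byte ranges (inclusive end offsets) of one track."""
    return b"".join(data[start:end + 1] for start, end in ranges)

# Invented index: byte ranges of each track in an interleaved file.
track_index = {
    "video": [(100, 499), (900, 1299), (1700, 2099)],
    "audio": [(500, 899), (1300, 1699)],
}

container = bytes(2100)  # stand-in for the container file's bytes
audio_bytes = extract_track(container, track_index["audio"])
print(len(audio_bytes))  # 800: two 400-byte chunks
```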

Time fragments
Whether time fragments are supported depends in the first place on the
coding format, and more specifically on how the encoding parameters were
set. For video coding formats, time fragments can be extracted if the video
stream provides random access points (i.e., points that do not depend on
previously encoded video data, typically corresponding to intra-coded
frames) on a regular basis. I think the same holds for audio coding formats
(I only have experience with AAC and MP3): the audio stream needs to be
accessed at a point where the decoder can start decoding without needing
previously coded data.
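In practice this means a requested start time has to be snapped back to the nearest preceding random access point. A minimal sketch, with invented keyframe positions:

```python
import bisect

# Hypothetical sketch: to extract a time fragment in the compressed domain,
# the requested start time must be snapped back to a random access point
# (e.g. an intra-coded frame). The keyframe times below are invented.
keyframes = [0.0, 2.0, 4.0, 6.0, 8.0]  # seconds: one RAP every 2 s

def snap_to_random_access_point(t, raps):
    """Return the last random access point at or before time t."""
    i = bisect.bisect_right(raps, t) - 1
    return raps[max(i, 0)]

print(snap_to_random_access_point(5.3, keyframes))  # 4.0
```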

Spatial fragments
This one is probably the hardest to deal with and depends, just like time
fragments, on the coding format. Among image coding formats, JPEG2000 and
(to some extent) HD Photo support independently encoded spatial regions.
With other image coding formats such as JPEG, GIF, and PNG, it is
not possible to describe spatial regions in terms of bytes. For video coding
formats, we can consider the motion variants of the image coding formats
JPEG2000 and HD Photo. Further, H.264/AVC and its scalable extension SVC are
able to encode spatial regions independently by making use of the coding
tool Flexible Macroblock Ordering (FMO). MPEG-4 Visual allows objects to be
coded independently of each other (i.e., object-based video coding).
However, for video formats, I think that there are very few media resources
in the wild that were encoded with provisions for spatial fragment
extraction in the compressed domain. For example, only a few H.264/AVC
decoders support FMO, and I have never seen an MPEG-4 Visual bitstream that
was encoded in objects.
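For a tile-based format like JPEG2000, mapping a spatial fragment to independently decodable units could be sketched like this (the 256-pixel tile size is an invented example):

```python
# Hypothetical sketch: tile-based formats such as JPEG2000 split the picture
# into independently decodable tiles, so a spatial fragment maps to the set
# of tiles intersecting the requested region (whose byte ranges could then
# be served). The tile size below is invented.

TILE = 256  # tile width/height in pixels (assumed)

def tiles_for_region(x, y, w, h, tile=TILE):
    """(col, row) indices of all tiles intersecting the pixel region."""
    cols = range(x // tile, (x + w - 1) // tile + 1)
    rows = range(y // tile, (y + h - 1) // tile + 1)
    return [(c, r) for r in rows for c in cols]

print(tiles_for_region(200, 200, 200, 50))  # [(0, 0), (1, 0)]
```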

Named fragments
To the best of my knowledge, no coding format provides support for named
fragments. I think we should look at container formats for this feature. As
you said, Ogg combined with CMML does the trick. In fact, if a container
format allows the insertion of metadata describing the named fragments, then
the container format supports named fragments. For example, you can include
a CMML description in an MP4 container and interpret this CMML description
to extract fragments based on a name.
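The name-to-fragment resolution described above can be sketched as a simple lookup; the fragment names and times below are invented:

```python
# Hypothetical sketch: container-level metadata (such as a CMML description
# in Ogg or MP4) can map fragment names to time intervals; resolving a
# named fragment is then a lookup followed by an ordinary time-fragment
# extraction. The names and intervals below are invented.

named_fragments = {
    "intro":     (0.0, 12.5),   # seconds
    "interview": (12.5, 95.0),
    "credits":   (95.0, 110.0),
}

def resolve_named_fragment(name, table=named_fragments):
    """Map a fragment name to its (start, end) time interval."""
    if name not in table:
        raise KeyError(f"no fragment named {name!r} declared in metadata")
    return table[name]

print(resolve_named_fragment("interview"))  # (12.5, 95.0)
```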

Finally, it is important to remark that a coding format supporting fragment
extraction in the compressed domain is not enough by itself: the right
encoding parameters also need to be used to enable this extraction. For
example, it is impossible to extract a spatial fragment from a JPEG2000
image in the compressed domain if this spatial fragment is not independently
coded from the rest of the image.

Best regards,


Davy Van Deursen

Ghent University - IBBT
Department of Electronics and Information Systems Multimedia Lab
URL: http://multimedialab.elis.ugent.be
Received on Saturday, 1 November 2008 21:52:43 UTC
