RE: Squid experts from Davy Van Deursen on 2008-11-06 (public-media-fragment@w3.org from November 2008)

From: Davy Van Deursen <davy.vandeursen@ugent.be>
Date: Thu, 6 Nov 2008 10:08:21 +0100
To: "'Silvia Pfeiffer'" <silviapfeiffer1@gmail.com>
Cc: "'Jack Jansen'" <Jack.Jansen@cwi.nl>, "'Media Fragment'" <public-media-fragment@w3.org>
Message-ID: <008101c93fef$37a45520$a6ecff60$@vandeursen@ugent.be>
Hi Silvia, Jack, all,

>-----Original Message-----
>From: public-media-fragment-request@w3.org [mailto:public-media-
>fragment-request@w3.org] On Behalf Of Silvia Pfeiffer
>Sent: Wednesday, November 05, 2008 11:32 PM
>To: Davy Van Deursen
>Cc: Jack Jansen; Media Fragment
>Subject: Re: Squid experts
>
>
>Hi Davy, Jack, all,
>
>On Tue, Nov 4, 2008 at 1:00 AM, Davy Van Deursen
><davy.vandeursen@ugent.be> wrote:
>>>From: public-media-fragment-request@w3.org [mailto:public-media-
>>>fragment-request@w3.org] On Behalf Of Jack Jansen
>>On 1 nov 2008, at 22:52, Davy Van Deursen wrote:
>
>>>> If we want to support caching of media fragments without modifying
>the
>>>> existing Web caches and proxies, then this will only work under the
>>>> following circumstances (and suppose multi-byte-ranges are
>possible):
>>>> 1) the media fragments can be extracted in the compressed domain
>>>> 2) no syntax element modifications in the bitstream are needed to
>>>> perform
>>>> the extraction
>>>>
>>>
>>>Davy,
>>>I would think that any format that can be streamed over RTSP must meet
>>>these 2 requirements: otherwise the RTSP server would have to do
>>>recoding on the fly when the user agent does a seek.
>>
>> I do not agree that these two points are requirements to stream media
>> fragments over RTSP in general. On server side, syntax element
>modifications
>> can be performed on the fly. Also, if a server has some transcoders at
>its
>> disposal, on the fly transcoding of the media fragments is also
>possible, or
>> not? As long as you stay on the server side, all possible operations
>can be
>> performed to obtain the requested fragments. However, when proxies
>need to
>> be taken into account (and we do not want to modify them), you should
>be
>> able to express these server-side adaptation operations in terms of
>byte
>> ranges. In order to express an adaptation operation in terms of byte
>ranges,
>> the two above described requirements are necessary.
>
>I am not sure how this works with RTSP. However, we cannot do
>transcoding on the fly for media resources on HTTP. Transcoded
>resources are not bit-wise identical to the original media resource
>and can therefore not be cached in Web proxies. Then the media
>resource becomes identical to an output of a script that is different
>each time and therefore not a cachable resource. So, I'd prefer it if
>we could accept those two conditions.
>
>
>>>If we limit support for temporal fragments to media that meet
>>>requirements (1) and (2), and if RTSP has the same requirements, it
>>>seems that it isn't really limiting our support. Moreover, if the
>>>server does support fragments on media items that need to be recoded
>>>it is in the position to forestall caching by sending out rttp reply
>>>headers to that effect, or not?
>>
>> Do you mean that in the case of recoding, we do not cache the fragment
>at
>> all? Of course, if we do not make any changes on the caches, then we
>are
>> stuck to these byte ranges and is indeed the caching of a recoded
>fragment
>> impossible. However, it would be nice if the cache really understands
>the
>> concept of a fragment and does not (only) rely on byte ranges to cache
>the
>> fragment. But that, I think, will not be an easy task :-).
>
>Even if the cache understands codecs, how can it deal with several
>fragments that each have been recoded to meet the fragment needs? I
>cannot easily serve them by concatenation. It would need to do another
>recoding - and this time not from one file, but from multiple recoded
>fragments. Do you really think that is possible? And without loss of
>quality??

Let me clarify my view on this topic. If we use byte ranges to cache our
media fragments (using the four-way handshake approach), we can distinguish
the following scenarios:

1. The media resource meets the two requirements (i.e., fragments can be
extracted in the compressed domain and no syntax element modifications are
necessary).
-> we can cache media fragments of such media resources, because their media
fragments are addressable in terms of byte ranges.

2. Media fragments cannot be extracted in the compressed domain
-> transcoding operations are necessary to extract media fragments from such
media resources; these media fragments are not expressible in terms of byte
ranges. Hence, it is not possible to cache these media fragments.

3. Media fragments can be extracted in the compressed domain, but syntax
element modifications are required.
-> these media fragments seem to be cacheable :-). For instance, headers
containing modified syntax elements could be sent in the first response of
the server (as already discussed on this list). However, this solution is
still not entirely clear to me. What if, for example, multiple headers are
changed and these headers are spread across the media resource? What if
syntax element modifications are needed in parts that do not apply to the
whole media resource? I am not yet convinced that this solution is
generically applicable to all media resources in this scenario.
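
To make the three scenarios concrete, here is a minimal Python sketch of the
decision they describe. The names (MediaResource, Cacheability, classify) and
the two boolean flags are hypothetical and only for illustration; they are
not taken from any specification or existing cache implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Cacheability(Enum):
    BYTE_RANGES = "cacheable: fragment maps to byte ranges"
    NOT_CACHEABLE = "not cacheable: transcoding needed, no byte-range mapping"
    UNCLEAR = "possibly cacheable: modified headers must be sent separately"


@dataclass(frozen=True)
class MediaResource:
    # Hypothetical flags describing the media format of the resource.
    compressed_domain_extraction: bool   # requirement 1 is met
    needs_syntax_element_changes: bool   # requirement 2 is violated


def classify(resource: MediaResource) -> Cacheability:
    """Map a resource onto scenario 1, 2 or 3 for byte-range caching."""
    if not resource.compressed_domain_extraction:
        return Cacheability.NOT_CACHEABLE   # scenario 2
    if resource.needs_syntax_element_changes:
        return Cacheability.UNCLEAR         # scenario 3
    return Cacheability.BYTE_RANGES         # scenario 1
```

Scenario 3 is deliberately kept as a separate outcome, since it is still open
whether sending modified headers up front works for every format.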


If we instead use time ranges to cache our media fragments, I see the
following pros and cons:

Pro:
-> caching will work for all three scenarios described above (i.e., for
fragments extracted in the compressed domain and for transcoded fragments).
Hence, the way of caching is independent of the underlying formats and
adaptation operations performed by the server.
-> the four-way handshake can be avoided.

Con:
-> no support for spatial fragments.
-> concatenation of fragments becomes very difficult and in some cases
maybe impossible. To be able to join media fragments, the cache needs an
exact mapping between bytes and timestamps for each fragment. Furthermore,
if we want to keep the cache format-independent, such a mapping is not
enough. We also need the byte positions of the random access points and
their corresponding timestamps. This way, a cache can determine which parts
overlap when joining two fragments. Note that this kind of information
could be modeled by a codec-independent resource description format. Of
course, this works only when it is possible to extract the media fragments
in the compressed domain. For joining fragments that are the result of
transcoding operations, transcoders are needed in the cache. As you said,
the latter operation could introduce a loss of quality, but note that this
is always possible with transcoding operations (thus, also if they are
transcoded at server-side).
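
As a rough illustration of the joining problem, the following Python sketch
shows how a format-independent cache could merge two overlapping or adjacent
time-range fragments when it knows, for each one, the timestamps and byte
offsets of the random access points. Everything here is an assumption made
for the example: the Fragment structure, integer millisecond timestamps, and
the rule that a join is only possible at a timestamp known to both fragments.
It covers only fragments extracted in the compressed domain, not transcoded
ones.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fragment:
    start_ms: int                      # first timestamp covered
    end_ms: int                        # timestamp just past the last sample
    raps: Tuple[Tuple[int, int], ...]  # (timestamp_ms, byte offset in data)
    data: bytes


def join(a: Fragment, b: Fragment) -> Optional[Fragment]:
    """Join two cached fragments, or return None if that is not possible."""
    if b.start_ms < a.start_ms:
        a, b = b, a                    # make sure a starts first
    if b.start_ms > a.end_ms:
        return None                    # gap between the fragments
    if b.end_ms <= a.end_ms:
        return a                       # b is fully contained in a
    # Points where a may be cut: its random access points and its very end.
    a_cuts = dict(a.raps)
    a_cuts[a.end_ms] = len(a.data)
    # b can only be entered at one of its own random access points.
    b_cuts = dict(b.raps)
    shared = [t for t in b_cuts if t in a_cuts]
    if not shared:
        return None                    # no common cut point known
    t = max(shared)
    cut_a, cut_b = a_cuts[t], b_cuts[t]
    shift = cut_a - cut_b              # re-base b's offsets onto the result
    raps = tuple(r for r in a.raps if r[0] < t) + tuple(
        (ts, off + shift) for ts, off in b.raps if ts >= t
    )
    return Fragment(a.start_ms, b.end_ms, raps,
                    a.data[:cut_a] + b.data[cut_b:])
```

For example, joining a fragment covering 0 to 4000 ms with one covering 2000
to 6000 ms keeps all bytes of the first and appends only the part of the
second that starts at its random access point at 4000 ms. Without the random
access point information the cache cannot tell which bytes overlap.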


Both approaches have pros and cons and, for the moment, I don't prefer one
over the other. What do you think?

Best regards,

Davy

-- 
Davy Van Deursen

Ghent University - IBBT
Department of Electronics and Information Systems Multimedia Lab
URL: http://multimedialab.elis.ugent.be
Received on Thursday, 6 November 2008 09:09:31 UTC