Re: Squid experts from Silvia Pfeiffer on 2008-11-06 (public-media-fragment@w3.org from November 2008)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Thu, 6 Nov 2008 22:08:01 +1100
To: "Davy Van Deursen" <davy.vandeursen@ugent.be>
Cc: "Jack Jansen" <Jack.Jansen@cwi.nl>, "Media Fragment" <public-media-fragment@w3.org>
Message-ID: <2c0e02830811060308y7a87c9e0j3e2031e0a24598a5@mail.gmail.com>

Hi Davy,

On Thu, Nov 6, 2008 at 8:08 PM, Davy Van Deursen
<davy.vandeursen@ugent.be> wrote:
> Let me clarify my view on this topic. Suppose we use byte ranges to cache
> our media fragments (using the four-way handshake approach), then we can
> distinguish the following scenarios:
>
> 1. The media resource meets the two requirements (i.e., fragments can be
> extracted in the compressed domain and no syntax element modifications are
> necessary).
> -> we can cache media fragments of such media resources, because their media
> fragments are addressable in terms of byte ranges.

Agreed.


> 2. Media fragments cannot be extracted in the compressed domain
> -> transcoding operations are necessary to extract media fragments from such
> media resources; these media fragments are not expressible in terms of byte
> ranges. Hence, it is not possible to cache these media fragments.

Agreed.


> 3. Media fragments can be extracted in the compressed domain, but syntax
> element modifications are required
> -> these media fragments seem to be cacheable :-). For instance, headers
> containing modified syntax elements could be sent in the first response of
> the server (as already discussed on this list). However, the latter solution
> is still not entirely clear for me. What if for example multiple headers are
> changed and these headers are spread across the media resource? What if
> syntax element modifications are needed in parts that do not apply to the
> whole media resource? I still don't have the feeling that this solution is
> generically applicable to all media resources in this scenario.

You are describing a hypothetical codec and throwing all possible
complexities at it. I think what we need to do instead is analyse
real encapsulation and compression formats, so that we understand the
different situations we are actually dealing with. There may well be
codecs for which this doesn't work; for those we have to fall back
to scenario 2. I don't think we can restrict ourselves to a single
situation.



> Suppose we use time ranges to cache our media fragments, then I see the
> following pros and contras:
>
> Pro:
> -> caching will work for the three above described scenarios (i.e., for
> fragments extracted in the compressed domain and for transcoded fragments).
> Hence, the way of caching is independent of the underlying formats and
> adaptation operations performed by the server.

I disagree. In scenario 2 we cannot cache resources in the way that we
have described, with all the possibilities of concatenation and
recomposition. In scenario 2 a cache can only hold individual
fragments, not the full resource, since each time fragment will
consist of different byte values than the original resource.
Therefore, caching in Web proxies doesn't really work any longer.
Caching will only work for scenarios 1 and 3.

> -> four-way handshake can be avoided.

That's a fair enough aim and I'd like to believe we can achieve it.
But it may be too hard.


> Contra:
> -> no support for spatial fragments.

Why? Spatial fragments would just get spatial ranges for caching.


> -> concatenation of fragments becomes a very difficult and in some cases
> maybe an impossible task. To be able to join media fragments, the cache
> needs a perfect mapping of the bytes and timestamps for each fragment.
> Furthermore, if we want to keep the cache format-independent, such a mapping
> is not enough. We also need information regarding the byte positions of
> random access points and their corresponding timestamps. This way, a cache
> can determine which parts are overlapping when joining two fragments. Note
> that this kind of information could be modeled by a codec-independent
> resource description format.

Yes, I think with such a representation of the resource and with the
server sharing this representation with all the proxies, we should be
able to do time ranges with a two-way handshake only. Is it realistic,
though, to add such overhead to the protocol?
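
To make the idea concrete, here is a minimal sketch of how a cache could
use such a resource description. It assumes the description is simply a
sorted list of (timestamp, byte offset) pairs for the random access
points, with the first point at time 0. The names `time_to_byte_range`
and `join_byte_ranges` are made up for illustration and do not come from
any existing proxy or specification.

```python
from bisect import bisect_left, bisect_right

# Assumed resource description: (timestamp in seconds, byte offset) of every
# random access point, sorted by time, first entry at time 0.


def time_to_byte_range(points, total_bytes, start, end):
    """Map the time range [start, end) to an inclusive byte range.

    The range is widened to the enclosing random access points so that
    the bytes can be decoded on their own.
    """
    times = [t for t, _ in points]
    # Last access point at or before `start`.
    i = max(bisect_right(times, start) - 1, 0)
    # First access point at or after `end`; past the last one means EOF.
    j = bisect_left(times, end)
    first = points[i][1]
    last = (points[j][1] if j < len(points) else total_bytes) - 1
    return first, last


def join_byte_ranges(a, b):
    """Merge two inclusive byte ranges if they overlap or are adjacent.

    Returns None when there is a gap, i.e. the cache cannot join them.
    """
    (a0, a1), (b0, b1) = sorted([a, b])
    if b0 > a1 + 1:
        return None
    return a0, max(a1, b1)
```

With access points at 0 s, 2 s, 4 s and 6 s, a request for 2.5 s to 5 s
is widened to the bytes between the 2 s and 6 s access points, and two
such fragments can be joined whenever their byte ranges overlap or
touch. This only covers the case where fragments are extractable in the
compressed domain.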


> Of course, this works only when it is possible
> to extract the media fragments in the compressed domain. For joining
> fragments which are the result of transcoding operations, transcoders are
> needed in the cache. As you said, the latter operation could introduce a
> loss of quality, but note that this is always possible with transcoding
> operations (thus, also if they are transcoded at server-side).

I am not even sure we should include any kind of transcoding
activities into our model. Transcoding creates fundamentally different
representations for the same media resource, and mostly with a loss of
quality. I personally don't think we should go down that path.

> Both approaches have pros and contras and for the moment, I don't prefer one
> over the other. What do you think?

Thanks for making them explicit - that makes the discussion easier. :)

I think we are theorizing a lot and are not actually looking at
concrete codecs. We should start getting our hands dirty. ;-) By which
I mean: start classifying the different codecs according to the
criteria that you have listed above and find out for which we are
actually able to do fragments and what types of fragments.
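
As a starting point for such a classification, the two criteria could be
recorded per format and mapped onto the three scenarios. This is only a
sketch: `FormatProfile`, `classify` and the field names are hypothetical,
and the flag values for any real codec still have to be established by
actually analysing it.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    BYTE_ADDRESSABLE = 1      # scenario 1: plain byte ranges suffice
    TRANSCODING_REQUIRED = 2  # scenario 2: no compressed-domain extraction
    HEADER_REWRITE = 3        # scenario 3: extraction plus syntax changes


@dataclass(frozen=True)
class FormatProfile:
    name: str
    compressed_domain_extraction: bool  # fragments extractable without decoding
    needs_syntax_modification: bool     # headers/syntax elements must change


def classify(profile):
    """Map the two criteria onto the scenarios discussed in this thread."""
    if not profile.compressed_domain_extraction:
        return Scenario.TRANSCODING_REQUIRED
    if profile.needs_syntax_modification:
        return Scenario.HEADER_REWRITE
    return Scenario.BYTE_ADDRESSABLE


def byte_range_cacheable(profile):
    """Scenarios 1 and 3 are treated as cacheable; scenario 3 only
    under the assumption that modified headers can be sent separately."""
    return classify(profile) is not Scenario.TRANSCODING_REQUIRED
```

Filling in one `FormatProfile` per container and codec combination
would give us the table we need to see which fragment types are
feasible where.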

Cheers,
Silvia.
Received on Thursday, 6 November 2008 11:14:49 UTC