Re: Squid experts

On Thu, 6 Nov 2008, Silvia Pfeiffer wrote:

> So where 5s ends is a question of making intervals inclusive or
> exclusive. Let's go with your understanding; the problem still
> exists, so let me try to explain. Also note that we are talking
> about Web proxies that do not have the full media resource at hand,
> while a Web server that does have it at hand has no such issues.
>
> So let's assume two temporal fragment URIs. One asks for seconds 1-6
> and the next one asks for 6-12. The Web proxy is only given a
> collection of bytes and told which time segment these map to. It does
> not know which byte offsets in the original file they map to. Now, we
> cannot be sure that the end of 1-6 is exactly before the beginning of
> 6-12. There may well be an overlap of bytes between the end of 1-6
> and the beginning of 6-12, because we deal with the compressed domain
> and continuous time. In fact, I would be surprised if there were not
> an overlap of at least one codec packet. This tells us that only the
> bytes are uniquely identified, while time is not uniquely
> identifiable. This means we have to store not just time in the Web
> proxy, but also the mapping to bytes. With the 2-way handshake, I do
> not think this is possible (but please correct me if I'm wrong).

That is the thing I don't understand: why is it mandatory to have the 
mapping to bytes? You request fragment 1-6s and you get a fully playable 
video covering this time range (it may actually cover 0.5s to 6.5s). Now 
you request fragment 6-10s and you once again get a fully playable video 
covering this new time range (again, it may even overlap, say 5.5s to 
10.5s).
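
Purely as an illustration, such an exchange could look like the
following (the time-unit Range syntax here is invented, it is not
anything that has been agreed on, and the durations are made up):

  GET /video.ogv HTTP/1.1
  Host: example.com
  Range: t=1-6

  HTTP/1.1 206 Partial Content
  Content-Type: video/ogg
  Content-Range: t 0.5-6.5/120

  [ self-contained, playable fragment covering 0.5s to 6.5s ]

The server answers with whatever slightly larger range yields a
playable fragment, and simply says so in the reply.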

Of course a blind proxy won't be able to do anything with it, but that's 
not the point. It should know that it's a video, how to navigate from 
frame to frame, align and merge the fragments, then recreate a container 
around the compressed content (as well as be able to act as a first-class 
server when a client requests a time range within the cached time 
ranges). No byte offsets are involved there.
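
As a rough sketch of that align-and-merge idea (all the types and
names below are made up for the example, and re-creating the container
is glossed over entirely), assuming the proxy can read frame
timestamps out of each cached fragment:

  from dataclasses import dataclass

  @dataclass
  class Frame:
      t: float      # presentation timestamp, in seconds
      data: bytes   # compressed frame payload

  def merge(a: list[Frame], b: list[Frame]) -> list[Frame]:
      # Align both fragments on their timestamps and drop the frames
      # duplicated in the overlap, keeping time order.
      seen, out = set(), []
      for f in sorted(a + b, key=lambda f: f.t):
          if f.t not in seen:
              seen.add(f.t)
              out.append(f)
      return out

  def serve(cache: list[Frame], start: float, end: float) -> list[Frame]:
      # Answer a time-range request from the cache; no byte offsets.
      return [f for f in cache if start <= f.t <= end]

  # Fragment "1-6s" actually covering 0.5-6.5s and fragment "6-10s"
  # covering 5.5-10.5s: after merging, the proxy can answer a request
  # for 3-8s entirely on its own.
  frag1 = [Frame(t / 2, b"") for t in range(1, 14)]
  frag2 = [Frame(t / 2, b"") for t in range(11, 22)]
  reply = serve(merge(frag1, frag2), 3.0, 8.0)

The hard part, of course, is the last step I skipped: re-muxing the
merged frames into a valid container before sending them out.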

Now if you want, on top of time ranges, the possibility of using byte 
ranges as well, then OK, you need to pass the information along, but it 
means you are starting to mix apples and oranges.

>>>> That said, we have different axes of selection, and they don't
>>>> fit the range model well. I was wondering if language selection
>>>> could be done using Accept-Language, in the case where you have
>>>> different language tracks, but in that case you need to identify
>>>> first-class URIs for the different language-based variants.
>>>
>>> By language selection, you mean more generally selecting different
>>> tracks, right? This could apply both to audio and to annotations.
>>> For video, we could also have recordings from different angles, or
>>> other tracks, that could be selected. Solving language selection
>>> only solves one part of the track selection problem.
>>
>> Yes, and language selection can apply to the audio track but also
>> to subtitles, making things even worse. But the real issue is: are
>> they fragments or not?
>
> And also we could have karaoke, image tracks, multiple audio tracks,
> multiple video angles. Getting only a certain subset of tracks from
> the original media resource can be solved through a fragment
> addressing scheme. Whether it is the right way to solve it, I am not
> sure either.
>
>
>>>> We need to discuss that a bit deeper, do we really need to identify the
>>>> video+fr soundtrack as a fragment?
>>>
>>> I don't understand "video+fr soundtrack"... what do you mean?
>>
>> a video consisting of "moving pictures + French soundtrack": does it
>> need to be presented as a fragment? I.e., what are the axes along
>> which it is useful to define fragments?
>
> Ah ok. :-)
>
>
>>>>> Instead there was a suggestion to create a codec-independent media
>>>>> resource description format that would be a companion format for media
>>>>> resources and could be downloaded by a Web client before asking for
>>>>> any media content. With that, the Web client would easily be able to
>>>>> construct byte range requests from time range requests and could thus
>>>>> fully control the download. This would also mean that Web proxies
>>>>> would not require any changes. It's an interesting idea and I would
>>>>> want to discuss this in particular with Davy. Can such a format
>>>>> represent all of the following structural elements of a media
>>>>> resource:
>>>>> * time fragments
>>>>> * spatial fragments
>>>>> * tracks
>>>>> * named fragments.
>>>>
>>>> Well, you have byte ranges, but no headers, no metadata. And
>>>> carrying part of the missing payload in headers is a big no.
>>>
>>> Can you explain this further? I don't quite understand what the
>>> big no is, and which missing payload you see being put in which
>>> headers (HTTP headers?).
>>
>> If you are outputting only a byte range of the video, does it
>> contain all the information needed to play it (like format, frame
>> rate, etc.)? If not, how do you carry the missing information
>> (i.e., the missing part of the payload)?
>
> So you are saying that even if we only have small changes to make to
> media headers (i.e. payload), it is a bad design to deliver these
> changes in HTTP headers? In the 4-way handshake proposal the changes
> are carried in the payload of the first handshake reply, which
> provides the updated media headers and the mapping of the fragment to
> byte ranges. I am not sure how to do that in a 2-way handshake other
> than through HTTP headers.
>
>
> Cheers,
> Silvia.
>
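
To make the description-format idea above a bit more concrete, here is
the kind of thing I would imagine (the format and all the field names
are invented for the sake of the example, and it only covers the time
axis; tracks, spatial and named fragments would need more structure):

  # Hypothetical companion description, fetched once by the client:
  # each entry maps a playable time span to the byte range holding it.
  index = {
      "headers": (0, 4095),  # bytes holding the container headers
      "spans": [
          # (start_s, end_s, first_byte, last_byte)
          (0.0, 5.0, 4096, 511999),
          (5.0, 10.0, 512000, 1023999),
      ],
  }

  def byte_range_for(start: float, end: float) -> tuple[int, int]:
      # Find the spans covering [start, end] and return the enclosing
      # byte range, ready for an ordinary HTTP Range request.
      hit = [s for s in index["spans"] if s[1] > start and s[0] < end]
      return hit[0][2], hit[-1][3]

  # A request for seconds 3-7 becomes: Range: bytes=4096-1023999
  first, last = byte_range_for(3.0, 7.0)
  print(f"Range: bytes={first}-{last}")

If that works, plain byte-range requests do the job and existing
proxies need no changes at all, which is exactly the appeal of the
proposal.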

-- 
Baroula que barouleras, au tiéu toujou t'entourneras.

         ~~Yves

Received on Thursday, 6 November 2008 10:20:45 UTC