Re: video use-case from Jack Jansen on 2008-10-01 (public-media-fragment@w3.org from October 2008)

From: Jack Jansen <Jack.Jansen@cwi.nl>
Date: Wed, 1 Oct 2008 23:43:26 +0200
To: Yannick Prié <yannick.prie@liris.cnrs.fr>
Cc: Media Fragment <public-media-fragment@w3.org>
Message-Id: <18BAC73F-74BE-4490-B1A9-D55532B8F410@cwi.nl>

On  1-Oct-2008, at 23:13 , Yannick Prié wrote:
>>  But this triggered another question: are we interested in the  
>> timestamps in the movie? If we ask for a segment of video starting  
>> at 30s, do we expect the timestamp of the first frame to be "30s"?  
>> Do we expect it to be "0s"? Do we expect nothing at all? This is  
>> going to be important for client-side creation of URLs for  
>> selecting subparts of videos.
>
> I can consider a fragment of a video as a video, and I do not care  
> about the architecture that allows me to see it, download it, etc.  
> as a video. It begins at 0s.

Unfortunately the sentence "It begins at 0s" is ambiguous, and not  
even always true. The statement may mean "I expect that the first  
timestamp I see is 0s", which is true for some video formats. It may  
mean "I expect that the first timestamp I see has an offset of 0s from  
what the video header told me was the beginning timestamp of the  
video", which is true for some other video formats. And it may be  
completely untrue, in the case of live broadcasts.
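
To make the three readings concrete, here is a minimal sketch of how a client might map a timestamp it sees in a delivered fragment back to the timeline of the original video. The names (Convention, to_source_time, header_start) are hypothetical and only for illustration; no real container format or player API is implied.

```python
from enum import Enum


class Convention(Enum):
    ZERO_BASED = "zero"         # the first timestamp seen is 0s
    HEADER_RELATIVE = "header"  # timestamps are offsets from the header's start time
    SOURCE_TIMELINE = "source"  # timestamps keep their original position (e.g. live)


def to_source_time(frame_ts, fragment_start, convention, header_start=0.0):
    """Map a timestamp seen in the fragment back to the original video's timeline.

    frame_ts       -- timestamp carried by the frame, in seconds
    fragment_start -- start of the requested fragment in the original, in seconds
    header_start   -- beginning timestamp announced by the video header (assumed)
    """
    if convention is Convention.ZERO_BASED:
        return fragment_start + frame_ts
    if convention is Convention.HEADER_RELATIVE:
        return fragment_start + (frame_ts - header_start)
    # SOURCE_TIMELINE: the frame already carries its original timestamp.
    return frame_ts
```

For a fragment requested at 30s, the first frame maps back to 30s under all three conventions, but the raw timestamp the client sees differs (0s, the header's start value, or 30s). A client that builds URLs for subparts needs to know which one it is looking at.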

> I can consider a fragment of a video as a fragment of a video. In  
> that case it begins at 30s, I can explicitly manipulate both the  
> fragment and the video (e.g. jump to a frame before the fragment  
> beginning, let's say at 20s).

This is, in my view, a completely different issue. I tend to think of  
this as referring to the fragment "in context", whereas the previous  
use case was referring to the fragment "out of context". But these  
are terms we're using internally; if anyone has better or official terms,  
please let me know.

An analogy outside of the video domain (where we actually first  
started considering this) is digital talking books for the blind.  
Think of an audio file with markers that index into an accompanying  
HTML document. While playing the audio fragment you want to render the  
corresponding text. However, the interpretation of "text.html#id12345"  
depends on how you are going to render it:

1) If you're sending it to a braille display you want only the content  
of the node referred to by the ID, and show that on the braille display.
2) If you are sending it to a normal display (for people with limited  
vision or dyslexia) you want to render the whole page and only scroll/ 
highlight the selected area.

The first use case is out-of-context, the second in-context. Often, as  
in this example, the choice can only be made by the user agent. And  
making the wrong choice leads either to horrible inefficiency  
(architecture providing 2 when 1 is needed) or to a really bad user  
experience (architecture providing only 1 when 2 is needed).
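
The two renderings of "text.html#id12345" can be sketched as follows. This is only an illustration under assumptions: the sample page, the resolve function and its in_context flag are all made up here, and the document is assumed to be well-formed XHTML so that the standard-library XML parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for text.html.
PAGE = """<html><body>
<p id="id12344">Previous sentence.</p>
<p id="id12345">Current sentence.</p>
<p id="id12346">Next sentence.</p>
</body></html>"""


def resolve(document, fragment_id, in_context):
    """Return a list of (text, is_selected) pairs for the user agent to render."""
    root = ET.fromstring(document)
    target = root.find(f".//*[@id='{fragment_id}']")
    if target is None:
        raise KeyError(fragment_id)
    if not in_context:
        # Out of context (braille display): only the referenced node.
        return [("".join(target.itertext()), True)]
    # In context (visual display): the whole page, with the referenced
    # node flagged so the user agent can scroll to it or highlight it.
    return [("".join(p.itertext()), p is target) for p in root.iter("p")]
```

The same reference yields one item out of context and the whole page in context. Only the caller, here standing in for the user agent, knows which of the two it needs.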

> I think both cases should be considered. I do not manipulate a  
> "video" and a "fragment of a video" in the same way, even if their  
> playing can result in the same rendering (eg. 5 seconds of video in  
> a web page).

It should be clear from the above that I wholeheartedly agree :-)

--
Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma  
Goldman

Received on Wednesday, 1 October 2008 21:44:07 UTC