Re: Should we consider the user? from Jack Jansen on 2009-04-01 (public-media-fragment@w3.org from April 2009)

From: Jack Jansen <Jack.Jansen@cwi.nl>
Date: Wed, 1 Apr 2009 23:19:12 +0200
To: Raphaël Troncy <Raphael.Troncy@cwi.nl>
Cc: Media Fragment <public-media-fragment@w3.org>
Message-Id: <59D05693-B18C-4F2F-8DE5-33E70E03CA82@cwi.nl>
On 1 apr 2009, at 14:50, Raphaël Troncy wrote:

> I like Silvia's answer, http://lists.w3.org/Archives/Public/public-media-fragment/2009Apr/0004.html 
>  :-)
> To follow today's discussion:

Hmm. As your answers below are exactly what I mean by "a more user  
centric description" I think the perceived difference is purely a  
choice of words. So I suggest we just put these clarifications  
somewhere in the document and that's that.

>> there are a number of questions:
>> 1. Does this return the exact 1s-2s time fragment? This is probably  
>> impossible, the next best thing is the smallest interval that  
>> contains this interval, i.e. from the last frame before or at 1s to  
>> the first frame at or after 2s.
>> 2. Alternatively, we could say the user gets a "reasonable"  
>> interval around 1s-2s, so implementations can cut at I frames. This  
>> would forestall transcoding. But: now we need to define "reasonable".
>
> I would also favor the best effort approach, i.e. find the closest  
> interval that encapsulates the client request.

Agreed. We could add some non-normative text saying that
implementations are expected to cut at a random access point, an
I-frame, or whatever the underlying format supports. We could also
state that if the underlying format allows logical in- and out-points
(as Ogg seems to do), these should be used to refine the interval.
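
The "closest interval that encapsulates the request" idea can be sketched in a few lines. This is only an illustration, not proposed spec text: the function name `enclosing_interval` and the list of access-point times are hypothetical, and times are integer milliseconds to avoid rounding issues.

```python
from bisect import bisect_left, bisect_right


def enclosing_interval(access_points, start, end):
    """Return (lo, hi) taken from access_points that best encloses [start, end].

    access_points: sorted list of times (ms) at which a cut is possible,
    e.g. I-frames or other random access points (hypothetical input).
    lo is the last access point at or before start, hi is the first
    access point at or after end. If the request extends beyond the
    media, the first/last access point is used instead.
    """
    if not access_points:
        raise ValueError("no access points")
    # Index of the last access point <= start (clamped to the first one).
    i = max(bisect_right(access_points, start) - 1, 0)
    # Index of the first access point >= end (clamped to the last one).
    j = min(bisect_left(access_points, end), len(access_points) - 1)
    return access_points[i], access_points[j]
```

With access points every 400 ms, `enclosing_interval([0, 400, 800, 1200, 1600, 2000, 2400], 1000, 2000)` gives `(800, 2000)`: the start is pulled back to the previous access point, while the end already falls on one.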
>
>
>> 3. What about audio/video sync? If the user gets synced a/v we need  
>> to do recoding.
>
> Again, best effort and we need to find out when we have extreme
> un-sync that will not be tolerated by the user.

Here we may need a bit more text. Again, for Ogg there is no issue (I  
assume) as it can use the logical in/out points to state where  
playback should start and stop (and I assume this can cater for A/V  
resyncing). For container formats that do not provide this  
functionality the client may have to do transcoding.
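
To make "extreme un-sync" a bit more concrete, one could measure the skew that results when audio and video can only be cut at different points. A rough sketch, in which the function names, the audio packet duration and the tolerance threshold are all made up for illustration:

```python
def av_skew_ms(video_cut_ms, audio_packet_ms):
    """Return (audio_cut_ms, skew_ms) when audio is cut on packet boundaries.

    The audio cut is the last packet boundary at or before the video cut,
    so skew_ms is how much earlier the audio starts than the video.
    """
    if audio_packet_ms <= 0:
        raise ValueError("audio_packet_ms must be positive")
    audio_cut_ms = (video_cut_ms // audio_packet_ms) * audio_packet_ms
    return audio_cut_ms, video_cut_ms - audio_cut_ms


def needs_resync(skew_ms, tolerance_ms):
    """True if the skew exceeds what we assume the user will tolerate."""
    return skew_ms > tolerance_ms
```

For a video cut at 800 ms and 23 ms audio packets, `av_skew_ms(800, 23)` gives `(782, 18)`. Whether 18 ms counts as tolerable depends on the threshold we would still have to agree on.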

>> 4. What about timestamps in the media? Are these the originals?
>
> I haven't understood this question if it is not linked to the
> in-context/out-context discussion.

Let me give an example. I download
<http://www.example.com/example.mp4#t=100,200> to a local file
"myclip.mp4". Next, I mail off myclip.mp4 to you. You open it in a
video editor. Assuming this video editor will show timestamps as they
appear in the original media (as opposed to starting at "0" for every
file), there are 3 possibilities:
a. We standardise that you will always see 100 as the timestamp of the  
first frame (or a number slightly lower, because of (1) above).
b. We standardise that you will always see 0.
c. We specifically state that this is up to the implementation.

I think the issue is important, because on the one hand (a) allows
for recombining fragments and comparing fragment time ranges, but on
the other hand (c) allows simpler implementations. (I see no reason to
enforce (b); I'm listing it for completeness.)
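
The difference between (a) and (b) can be shown with a small sketch. It handles only the simple `#t=start,end` form from the example URL, treats times as whole seconds, and uses hypothetical helper names (`parse_time_fragment`, `fragment_timestamps`). It is not meant as a real fragment parser.

```python
from urllib.parse import urlsplit


def parse_time_fragment(url):
    """Parse a '#t=start,end' fragment into (start, end) in whole seconds.

    Only the simple form used in the example above is handled.
    """
    fragment = urlsplit(url).fragment
    name, _, value = fragment.partition("=")
    if name != "t":
        raise ValueError("not a temporal fragment: %r" % fragment)
    start, end = value.split(",")
    return int(start), int(end)


def fragment_timestamps(frame_times, start, end, keep_original):
    """Timestamps of the frames in [start, end) as stored in the clip.

    keep_original=True  -> option (a): original timestamps survive.
    keep_original=False -> option (b): timestamps are rebased to 0.
    """
    selected = [t for t in frame_times if start <= t < end]
    if keep_original or not selected:
        return selected
    base = selected[0]
    return [t - base for t in selected]
```

Under (a), clips cut at 100-200 and 200-300 keep distinct timestamps, so they can be compared and put back in order. Under (b) both start at 0 and the original positions are lost. Option (c) simply means a receiver cannot rely on either behaviour.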

>> 5. What about spatial crops? Same questions as for (1) and (2)
>
> We leave undefined for now what the spatial cropping actually does :-)
> We will need to write down the current state of our thoughts:
> basically, spatial cropping requires transcoding for all currently
> known codec formats except (perhaps) Motion JPEG2000.


Agreed.
--
Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma  
Goldman
Received on Wednesday, 1 April 2009 21:20:01 UTC