Re: Should we consider the user? from Silvia Pfeiffer on 2009-04-01 (public-media-fragment@w3.org from April 2009)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Thu, 2 Apr 2009 09:01:09 +1100
To: Jack Jansen <Jack.Jansen@cwi.nl>
Cc: Raphaël Troncy <Raphael.Troncy@cwi.nl>, Media Fragment <public-media-fragment@w3.org>
Message-ID: <2c0e02830904011501w79c1452dhe0fa9bb4af9edd0b@mail.gmail.com>
Hi Jack,

2009/4/2 Jack Jansen <Jack.Jansen@cwi.nl>:
>
> On 1 apr 2009, at 14:50, Raphaël Troncy wrote:
>
>> I like Silvia's answer, http://lists.w3.org/Archives/Public/public-media-fragment/2009Apr/0004.html :-)
>> To follow today's discussion:
>
> Hmm. As your answers below are exactly what I mean by "a more user centric description" I think the perceived difference is purely a choice of words. So I suggest we just put these clarifications somewhere in the document and that's that.

I don't think we were disagreeing. :) I just wanted to put some
clarifications forward as a reply to your questions. And yes, you are
right, we need to add these to the WD. Maybe we can have a section on
"implementation hints"? You might want to start adding a section to
the wiki that collects these things.


>>> there are a number of questions:
>>> 1. Does this return the exact 1s-2s time fragment? This is probably impossible, the next best thing is the smallest interval that contains this interval, i.e. from the last frame before or at 1s to the first frame at or after 2s.
>>> 2. Alternatively, we could say the user gets a "reasonable" interval around 1s-2s, so implementations can cut at I frames. This would forestall transcoding. But: now we need to define "reasonable".
>>
>> I would also favor the best effort approach, i.e. find the closest interval that encapsulates the client request.
>
> Agreed. We could add some non-normative text that implementations are expected to do something like find a random-access-point or I-frame or whatever the underlying format supports. We could also state that if the underlying format allows logical in- and out-points (like Ogg seems to do) these should be used to refine the interval.

Agreed.
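
To make the "closest interval that encapsulates the client request" concrete, here is a minimal sketch. It assumes the implementation already has a sorted list of the times at which the format allows a cut (I-frames, random access points, or whatever the container offers). The function name and the example times are made up for illustration and are not taken from any draft.

```python
from bisect import bisect_left, bisect_right

def enclosing_interval(cut_points, start, end):
    """Return the smallest (lo, hi) pair of cut points containing [start, end].

    cut_points: sorted list of times (in seconds) where a cut is possible.
    This list is an assumption of the sketch; how it is obtained depends
    on the media format.
    """
    if not cut_points:
        raise ValueError("no cut points available")
    # Last cut point at or before the requested start.
    i = bisect_right(cut_points, start) - 1
    # First cut point at or after the requested end.
    j = bisect_left(cut_points, end)
    # Clamp to the available media if the request runs past either edge.
    lo = cut_points[max(i, 0)]
    hi = cut_points[min(j, len(cut_points) - 1)]
    return lo, hi

# Hypothetical cut points every 0.8 s; the request is the 1s-2s fragment.
print(enclosing_interval([0.0, 0.8, 1.6, 2.4, 3.2], 1.0, 2.0))
```

With these made-up cut points the 1s-2s request widens to 0.8s-2.4s, so no transcoding is needed. A format with logical in/out points could then carry the exact 1s and 2s on top of that interval.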


>>
>>> 3. What about audio/video sync? If the user gets synced a/v we need to do recoding.
>>
>> Again, best effort and we need to find out when we have extreme un-sync that will not be tolerated by the user.
>
> Here we may need a bit more text. Again, for Ogg there is no issue (I assume) as it can use the logical in/out points to state where playback should start and stop (and I assume this can cater for A/V resyncing). For container formats that do not provide this functionality the client may have to do transcoding.

For any video format that has random seekability, you will not run
into a/v sync issues, because the encapsulation format supports
synching. MPEG is much better in this respect than Ogg, for example,
and Ogg still manages to keep sync in a chopped file. The whole
purpose of encapsulation formats is to provide sync. Therefore, I
don't actually think this will be a widespread problem.

As for chopping - this has to be done sensibly on all tracks, of
course. For example, if going back to an I-frame and cutting exactly
there would chop off audio packets that are required for playback of
the requested time interval, then that is not the right cut point.
Instead, the earlier cut point that includes all of those audio
packets should be chosen.
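
As a rough sketch of that rule, assume a simplified model of a multiplexed file: a list of packets in file order, each with a byte offset, a track name, a start time, and a flag saying whether decoding can begin there. The `Packet` type and `cut_offset` function are hypothetical and do not correspond to any real container library.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    offset: int   # byte position in the file (hypothetical model)
    track: str    # e.g. "video" or "audio"
    time: float   # start time of the packet in seconds
    sync: bool    # True if decoding can start at this packet

def cut_offset(packets, start):
    """Byte offset at which to cut so that every track can play from `start`.

    packets is assumed to be sorted by offset. For each track we look for
    the latest sync packet at or before `start`, then cut at the earliest
    of those offsets so that no track loses a packet it needs.
    """
    needed = {}
    for p in packets:
        if p.sync and p.time <= start:
            prev = needed.get(p.track)
            if prev is None or p.time >= prev.time:
                needed[p.track] = p
    if not needed:
        raise ValueError("no usable packet at or before the requested start")
    return min(p.offset for p in needed.values())

# Made-up layout: the audio packet covering t=0.8 sits before the I-frame.
packets = [
    Packet(0, "video", 0.0, True),
    Packet(400, "audio", 0.0, True),
    Packet(950, "audio", 0.78, True),
    Packet(1000, "video", 0.8, True),
    Packet(1500, "audio", 0.9, True),
]
print(cut_offset(packets, 0.8))
```

In this made-up layout the I-frame for t=0.8 is at offset 1000, but the audio packet that covers t=0.8 starts at offset 950, so the cut moves back to 950.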


>>> 4. What about timestamps in the media? Are these the originals?
>>
>> I haven't understood this question if it is not linked to the in-context/out-context discussion.
>
> Let me give an example. I download <http://www.example.com/example.mp4#t=100,200> to a local file "myclip.mp4". Next, I mail off myclip.mp4 to you. You open it in a video editor. Assuming this video editor will show timestamps as they appear in the original media (as opposed to starting at "0" for every file), there are 3 possibilities:
> a. We standardise that you will always see 100 as the timestamp of the first frame (or a number slightly lower, because of (1) above).
> b. We standardise that you will always see 0.
> c. We specifically state that this is up to the implementation.
>
> I think the issue is important, because on the one hand (a) allows for recombining fragments and comparing fragment time ranges, but on the other hand (c) allows simpler implementations. (I see no reason to enforce (b), I'm listing it for completeness).

I think this has been discussed before under a different
topic/question. The question was what time should be displayed as the
start of a chopped file. And the answer was that this is up to the
implementation, as it makes sense in the application.
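
To illustrate the two display choices using Jack's example URL, here is a small sketch. It only handles the plain "t=start,end" form in seconds as written in that URL. The function names and the `keep_original` switch are invented for this illustration and are not part of any draft.

```python
from urllib.parse import urlsplit

def parse_time_fragment(url):
    """Extract (start, end) in seconds from a '#t=start,end' fragment.

    Only the simple form used in the example URL is handled here.
    """
    fragment = urlsplit(url).fragment
    if not fragment.startswith("t="):
        raise ValueError("no t= fragment in URL")
    start_text, _, end_text = fragment[2:].partition(",")
    return float(start_text), float(end_text)

def displayed_time(offset_in_clip, fragment_start, keep_original):
    """Time an application might show for a position inside the chopped file.

    keep_original=True corresponds to option (a): original media time.
    keep_original=False corresponds to option (b): the clip starts at 0.
    """
    if keep_original:
        return fragment_start + offset_in_clip
    return offset_in_clip

start, end = parse_time_fragment("http://www.example.com/example.mp4#t=100,200")
print(displayed_time(0.0, start, keep_original=True))   # option (a)
print(displayed_time(0.0, start, keep_original=False))  # option (b)
```

Leaving it to the implementation, as in option (c), amounts to letting each application choose the value of `keep_original` that makes sense for it.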


Cheers,
Silvia.
Received on Wednesday, 1 April 2009 22:01:51 UTC