Revisiting Bug 22148 - adding jitter to video quality metrics from David Singer on 2013-08-30 (public-html-media@w3.org from August 2013)

From: David Singer <singer@apple.com>
Date: Fri, 30 Aug 2013 16:02:23 -0700
To: public-html-media@w3.org
Message-id: <5B952524-E899-4FE4-B7F0-81517793B3B8@apple.com>

<https://www.w3.org/Bugs/Public/show_bug.cgi?id=22148>

We are concerned about the new definition of the displayed frame delay, and the use of this value to accumulate a jitter value in totalFrameDelay.


> Displayed Frame Delay
> The delay, to the nearest microsecond, between a frame's presentation time and the actual time it was displayed. This delay is always greater than or equal to zero since frames must never be displayed before their presentation time. Non-zero delays are a sign of playback jitter and possible loss of A/V sync.
> 
and
> totalFrameDelay
> The sum of all displayed frame delays for all displayed frames. (i.e., Frames included in the totalVideoFrames count, but not in the droppedVideoFrames count.
> 

[by the way, editors, you have a missing ")" there]

Here are our concerns:

1.  The use of microseconds may be misleading.  There is an implied precision here which is rarely (if ever) achievable; not everyone can time 'to the nearest microsecond', and sometimes the measurement has to be done 'before the photons emerge from the display', at a point in the pipeline where the remainder of the pipeline is not completely jitter-free.

2.  In any case, frames are displayed at the refresh times of the display; display times are in effect quantized to the nearest refresh time.  So, if I was slightly late in pushing a frame down the display pipeline, but it hit the same refresh as if I had been on time, there is no perceptible effect at all.

3.  Thus, ideally, we'd ask for the measurement system to be aware of which display refresh the frame hit, and all results would be quantized to the refresh rate. However, in some (many?) circumstances, though the average or expected pipeline delay is known or can be estimated, the provision of frames for display is not tightly linked to the display refresh, i.e. at the place of measurement, we don't know when the refreshes happen.

4.  There is a big difference in jitter between presenting 2000 frames all 5ms late (consistently) and presenting 50 of them 200ms late and the rest on time, though for both we'd report 10,000ms totalFrameDelay. Being 5ms late may not matter at all (see above), whereas 200ms is noticeable (lipsync will probably be perceptibly off).  There is nothing in the accumulation of values, today, that takes into account *variation*, which is really the heart of what jitter is about.
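
To make points 2 and 4 concrete, here is a minimal Python sketch. It is an illustration only and not part of the draft: the helper name next_refresh_us, the roughly 60Hz refresh interval, and the sample timestamps are all assumptions chosen for the example. It shows two submission times landing on the same refresh, and the two delay patterns from point 4 producing the same sum while differing in spread.

```python
from statistics import pstdev

# Assumed refresh interval: about 60 Hz, in integer microseconds.
REFRESH_INTERVAL_US = 16667


def next_refresh_us(submit_time_us, refresh_interval_us=REFRESH_INTERVAL_US):
    """Return the time of the first refresh at or after submit_time_us.

    Hypothetical helper: assumes refreshes occur at exact multiples of
    refresh_interval_us, which real pipelines may not guarantee.
    """
    # Integer ceiling division avoids floating-point rounding.
    intervals = -(-submit_time_us // refresh_interval_us)
    return intervals * refresh_interval_us


# Point 2: a frame pushed 5 ms late can still hit the same refresh.
on_time_us = 1_660_000
late_us = on_time_us + 5_000
same_refresh = next_refresh_us(on_time_us) == next_refresh_us(late_us)

# Point 4: two delay patterns (in ms) with identical totals.
consistent_ms = [5] * 2000               # every frame 5 ms late
bursty_ms = [200] * 50 + [0] * 1950      # 50 frames 200 ms late, rest on time

total_consistent = sum(consistent_ms)    # 10000
total_bursty = sum(bursty_ms)            # 10000

# Population standard deviation is one possible measure of variation;
# it is used here only to show that the sum hides the difference.
spread_consistent = pstdev(consistent_ms)
spread_bursty = pstdev(bursty_ms)

print(same_refresh, total_consistent, total_bursty)
print(spread_consistent, spread_bursty, max(consistent_ms), max(bursty_ms))
```

The sums are equal, but the spread is zero in the first case and clearly non-zero in the second; the standard deviation is just one candidate statistic and is not being proposed here as the replacement.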

I don't have a proposal right now for something better, but felt it was worth surfacing these concerns.  Do others have similar, or other, concerns about these measurements?  Or, indeed, suggestions for something that might alleviate these or other concerns (and hence be better)?

I guess a big question is:  what are the expected *uses* of these two values?





David Singer
Multimedia and Software Standards, Apple Inc.

Received on Friday, 30 August 2013 23:03:02 UTC