ISSUE-179 (px measure): Interpreting the pixel measure [DFXP 1.0] from Timed Text Working Group Issue Tracker on 2012-08-24 (public-tt@w3.org from August 2012)

From: Timed Text Working Group Issue Tracker <sysbot+tracker@w3.org>
Date: Fri, 24 Aug 2012 11:18:17 +0000
To: public-tt@w3.org
Message-Id: <E1T4ru5-0001Qy-2L@tibor.w3.org>

ISSUE-179 (px measure): Interpreting the pixel measure [DFXP 1.0]

http://www.w3.org/AudioVideo/TT/tracker/issues/179

Raised by: Sean Hayes
On product: DFXP 1.0

We have defined the pixel measure with reference to XSL, namely:

The actual distance covered by the largest integer number of device dots (the size of a device dot is measured as the distance between dot centers) that spans a distance less-than-or-equal-to the distance specified by the arc-span rule in http://www.w3.org/TR/REC-CSS2//syndata.html#x39 or superceding errata.

or

a fixed conversion factor, treating 'px' as an absolute unit of measurement (such as 1/92" or 1/72").

This is leading to problems in practice as we use TTML with actual delivered video. I recall that during the development of the specification we discussed the concept that px would equate to video pixels, so that authors could align elements precisely with elements in the video; however, this seems to have been lost.

I suggest that we need to clarify the pixel behaviour. In particular, we should explain that where extent is used on the tt element, this effectively *defines* the size of a pixel, in the sense that the root area extent is still mapped to a video overlay and divided into 'logical pixels'. If such an extent is not defined on the tt element, then the px measure should probably not be used (or should even be expressly forbidden).

For example, let's say that the author has a video nominally[1] 1024x768 in extent. It is, however, being displayed fullscreen on a 1920x2000 monitor (with 280px of black above and below the video). If we use device dots as XSL suggests, the captions will not be aligned correctly with the video.

If the px unit is used on the extent attribute on the tt element, extent="1024px 768px", I believe the expectation was that the root element is scaled to 1920x1440 along with the video and placed in correspondence with the video, so that the px metric actually means 1.875 device dots.
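
The arithmetic in this example can be checked with a short sketch. This is only an illustration of the proposed interpretation, not text from any specification: the names Placement, fit_root_area and to_device are hypothetical, and it assumes the root area is scaled uniformly (aspect ratio preserved) and centred on the display, as a fullscreen player would typically do with the video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where the scaled root area lands on the display (hypothetical helper type)."""
    scale: float     # device dots per logical px
    width: float     # scaled root area width, in device dots
    height: float    # scaled root area height, in device dots
    offset_x: float  # left bar width, in device dots
    offset_y: float  # top bar height, in device dots


def fit_root_area(extent_w, extent_h, display_w, display_h):
    """Scale the authored extent uniformly to fit the display, centred.

    Assumption: aspect ratio is preserved and any unused display area is
    split evenly on both sides (letterbox or pillarbox).
    """
    scale = min(display_w / extent_w, display_h / extent_h)
    width = extent_w * scale
    height = extent_h * scale
    return Placement(scale, width, height,
                     (display_w - width) / 2, (display_h - height) / 2)


def to_device(placement, x_px, y_px):
    """Map a logical px coordinate in the root area to device dots."""
    return (placement.offset_x + x_px * placement.scale,
            placement.offset_y + y_px * placement.scale)


# The example above: extent="1024px 768px" on a 1920x2000 monitor.
p = fit_root_area(1024, 768, 1920, 2000)
print(p.scale)                  # 1.875
print(p.width, p.height)        # 1920.0 1440.0
print(p.offset_x, p.offset_y)   # 0.0 280.0
print(to_device(p, 512, 384))   # (960.0, 1000.0)
```

Under these assumptions, a region authored at the centre of the 1024x768 root area lands at the centre of the displayed video, which is not what happens if px is taken to mean one device dot.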

[1] Note also that the pixel extent is not the actual delivered pixel density of the video either, since in an adaptive streaming model the actual frame size may vary depending on bandwidth. It needs to be an authoring concept based on the original coding size of the video.

Received on Friday, 24 August 2012 11:18:18 UTC