Re: Thoughts on HDR

Hi Craig,

Some comments on your proposal.

Consider the bit rate:
Current 16-bit floating point is not an efficient encoding for presenting HDR video on a computer display: too many bits, a wasted sign bit, and overkill for what's needed. (It would need about 24 Gbps for 4K 60 fps, or about 191 Gbps for 8K 120 fps.)
To achieve the pixel rate needed for 4K 60 fps and beyond we need a more efficient bit encoding. Both 10-bit PQ and 10-bit HLG are better options, and AFAIK 10 bits are sufficient for most HDR display cases. (About 15 Gbps for 4K 60 fps, and about 119 Gbps for 8K 120 fps, the latter still far too much for HDMI; see the arithmetic sketch below.)
If the interface is 10-bit, then using floats in the computer's display buffer would be inefficient and would waste a conversion.
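
For reference, here is the arithmetic behind those figures, as a quick Python sketch. It counts the raw pixel payload only; real interfaces add blanking and line-coding overhead on top.

    def raw_gbps(width, height, fps, bits_per_channel, channels=3):
        # Raw pixel payload in Gbps, ignoring blanking and encoding overhead.
        return width * height * fps * channels * bits_per_channel / 1e9

    print(raw_gbps(3840, 2160, 60, 16))   # ~23.9 Gbps (4K 60, FP16 RGB)
    print(raw_gbps(3840, 2160, 60, 10))   # ~14.9 Gbps (4K 60, 10-bit RGB)
    print(raw_gbps(7680, 4320, 120, 16))  # ~191 Gbps  (8K 120, FP16 RGB)
    print(raw_gbps(7680, 4320, 120, 10))  # ~119 Gbps  (8K 120, 10-bit RGB)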

RGB is also inefficient. YCC 4:2:2 subsampling would reduce the bit rate by one third. However, YCC causes data loss: 5 out of 6 RGB values cannot be restored exactly from YCC. YCC is the standard encoding for video and photography, so no visible loss there, but it can create aliasing for high-resolution text and graphics outside of video.
So how about using 10-bit RGB for the interface?
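
And the 4:2:2 saving mentioned above, as a self-contained check:

    # 4:2:2 keeps Y for every pixel but Cb/Cr only for every other pixel:
    # per two-pixel group, 2 Y + 1 Cb + 1 Cr = 4 samples instead of 6,
    # i.e. 2 samples per pixel on average instead of 3.
    print(1 - 2 / 3)                        # one third of the bit rate saved
    print(3840 * 2160 * 60 * 2 * 10 / 1e9)  # ~9.95 Gbps, 4K 60 10-bit YCC 4:2:2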

Where to convert:
I would prefer to have the conversion happen inside the display, not in the computer.
This lets me drive multiple displays with different ranges from the same output (say, a phone and a projector), and my application can stay dumb.
This can be beneficial as there are many more app vendors than display vendors.
However, the computer still must compose the monitor frame into a single color space. If so, this single color space should be an HDR video space such as BT.2100. Choosing an HDR video space means that I can avoid the resource-intensive on-computer conversion of the above-mentioned 8K 120 fps video to yet another space.

On latency:
Static tone mapping does not add any latency, as it is just a LUT conversion done in parallel with transmission.
Dynamic tone mapping would be more challenging on both host and display if it required frame analysis before frame conversion.
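
To illustrate the static case, a minimal sketch (the curve here is a made-up placeholder, not any standard mapping):

    import numpy as np

    # Build a 10-bit 1D LUT once; applying it is a single table lookup per
    # sample, so it can run in stream as pixels arrive, adding no latency.
    codes = np.arange(1024) / 1023.0
    lut = np.round(1023 * codes ** 0.8).astype(np.uint16)  # placeholder curve

    def tone_map(frame):
        # frame: array of 10-bit code values, any shape
        return lut[frame]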

On tone mapping:
What to do if the computer runs multiple apps, so the display shows multiple windows, each with a video in a different encoding?
Maybe each window needs to be tone-mapped individually and dynamically?
Yikes, that would be complicated, and it cannot be done inside the display, as window boundaries are lost by then.
Maybe the computer could tone map into its generic display buffer, and then each display tone-maps from there?

As usual, no easy solutions.

Lars

From: Craig Todd <CT@dolby.com>
Date: Sunday, January 10, 2021 at 12:33 PM
To: "public-colorweb@w3.org" <public-colorweb@w3.org>
Subject: Thoughts on HDR
Resent-From: <public-colorweb@w3.org>
Resent-Date: Sunday, January 10, 2021 at 12:33 PM

I'd like to contribute some thoughts to the discussion of HDR. I'm very knowledgeable about HDR in the entertainment space, but I have little to no understanding of the IT space. Nevertheless, as I listened to the discussion during the Dec. '19 meeting, some thoughts came to mind. I've corresponded a bit, and then chatted, with Pierre, and he suggested I share my thoughts with the group.

Background
In the entertainment space, mixing of SDR and HDR content for presentation on a screen is not a major topic. It does come up regarding picture-in-picture (PIP) and overlay of graphics onto HDR, where the (legacy) graphics generation may have been done in SDR. Typically the source device maps the SDR (gamma) into HDR (PQ or HLG) for interface to an HDR display.
For entertainment content, the concept and use of a reference display and a reference viewing environment (dimly lit) are very important so that produced content is consistent. Use of a calibrated display in a similarly dim environment enables a viewer to see the image as intended. If the viewing environment differs (e.g. a bright room), then the display should ideally compensate by altering the electro-optical transfer function (EOTF) so that what is viewed and perceived is still, to the extent possible, faithful to what was produced.
SDR (gamma) and PQ are defined to be display referred, i.e. the pixel value implies a specific luminance. This was not initially specified for SDR, as defined in Recommendations ITU-R BT.601 (SDTV) and BT.709 (HDTV), but was de facto enforced by all displays being CRTs, and CRTs being effectively limited to about 100 nits luminance. Reference viewing rooms had been specified to use a background at 10% of the luminance of reference white on the display. When flat panels arrived they could go brighter, and it became necessary for the ITU-R to formally specify the display transfer function (the gamma 2.4 EOTF) and the reference luminance (100 nits), which was done in BT.1886. BT.2035 specified the HDTV SDR viewing environment to employ a 10 nit background (10% of 100 nit reference white). HDR was documented in BT.2100. This document is comprehensive in that it clearly specifies reference OETF (capture), EOTF (display), and end-to-end (OOTF) transfer functions for both PQ and HLG. Also specified is a reference viewing environment (5 nits). A 16-bit floating point representation is also specified, for either scene or display luminance.
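In code, that BT.1886 reference EOTF looks like this (a minimal Python sketch; Lw and Lb are the display's white and black luminance in nits):

    def bt1886_eotf(V, Lw=100.0, Lb=0.0):
        # Screen luminance in nits from the normalized signal V in [0, 1].
        gamma = 2.4
        a = (Lw ** (1 / gamma) - Lb ** (1 / gamma)) ** gamma
        b = Lb ** (1 / gamma) / (Lw ** (1 / gamma) - Lb ** (1 / gamma))
        return a * max(V + b, 0.0) ** gamma

    print(bt1886_eotf(1.0))  # 100.0 -- reference white
    print(bt1886_eotf(0.5))  # ~18.9 nits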
A reference SDR display is suitable for use by consumers in a darkish room. But TVs used in bright rooms typically display at 3x or so that brightness, i.e. 300 nits. Computers are also typically used in bright rooms and display sRGB at several hundred nits. A computer + display used to view entertainment content in a critical viewing environment should be capable of matching the ITU specs. I personally prefer creating images per the standards, and then altering the actual display based on the actual viewing environment. This alteration could be done in the computer's rendering or in the display itself; I think doing it in the display is preferable.
HLG HDR display is a bit different from gamma SDR and PQ HDR. The reference display function (EOTF) is defined to simply map the full range of the signal to the luminance capability of the display; a 1000 nit display is a nominal reference for use in production. So a 500 nit HLG display will show a dimmer picture, and a 2000 nit HLG display a brighter picture. There is a gamma tweak in the EOTF that makes these variations in display luminance more acceptable. The simplicity in this is that no processing is needed to match an HLG image to a display.
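That gamma tweak is the BT.2100 system gamma, which scales with the display's nominal peak luminance; a small sketch:

    import math

    def hlg_system_gamma(peak_nits):
        # BT.2100: gamma = 1.2 + 0.42 * log10(Lw / 1000), so the reference
        # 1000 nit display gets gamma 1.2 exactly.
        return 1.2 + 0.42 * math.log10(peak_nits / 1000.0)

    for peak in (500, 1000, 2000):
        print(peak, round(hlg_system_gamma(peak), 3))  # ~1.074, 1.2, ~1.326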
The very high dynamic range of PQ (0-10k nits) can result in signals that exceed the capabilities of any particular HDR display. What a display does is accurately present the pixel luminance for pixels within its capabilities, and limit (tone map or soft clip) those pixels representing highlights that exceed its capability. So the takeaway should be that PQ displays accurately reproduce mid-tones (the 20 nit faces and the 200 nit diffuse whites) but crush highlights, and that HLG displays alter both mid-tones and highlights to map the full range of the signal to the full range of the display.
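A toy example of that limiting behavior (my own illustrative curve, not anything standardized): pixels below a knee point pass through unchanged, and everything above rolls off asymptotically toward the panel's peak.

    import math

    def soft_clip(nits, display_peak=600.0, knee_frac=0.75):
        # Illustrative only. Mid-tones below the knee are reproduced exactly;
        # highlights above it approach display_peak but never exceed it.
        knee = knee_frac * display_peak
        if nits <= knee:
            return nits
        headroom = display_peak - knee
        return knee + headroom * (1.0 - math.exp(-(nits - knee) / headroom))

    print(soft_clip(20.0))    # 20.0  -- faces untouched
    print(soft_clip(200.0))   # 200.0 -- diffuse white untouched
    print(soft_clip(4000.0))  # ~600  -- a specular highlight, soft-clipped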

Thoughts for colorweb
A concept in my mind is that the computer equipment could support HDR by exploiting current standards and creating an idealized image meant for display in the reference dim environment (as currently used for creation of entertainment content). This image would be provided to the display, which would be charged with altering the delivered image to compensate for the display's own limitations (peak luminance, poor black rendition, limited color gamut, power limiting), the viewing environment, and user preference. Of course, the processing needed to alter the ideal image to what the display should show could be done in the display, in the computer, or split between them. I'm presuming use of RGB with primaries as defined in BT.2020/BT.2100. Those primaries cannot represent all visible colors, but as they are standardized for UHDTV, they are widely supported by displays as input signals (even though the displays can't actually display all the colors). If the display is ignorant of and unable to accept HDR, then an image created for HDR would have to be mapped down to SDR by the computer source device. HDR-to-SDR conversion is a big topic, and there are no accepted standards or recommendations for it.
The image could be created in the BT.2100 16-bit floating point format, where 1.0 indicates 1 nit on the display. Or the idealized image could be created directly in PQ (10k nit limit) or HLG (1k nit limit). For a float image, the interface to the display could be 16-bit float, but I don't think there are currently standards specifying FP over DisplayPort or HDMI. Floats could be directly mapped to PQ or HLG, which are supported by interfaces. For a PQ interface, floating point values >10k nits would need to be pre-limited. For an HLG interface, FP values >1k nits would need to be limited.
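The PQ mapping itself is fully specified by SMPTE ST 2084 / BT.2100; a sketch (full-range 10-bit quantization for simplicity, whereas a real interface may use narrow range):

    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

    def pq_encode(nits):
        # Absolute luminance in nits -> 10-bit PQ code. Values above the
        # 10,000 nit ceiling are pre-limited, as noted above.
        Y = min(max(nits, 0.0), 10000.0) / 10000.0
        E = ((c1 + c2 * Y ** m1) / (1 + c3 * Y ** m1)) ** m2
        return round(E * 1023)

    print(pq_encode(1.0))      # float 1.0 == 1 nit -> ~0.15 signal
    print(pq_encode(100.0))    # SDR reference white -> ~520
    print(pq_encode(10000.0))  # 1023, the ceiling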
It would be straightforward to map SDR, PQ, or HLG onto an FP image. While SDR should ideally map to a 100 nit float value, there are industry practices that map SDR at 2x luminance to better match HDR brightness (this is documented in a MovieLabs recommendation and described in ITU-R Reports). So while PQ and HLG pixels could map directly to an FP value, SDR pixels could map directly, with a 2x (or other) gain, or systems could employ sophisticated dynamic up-mapping algorithms (the suitability of which might be content dependent, i.e. SDR graphics vs. SDR video). For an interface other than FP, another mapping of FP to either PQ or HLG would be needed, with clipping or soft limiting for high-luminance pixels that exceed the capability of the interface format.
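A sketch of the direct SDR mapping with that 2x gain (narrow-range 10-bit code in, float nits out; the dynamic up-mapping algorithms are beyond a sketch):

    def sdr_to_float_nits(code_10bit, gain=2.0):
        # Narrow-range 10-bit video: black at code 64, white at code 940.
        V = min(max((code_10bit - 64) / (940 - 64), 0.0), 1.0)
        nits = 100.0 * V ** 2.4        # BT.1886 EOTF with zero black level
        return nits * gain             # 1.0 in the float image == 1 nit

    print(sdr_to_float_nits(940))       # reference white -> 200 with 2x gain
    print(sdr_to_float_nits(940, 1.0))  # or 100 with the direct mapping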
There can be latency introduced by the display needing to map the incoming HDR signal to match its own capabilities. Some entertainment/gaming source devices query a consumer display to determine the display's capabilities and then perform the display mapping inside the source device. Especially in gaming, the reduction in latency is useful. So if the computer system understands the display's capabilities, the mapping could be done as the delivered image is generated: instead of creating an idealized image, an actual display image would be generated. This would simplify the display but complicate the image generation.
I plan to be on the call Monday night (PST time zone).
Craig Todd, Dolby Fellow
