“CSS Transform Spec Fixes” Explainer

Motivating problems

There are two significant problems with the CSS 3D transform spec:

Lack of interop between browsers regarding 3D sorting and backface-visibility.
Both the TR and ED versions of the spec have definitions which are either ill-defined or break an unacceptable amount of web content if implemented.

Adjustments are proposed to resolve both of these issues, in a way that should be web-compatible enough for all browsers to adopt.

List of issues addressed

The specific issues are:

[ED spec issue] ED made overflow other than visible a grouping property. This severely breaks backward compatibility, because it forces every element with non-visible overflow to create a stacking context.

[TR/ED spec issue, browser compat issue] 3D rendering contexts are ill-defined for certain DOM trees where containing block and stacking context disagree. Even in cases that are well-defined, the definition in TR and ED differ, and the actual 3D sorting behavior is inconsistent across vendors.

[TR/ED spec issue] As specified, elements with default style have the side effect of flattening, but in practice implementations treat them as a no-op.

[Implementation difficulty] The specs do not distinguish between sibling stacking contexts that are on the same plane and different planes that happen to be numerically coplanar. This is problematic because coplanarity is computationally intractable.

[browser compat issue] backface-visibility defined in the specs does not reflect actual browser behavior.

Issue 1: Overflow as Grouping Property

According to the ED, overflow:hidden is a grouping property; a grouping property implies transform-style:flat; and transform-style:flat implies a stacking context. It has long been established that overflow other than visible does not necessarily create a stacking context, so this change is going to break things. Lots of things.

Proposed Solution

Remove that clause from ED. This complicates composited scrolling, because descendants with parallax will need a separate texture cache (which they already do in WebKit and Blink). On the other hand, this re-enables parallax, which is a frequently requested feature from web developers.

Alternative

Currently Blink (TODO: check Mozilla, Edge and WebKit) flattens the matrix, but forces neither a stacking context nor the creation of a 3D context for overflow other than visible. This behavior can be accommodated by modifying the definition of the child matrix as shown below.

elem.child_matrix =
    (style.overflow != "visible" ||
     (style.zIndex != "auto" && style.transformStyle == "flat") ?
        Flatten(elem.screen_matrix) : elem.screen_matrix) *
    perspective_matrix * Translate(-elem.scroll_offset)

Discussion from 2017/11/07

Parallax was not the reason why clipping needs to force flattening. It was mostly due to concern about the technical difficulty of applying a clip in a non-local space. In order to apply the clip in local space (so that the clip rect is parallel to the backing pixels), implementations create a separate buffer in the local space, and that buffer induces flattening as a side effect.

The direction we are heading towards is that:

overflow other than visible would force the used value of transform-style to flat.
transform-style:flat should not force stacking context.
Even if the computed value of transform-style was preserve-3d, stacking context is not forced if the used value is overridden to flat.
overflow other than visible with a stacking context follows the normal 3D context rule.

overflow other than visible without a stacking context will be treated as a special case: descendant 2D stacking contexts (those that don’t create a plane) are passed to the parent stacking context for sorting, as defined by CSS 2.1, while descendant 3D stacking contexts (i.e. planes propagated from descendants) (even those that escape the clip?) are depth-sorted and flattened immediately.
We didn’t reach a conclusion about the sorting order of the flattened result from the above special case. I personally recommend treating it as if it were the first child of the non-stacking-context overflow-clipping element, with z-index:0.
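
As a minimal sketch of the direction listed above (not spec text), the used values could be resolved as follows. The function name `resolve_used_values` and its string-valued parameters are hypothetical, and grouping properties other than overflow are deliberately left out.

```python
def resolve_used_values(computed_transform_style, overflow, computed_z_index):
    """Return (used transform-style, used z-index) for one element.

    Hypothetical helper: only the overflow rule from the 2017/11/07
    discussion is modeled; other grouping properties are ignored.
    """
    # overflow other than visible forces the used transform-style to flat.
    if overflow != "visible":
        used_transform_style = "flat"
    else:
        used_transform_style = computed_transform_style

    # Only a *used* value of preserve-3d promotes z-index:auto to 0,
    # i.e. forces a stacking context. A flat used value never does.
    used_z_index = computed_z_index
    if used_transform_style == "preserve-3d" and computed_z_index == "auto":
        used_z_index = "0"
    return used_transform_style, used_z_index
```

Under this sketch, an element with a computed transform-style of preserve-3d and overflow:hidden ends up with a flat used value and keeps z-index:auto, so it does not become a stacking context.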

Issue 2: 3D Rendering Context Penetration

A 3D rendering context is the 3D counterpart of a stacking context. A stacking context forms an isolated group (i.e. paints contiguously) and sorts its child stacking contexts by z-index as defined by CSS; a 3D context also forms an isolated group but draws its participating planes in a 3D space instead, and its isolated result is a projection of said 3D space. This projection is also known as flattening.

transform-style is the CSS property that dictates how 3D contexts are structured, and has slightly different semantics in TR and ED.

The TR and ED specs differ in that ED requires certain styles to be considered flattening. This creates a dilemma when a stacking context ancestor that is not a containing block sits in the ancestor chain. Consider the following example:

</div>

What should be the stacking order between A, B, and C? B and C are in one stacking context (induced by the “isolation:isolate” property), and A is in another.

According to the TR spec, the 3D context lookup of an element follows the containing block chain (which is not necessarily a subset or superset of the stacking context chain), stopping at the last element which has a preserve-3d used value of transform-style. Therefore, according to that spec, A, B and C are all in the same 3D rendering context (because the containing block of A, B and C is the element with id “root” above).

Thus the elements should stack in the order BAC by depth sorting. However, this contradicts the very definition of an isolated group under isolation: isolate.

According to the ED spec, the 3D context is defined by the nearest containing DOM ancestor of an element with a flat used value of transform-style. In addition, certain other styles force grouping, which means they also force the used value of transform-style to flat. Since isolation (and all other stacking context-inducing properties) are grouping properties, the element with id “isolate” above induces a 3D rendering context for B and C, and A lives in its own independent 3D rendering context.

However, the ED spec also requires computing the accumulated 3D transform matrix by multiplying ancestor matrices along the containing block chain. This leaves the accumulated to-screen matrices of B and C ill-defined, because the condition in step 4 of the algorithm (stop at the 3D rendering context root) never occurs, as “isolate” is not in the containing block chain of B or C.

Furthermore, the changed definition of 3D context in ED is not backward compatible because depth sorting should not be used when none of the elements has transform-style:preserve-3d. For example:

</div>

A backward-compatible definition requires stacking order AB, but the ED spec dictates BA.

At the time of writing, Chrome 63, Edge 41, and Firefox 57 follow TR, while Safari 42 follows ED (roughly, where it applies).

Proposed Solution

It is inherently wrong to make transform-style a property associated with containing blocks. Depth sorting is really a stacking context operation, as it controls how an isolated group composites its descendant isolated groups.

On the other hand, the screen matrix of an element is a property associated with containing blocks. Fortunately, the screen matrix roughly agrees across all vendor implementations. In this proposal we’ll also redefine its computation so that it is well-defined in all cases while remaining backward compatible.

We propose the following adjustments to fix these issues:

transform-style has two possible computed values: flat (default) and preserve-3d, as defined in TR. (We remove auto, which was added in the ED spec.)
An element that has any grouping property forces the used value of transform-style to flat. (Unchanged from what is in the ED spec.)
If the used value of transform-style is preserve-3d and the computed value of z-index is auto, adjust the used value of z-index to 0. (Unchanged from what is in the ED spec.)
If the used value of z-index is auto, the used value of transform-style:flat is ignored. I.e. the property only affects a stacking context, and has no effect on non-stacking context elements.
A stacking context with transform-style:flat stacks like a traditional stacking context, i.e. child stacking contexts are sorted by their z-index, and the depth component of each child stacking context is ignored.
A stacking context with transform-style:preserve-3d behaves differently depending on its parent stacking context’s transform-style. If the parent stacking context has transform-style:flat, it sorts all planes propagated from its child stacking contexts. Otherwise (its parent stacking context has transform-style:preserve-3d), it forwards all planes propagated from its child stacking contexts to its parent without sorting. (Similar to TR, but uses the stacking context chain instead of the containing block chain.)
Redefine accumulated 3D transformation matrix by the screen matrix, instead of the local matrix from 3D context root to the current element. This avoids the dilemma “3D context root is not part of containing block chain” a.k.a. 3D context penetration. The screen matrix is defined by induction as follows:

The screen matrix and child matrix (which is going to be inherited by children) of the initial containing block, i.e. the viewport, are user-agent-defined values, which may include any transformation imposed by window decoration or the native system.
For an element elem, the screen matrix and child matrix are defined by:

parent = ContainingBlock(elem)
style = UsedStyle(elem)

local_matrix = Translate(elem.transform_origin) *
    elem.transform * Translate(-elem.transform_origin)

elem.screen_matrix = parent.child_matrix *
    Translate(elem.layout_offset) * local_matrix

perspective_matrix = Translate(elem.perspective_origin) *
    Perspective(elem.perspective) *
    Translate(-elem.perspective_origin)

elem.child_matrix =
    (style.zIndex != "auto" && style.transformStyle == "flat" ?
        Flatten(elem.screen_matrix) : elem.screen_matrix) *
    perspective_matrix * Translate(-elem.scroll_offset)

See appendix of this document for examples.

Note: The Flatten function resets the 3rd row and the 3rd column of a 4x4 matrix to those of the identity matrix, i.e. (0, 0, 1, 0).
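
The induction above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than a reference implementation: matrices are 4x4 row-major lists acting on column vectors, `Elem` and `compute_matrices` are hypothetical names, offsets and origins are treated as 2D, and `Perspective(d)` is assumed to be the usual CSS perspective matrix (identity when there is no perspective).

```python
import math
from dataclasses import dataclass, field
from functools import reduce
from typing import Optional

def identity():
    return [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

def mul(*ms):
    """Multiply 4x4 matrices left to right."""
    def mul2(a, b):
        return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
                for r in range(4)]
    return reduce(mul2, ms)

def translate(x, y):
    m = identity()
    m[0][3], m[1][3] = x, y
    return m

def rotate_y(deg):
    t = math.radians(deg)
    m = identity()
    m[0][0], m[0][2] = math.cos(t), math.sin(t)
    m[2][0], m[2][2] = -math.sin(t), math.cos(t)
    return m

def perspective(d):
    m = identity()
    if d is not None:          # None models "perspective: none"
        m[3][2] = -1.0 / d
    return m

def flatten(m):
    """Reset the 3rd row and 3rd column to those of the identity matrix."""
    out = [row[:] for row in m]
    for i in range(4):
        out[2][i] = 0.0
        out[i][2] = 0.0
    out[2][2] = 1.0
    return out

@dataclass
class Elem:  # hypothetical container for the used style of one element
    transform: list = field(default_factory=identity)
    transform_origin: tuple = (0.0, 0.0)
    layout_offset: tuple = (0.0, 0.0)
    scroll_offset: tuple = (0.0, 0.0)
    perspective: Optional[float] = None
    perspective_origin: tuple = (0.0, 0.0)
    z_index: str = "auto"
    transform_style: str = "flat"
    screen_matrix: Optional[list] = None
    child_matrix: Optional[list] = None

def compute_matrices(elem, parent_child_matrix):
    """Fill in elem.screen_matrix and elem.child_matrix from the parent's child matrix."""
    ox, oy = elem.transform_origin
    local = mul(translate(ox, oy), elem.transform, translate(-ox, -oy))
    elem.screen_matrix = mul(parent_child_matrix,
                             translate(*elem.layout_offset), local)
    px, py = elem.perspective_origin
    persp = mul(translate(px, py), perspective(elem.perspective),
                translate(-px, -py))
    # Only a flat *stacking context* flattens what its children inherit.
    flat_sc = elem.z_index != "auto" and elem.transform_style == "flat"
    base = flatten(elem.screen_matrix) if flat_sc else elem.screen_matrix
    sx, sy = elem.scroll_offset
    elem.child_matrix = mul(base, persp, translate(-sx, -sy))
```

Calling `compute_matrices` top-down along the containing block chain, starting from the viewport's child matrix, yields a screen matrix for every element without ever needing to locate a 3D context root.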

Discussion from 2017/11/07

We prefer that 3D children of a transform-style:flat stacking context be sorted by depth, which is the behavior exhibited by WebKit but not by the rest of the implementations. The current ED is an (imperfect) attempt to describe that intended behavior. We’ll try to repeal the parts of ED that introduced unintended consequences, but keep the good part that allows a transform-style:flat stacking context to depth-sort its 3D children.

transform-style has two possible computed values: flat (default) and preserve-3d, as defined in TR. (We remove auto, which was added in the ED spec.)
An element that has any grouping property forces the used value of transform-style to flat. (Unchanged from what is in the ED spec.)
If the used value of transform-style is preserve-3d and the computed value of z-index is auto, adjust the used value of z-index to 0. (Unchanged from what is in the ED spec.)
If the used value of z-index is auto, the used value of transform-style:flat is ignored, except for the special case of overflow clip described above. I.e. the property only affects a stacking context, and has no effect on non-stacking context elements.
Plane creation: a transform property that contains a 3D function, an animated transform property that contains a 3D function at either interpolation point, or will-change:transform will pull the subtree onto its own plane. Otherwise the stacking context paints into the same plane as its parent stacking context.
A stacking context with transform-style:preserve-3d behaves like a traditional CSS 2.1 stacking context for its 2D stacking context children (those that don’t create a plane); planes from descendants are propagated to the parent stacking context.
A stacking context with transform-style:flat paints its normal-flow background phase bottommost. A 3D context is created to flatten planes propagated from descendants, and the stacking context itself also creates a default plane (z=0 in local coordinates) into which its 2D negative-z-index children, normal-flow foreground, and 2D positive-z-index children are painted.
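
The plane propagation described in the last three items can be sketched as a small tree walk. This is a simplified illustration, not an implementation: `Ctx` is a hypothetical stacking-context node, each plane is given a single representative `depth` (real depth sorting has to handle intersecting planes), and the default z=0 plane of the flat context is left out.

```python
from dataclasses import dataclass, field

@dataclass
class Ctx:  # hypothetical stacking-context node
    name: str
    transform_style: str = "flat"
    creates_plane: bool = False
    depth: float = 0.0  # assumed single screen-space depth per plane
    children: list = field(default_factory=list)

def propagated_planes(ctx):
    """Planes that `ctx` hands up to its parent stacking context."""
    own = [ctx] if ctx.creates_plane else []
    if ctx.transform_style == "flat":
        # A flat context flattens its descendants' planes itself,
        # so at most its own plane is visible to the parent.
        return own
    # A preserve-3d context forwards its descendants' planes unsorted.
    return own + [p for child in ctx.children for p in propagated_planes(child)]

def depth_sorted_planes(flat_ctx):
    """Planes a flat stacking context depth-sorts (back to front) before flattening."""
    planes = [p for child in flat_ctx.children for p in propagated_planes(child)]
    return sorted(planes, key=lambda p: p.depth)

# Usage: a flat root with one preserve-3d subtree and one flat subtree.
root = Ctx("root", children=[
    Ctx("P", "preserve-3d", True, 0.0, [
        Ctx("X", "flat", True, 10.0),
        Ctx("Y", "flat", True, -5.0),
        Ctx("Z", "flat", False),           # 2D child: no plane of its own
    ]),
    Ctx("Q", "flat", True, 3.0, [
        Ctx("W", "flat", True, 100.0),     # flattened inside Q, never reaches root
    ]),
])
order = [p.name for p in depth_sorted_planes(root)]
```

In this sketch W never reaches the root's 3D context because Q is flat, while X and Y pass through the preserve-3d context P and are sorted together with P and Q.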

Issue 3: Default-styled Elements should be no-op

Consider the following example:

<div>

</div>

It is intuitive that default-styled elements should be a no-op, and that is what vendors have implemented. ED attempted to solve the problem by introducing transform-style:auto, which is not implemented by any vendor and created more backward-compatibility issues. For example, we’ve seen real-world pages that use transform-style:flat in their stylesheets to undo transform-style:preserve-3d added by other style rules; under the ED spec this would introduce the side effect of forcing a stacking context.

Proposed Solution

The solution to Issue 2 automatically solves this, because non-stacking-context elements won’t interfere with stacking decisions.

Discussion from 2017/11/07

Mozilla seemed to follow the specs verbatim, such that default-styled elements respect the default transform-style:flat. We should revise the spec so that transform-style:flat is only effective on elements that have a stacking context or an overflow clip.

Issue 4: Coplanarity is Computationally Intractable

Both TR and ED require that coplanar planes stack in z-index order. This is impossible to implement exactly, because many functions available to transform yield irrational numbers that are not representable in floating point and are subject to rounding errors. A symbolic implementation is possible given the current function set, but is computationally expensive. Current implementations use an arbitrary threshold for coplanarity, but this practice creates more problems: 1. There is still a dilemma at the threshold. 2. The threshold is neither documented nor standardized, and the way it is defined depends on internal details.
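
A small Python sketch of the rounding problem: rotateY(180deg) maps the z=0 plane onto itself, so every point should land at exactly z=0, yet double-precision arithmetic leaves a tiny residual. The helper names and the 1e-9 threshold are hypothetical and chosen only for illustration.

```python
import math

def z_after_rotate_y(x, deg):
    """z-coordinate of the point (x, 0, 0) after rotateY(deg)."""
    return -x * math.sin(math.radians(deg))

def coplanar(z_a, z_b, threshold):
    """Threshold-based coplanarity test on two depth samples."""
    return abs(z_a - z_b) <= threshold

# Mathematically this is 0, but sin(pi) is not exactly 0 in floating point.
z = z_after_rotate_y(100.0, 180)

exact_match = coplanar(z, 0.0, 0.0)    # exact comparison misses the tie
fuzzy_match = coplanar(z, 0.0, 1e-9)   # an arbitrary threshold catches it
```

The exact comparison fails to detect coplanarity, and the fuzzy one only works because of a threshold that has no basis in the spec, which is the dilemma described above.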

Even if coplanarity could be computed cheaply, the tie-breaking rule would still cause performance problems. Consider the following example:

</div>

Elements A and C won’t be able to raster into the same texture cache, because a transform animation on B may result in coplanarity with plane AC.

Proposed Solution

Coplanar planes will stack in unspecified order. Web developers are strongly discouraged from creating overlapping coplanar contents.

To maintain backward compatibility, the criteria for creating planes should be defined as well, so that traditional stacking contexts still stack in the expected order rather than an unspecified one.

A default plane is created for a stacking context that sorts planes propagated from its children, i.e. an element with transform-style:preserve-3d whose parent stacking context has transform-style:flat. (See proposed solution for issue 2.)
Any 3D transform function used in the transform property creates a plane, even if the computed matrix is flat. E.g. translateZ(0) and rotateX(30deg)rotateX(-30deg) create planes, but rotateZ(10deg) doesn’t.
will-change:transform creates a plane.
A keyframe animation or transition creates a plane if any interpolation point uses a 3D transform function, regardless of the computed matrix.
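
The syntactic nature of these criteria can be sketched as below: the decision depends only on which function names appear, never on the computed matrix. The function `creates_plane`, its parameters, and the exact contents of `THREE_D_FUNCTIONS` are assumptions made for illustration; in particular rotateZ is left out of the set because the list above says it does not create a plane. The default plane from the first item is not modeled.

```python
import re

# Assumed set of 3D transform functions for this sketch.
THREE_D_FUNCTIONS = {
    "translateZ", "translate3d", "scaleZ", "scale3d",
    "rotateX", "rotateY", "rotate3d", "matrix3d", "perspective",
}

def function_names(transform_value):
    """Extract function names from a transform value such as 'rotateX(30deg)'."""
    return re.findall(r"([A-Za-z0-9]+)\s*\(", transform_value)

def creates_plane(transform="none", will_change="", keyframe_transforms=()):
    """Decide plane creation from syntax alone, never from the computed matrix."""
    if "transform" in [p.strip() for p in will_change.split(",")]:
        return True
    values = [transform, *keyframe_transforms]
    return any(name in THREE_D_FUNCTIONS
               for value in values
               for name in function_names(value))
```

For instance, `creates_plane("rotateX(30deg)rotateX(-30deg)")` is true even though the two rotations cancel out, while `creates_plane("rotateZ(10deg)")` is false.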

Note: Historically, transform:translateZ(0) has been used by web developers as a raster caching hint. This proposal would make transform:translateZ(0) create a plane, which has implications for stacking order. This is okay when the parent stacking context has transform-style:flat, since it will be sorted solely by z-index anyway. If the parent stacking context has transform-style:preserve-3d, this plane’s stacking order against other coplanar sibling planes will become unspecified. We may need to make translateZ(0) a 2D transform function as a special case if it causes backward-compatibility issues in the real world.

Discussion from 2017/11/07

Iterate on the specs by deprecating the z-index tie-breaking rule, with a warning that implementations may or may not implement it. Also add examples that result in unspecified behavior under the proposed spec change, showing how web developers can restructure their pages to retain a defined sorting order by avoiding plane creation.

Comment by trchen@, not discussed yet: Given the conclusion above, we still want transform-style:flat stacking contexts to sort their 3D children. This would be a problem because transform:translateZ(0) would have unspecified painting order. We will have to make translateZ(0) a special case.

Issue 5: backface-visibility should reflect real-world behavior

In both TR and ED, backface-visibility is defined as a per-element effect, i.e. if the screen matrix of an element is back-facing and the element has backface-visibility:hidden, its painting is skipped as if it had visibility:hidden. This is not the behavior exhibited by WebKit, whose behavior is treated as the de facto standard.

In WebKit, backface-visibility:hidden causes an element to create a self-painting layer, which is the internal representation of (pseudo) stacking contexts. (Note: Pseudo stacking contexts are elements that are positioned or floated but have z-index:auto. They paint like a stacking context, but cannot have child stacking contexts.) Normal-flow contents of the (pseudo) stacking context ignore their own backface-visibility value, and instead inherit backface-visibility from the element that created the (pseudo) stacking context.

Also, historically backface-visibility:hidden has been used by web developers as a raster caching hint, but without creating a stacking context.

Proposed Solution

The specs should reflect real-world behavior.

Used value of backface-visibility:hidden makes the element a pseudo stacking context (as if having position:relative) if not a stacking context already.
A (pseudo) stacking context with backface-visibility:hidden determines the visibility of its background phase and normal-flow phase by computing the face-orientation of its screen matrix. Painting of the phases is skipped if it is back-facing.
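
One way to compute the face orientation of a screen matrix is to project three corners of the element and check the winding of the result. The sketch below assumes that approach; it is not taken from the specs or from any engine, the names `project`, `is_back_facing` and `rotate_y` are hypothetical, and matrices are 4x4 row-major lists acting on column vectors.

```python
import math

def rotate_y(deg):
    t = math.radians(deg)
    return [[math.cos(t), 0.0, math.sin(t), 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-math.sin(t), 0.0, math.cos(t), 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def project(m, x, y):
    """Map the local point (x, y, 0) to 2D screen space, with perspective divide."""
    px = m[0][0] * x + m[0][1] * y + m[0][3]
    py = m[1][0] * x + m[1][1] * y + m[1][3]
    w = m[3][0] * x + m[3][1] * y + m[3][3]
    return px / w, py / w

def is_back_facing(screen_matrix):
    """True if the element's local z=0 plane shows its back on screen."""
    x0, y0 = project(screen_matrix, 0.0, 0.0)
    x1, y1 = project(screen_matrix, 1.0, 0.0)
    x2, y2 = project(screen_matrix, 0.0, 1.0)
    # 2D cross product of the projected x and y axes: the sign flips
    # when the plane is mirrored, i.e. seen from behind.
    cross = (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)
    return cross < 0.0
```

With such a test, a (pseudo) stacking context with backface-visibility:hidden would skip its background and normal-flow phases whenever `is_back_facing` holds for its screen matrix, for example under rotateY(180deg).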

This still does not completely reflect WebKit’s legacy behavior, but it is much closer, and does not depend on WebKit’s internal implementation details.

Discussion from 2017/11/07

Didn’t have time. :(

Appendix

Examples of 3D Stacking and Screen Matrix Computation

Example 1, transformed flat parent with transformed child.

Test body:

<style>
div {
  width: 100px;
  height: 100px;
}
</style>

</div>

Test body 2 (which should behave the same as above):

<style>
div {
  width: 100px;
  height: 100px;
}
</style>

</div>

Expectation:

Explanation:

The purpose of this test is to verify that a containing block that is also a flat stacking context flattens its child matrix, i.e. removes the z-component of its screen matrix before its children inherit it. The second test body verifies that the value of transform-style only affects the child matrix of an element, and doesn’t affect its own screen matrix in any way.

a.screen_matrix = translate2d(-50, -50)rotateY(45deg)translate2d(50, 50)

a.child_matrix = flatten(
    translate2d(-50, -50)rotateY(45deg)translate2d(50, 50))
  = translate2d(-50, -50)scaleX(sqrt(2)/2)translate2d(50, 50)

b.screen_matrix = flatten(
    translate2d(-50, -50)rotateY(45deg)translate2d(50, 50))
    translate2d(-50, -50)rotateY(45deg)translate2d(50, 50)
  = translate2d(-50, -50)scaleX(sqrt(2)/2)
    rotateY(45deg)translate2d(50, 50)

Example 2, transformed flat parent with transformed children

Test body:

<style>
div {
  width: 100px;
  height: 100px;
}
</style>

</div>

Test body 2 (which should behave the same as above):

<style>
div {
  width: 100px;
  height: 100px;
}
</style>

</div>

Expectation:

Explanation:

Similar to example 1, but the child element now has a sibling with an intersecting screen matrix. The purpose of this test is to verify that a flat stacking context always stacks its children by z-index, ignoring the z-depth component of the matrix, regardless of the transform-style of the children.

Example 3, 3D context penetration

Test body:

<style>
div {
  width: 100px;
  height: 100px;
}
</style>

</div>

Expectation:

<style>
div {
  width: 100px;
  height: 100px;
}
</style>

Explanation:

This is a tricky one. Elements C and D each create their own 3D context, because their parent stacking context B is flat. Each 3D context consists of only one plane, so its flattened result is trivial. Both results then get sorted by their parent stacking context B. D stacks on top of C because stacking context B is flat. Then the stacking result of B composites on top of its parent plane A with 50% opacity.

The interesting thing is that although C and D are flattened and stacked in flat stacking context B, their screen matrices inherit from A and thus are not flattened. Therefore element C has an accumulated matrix of identity and is rendered at full size, while element D has an accumulated matrix of rotateY(90deg), which is degenerate.

a.screen_matrix = translate2d(-50, -50)rotateY(45deg)translate2d(50, 50)
a.child_matrix = translate2d(-50, -50)rotateY(45deg)translate2d(50, 50)

b.screen_matrix = translate2d(-50, -50)rotateY(45deg)translate2d(50, 50)
b.child_matrix =
    flatten(translate2d(-50, -50)rotateY(45deg)translate2d(50, 50))

c.screen_matrix = translate2d(-50, -50)rotateY(45deg)translate2d(50, 50)
    translate2d(-50, -50)rotateY(-45deg)translate2d(50, 50)
  = identity

d.screen_matrix = translate2d(-50, -50)rotateY(45deg)translate2d(50, 50)
    translate2d(-50, -50)rotateY(45deg)translate2d(50, 50)
  = translate2d(-50, -50)rotateY(90deg)translate2d(50, 50)
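
The matrices of Example 3 can be checked numerically with a short Python sketch. It transcribes the products exactly as written above, using 4x4 row-major matrices; the helper names are hypothetical, and the roles of A and B (A not flattening, B adding no transform of its own) are inferred from the matrices listed, since the test body markup is not reproduced here.

```python
import math
from functools import reduce

def identity():
    return [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

def matmul(*ms):
    """Multiply 4x4 matrices left to right."""
    def mul2(a, b):
        return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
                for r in range(4)]
    return reduce(mul2, ms)

def translate2d(x, y):
    m = identity()
    m[0][3], m[1][3] = x, y
    return m

def rotate_y(deg):
    t = math.radians(deg)
    m = identity()
    m[0][0], m[0][2] = math.cos(t), math.sin(t)
    m[2][0], m[2][2] = -math.sin(t), math.cos(t)
    return m

def flatten(m):
    out = [row[:] for row in m]
    for i in range(4):
        out[2][i] = 0.0
        out[i][2] = 0.0
    out[2][2] = 1.0
    return out

def close(a, b, tol=1e-9):
    return all(abs(a[r][c] - b[r][c]) <= tol for r in range(4) for c in range(4))

def about_center(m):
    """translate2d(-50, -50) m translate2d(50, 50), as written in the example."""
    return matmul(translate2d(-50, -50), m, translate2d(50, 50))

a_screen = about_center(rotate_y(45))
a_child = a_screen               # A does not flatten its child matrix
b_screen = a_child               # B contributes no transform of its own
b_child = flatten(b_screen)      # B is a flat stacking context

# C and D inherit from A's child matrix, not from B's flattened one.
c_screen = matmul(a_child, about_center(rotate_y(-45)))
d_screen = matmul(a_child, about_center(rotate_y(45)))
```

Up to rounding, `c_screen` comes out as the identity and `d_screen` as the rotateY(90deg) product, whose x scale collapses to roughly zero; `b_child` differs from `b_screen`, yet neither C nor D picks it up.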