Re: [css2-transforms] Re: comments on Matrix from Benoit Jacob on 2013-03-20 (public-fx@w3.org from January to March 2013)

From: Benoit Jacob <jacob.benoit.1@gmail.com>
Date: Wed, 20 Mar 2013 15:10:31 -0400
To: Rik Cabanier <cabanier@gmail.com>
Cc: Dirk Schulze <dschulze@adobe.com>, "public-fx@w3.org" <public-fx@w3.org>
Message-ID: <CAJTmd9r0Mv2oH=6ptgsaDDSWvWcBmbDMFvGckMvz8qwtFss2LA@mail.gmail.com>
2013/3/20 Rik Cabanier <cabanier@gmail.com>

>
>
> On Wed, Mar 20, 2013 at 7:29 AM, Benoit Jacob <jacob.benoit.1@gmail.com>wrote:
>
>>
>>
>> 2013/3/20 Dirk Schulze <dschulze@adobe.com>
>>
>>> It is easier to answer to the answers entirely, even if we can discuss
>>> details in separate threats later.
>>>
>>> The specification describes a unified way to exchange matrices across
>>> other specifications. This is a very reasonable approach for me.
>>
>>
>> It's reasonable to exchange matrices across APIs but we don't need a
>> Matrix class for that, we can exchange raw arrays (say Typed Arrays) as is
>> done in WebGL. We'd just need to agree once and for all on a storage order
>> (say column-major as in WebGL).
>>
>> If we do add a Matrix interface for the purpose of exchanging data, then
>> at least it does not need to offer any computational features.
>>
>>
>>> We already have matrix definitions in SVG (SVGMatrix). And even if
>>> SVGMatrix is less specified than with this specifications, we have a huge
>>> amount of compatible implementations, including all major browsers and even
>>> more SVG viewers. I am much less concerned about the specification than you
>>> are. In fact, there is a need for an exchange format of transformation
>>> descriptions. Currently, HTML Canvas relies on SVGMatrix to describe a CTM.
>>>
>>
>>> The primary goal of the specification is interoperability and backwards
>>> compatibility. As mentioned before, SVG described SVGMatrix. This
>>> specification replaces SVGMatrix with the requirement to be as much
>>> backwards compatible as possible. This requires to follow the naming schema
>>> chosen in the specification.
>>
>>
>> That SVG has a SVGMatrix doesn't imply that other Web APIs should have a
>> matrix class. Maybe SVG had a good reason to have a matrix interface, which
>> I don't know, but I don't understand how that would generalize enough to
>> have a Web-wide matrix interface,
>>
>
> No, CSS, Canvas and SVG currently all have different matrices. People
> shouldn't have to learn 3 different ways to do the same thing.
> A matrix class is needed to bridge the gaps.
>

In my ideal world these Web APIs wouldn't impose their own matrices, they
would just take raw arrays. The user would be free to use whatever JS
matrix library he wants. The user would get more flexibility, no mismatched
matrix APIs, and better performance too going forward.

In the non-ideal world where matrix APIs have already been allowed to sneak
into Web APIs, first of all I don't really see how adding yet another
matrix API fixes anything, since we can't really take anything out from
existing Web APIs (we don't break Web compatibility) so we're stuck with
any matrix API we have for a long time anyway. If really we agree to try to
at least standardize on one matrix interface going forward then I'm at
least arguing that it should focus solely on data exchange and NOT on
anything computational.


>
>
>> when, as I said above, arrays are enough to exchange matrices, and even
>> if we really wanted a matrix interface for data exchange, that still
>> wouldn't justify putting computational features in it.
>>
>
> There is a need for those transformations. Are you suggesting that
> everyone should roll their own or rely on JaveScript libraries?
>

Yes, exactly that.

Two years ago it made sense to want to have browser-level support for that.
Now that JS performance has improved much, and has a clear path to close in
on native performance (see asm.js within a factor of two of native code
here: https://bugzilla.mozilla.org/show_bug.cgi?id=840282#c0 ) there is
little hope that browser-supported matrices could be as fast as JS
matrices. Again, (as I explained above) JS matrices are at a large
theoretical performance advantage thanks to letting the JS compiler
optimize everything. Keep in mind that matrix libraries is _the_ area where
any matrix library that wants to achieve high performance on nested
arithmetic expressions (even just a+b+c for three vectors) has to behave
like a compiler, which is why matrix libraries are about the only area
where expression templates are widely used in c++.


>
>
>>
>>
>>> The interoperability is not limited to SVGMatrix. There is a strong
>>> relation to CSS Transforms. The decomposing code is entirely based on the
>>> matrix decomposing for CSS Transforms.
>>
>>
>> It's good to know where bad things (this matrix decomposition) come from,
>> but that doesn't provide justification to spread them even wider!
>>
>
> The reason we use this particular decomposition is that Apple uses it in
> their Core Animation framework which is used to do animated transforms.
> If a user wants to simulate a transition or animation of a transform, he
> needs to use the exact same algorithm that the browser used.
>

If Apple found that that decomposition helped them achieve whatever
particular animations they were interested in doing on iOS or Mac OS ---
good for them. There is no way that I could be convinced that this
decomposition is the right one for general enough animations i.e. with
arbitrary rotations and scalings along arbitrary (not just X, Y, Z) axes.
That's because interpolating geometric transformations in a sane way should
give you the same result no matter what basis (what coordinate system) is
used, and the present decomposition is not covariant as I tried to explain
in my previous email. If you want to interpolate arbitrary linear (i.e. no
translation -- this is easy to interpolate separately) transforms in a sane
way, first do a polar decomposition of the two endpoint transforms,

    transform = rotation1 * scaling

where scaling is an arbitrary symmetric matrix; then diagonalize the
scaling into

    scaling = rotation2 * diagonal * rotation2^T

then interpolate the rotation1's, the diagonal coefficients, and the
rotation2's separately. Note that these two decompositions can be achieved
with a single SVD, but that viewpoint is probably better because if you
matrices are already rotations you definitely don't want to replace them by
a product of two rotations, as a direct SVD would do. That interpolation
will look very natural geometrically because you'll really be interpolating
singular values and left/right singular vectors i.e. things that have a
coordinate-free geometric significance. Using a different coordinate system
wouldn't make any difference.

If you're not convinced yet I'm OK to work out an example where QR-based
interpolation would give weird results for reasonable endpoints.

After writing that I googled "matrix interpolation" and found that,
http://stackoverflow.com/questions/3093455/3d-geometry-how-to-interpolate-a-matrix
which links to a paper titled Matrix Animation and Polar
Decomposition<https://docs.google.com/viewer?url=http://www.cs.wisc.edu/graphics/Courses/838-s2002/Papers/polar-decomp.pdf>which
seems to recommend roughly the same as discussed above although I
haven't read it.



>
>
>> Since, as you seem to concede below, there is no other documentation for
>> decompose() than looking at its pseudocode, let's do that.
>>
>
> The code comes from the book 'graphics gems' and is widely accepted as a
> reasonable way to decompose a matrix.
> You can see WebKit's implementation of pseudo code here:
> https://github.com/WebKit/webkit/blob/master/Source/WebCore/platform/graphics/transforms/TransformationMatrix.cpp
> The function starts on line 298
>

Neither "Apple does it" nor "it's in Graphics Gems" imply that that's the
right way to do it. Let's stay away from authority arguments.


>
>
>>
>> The key part is where the upper-left 3x3 block is decomposed into
>> "rotation", "scale" and "skew". I use double quotes here because the
>> geometric terms here, "rotation" and "scale", are clearly being abused, as
>> we're going to see:
>>
>> // Now get scale and shear. 'row' is a 3 element array of 3 component vectors
>> for (i = 0; i < 3; i++)
>>     row[i][0] = matrix[i][0]
>>     row[i][1] = matrix[i][1]
>>     row[i][2] = matrix[i][2]
>>
>> // Compute X scale factor and normalize first row.
>> scale[0] = length(row[0])
>> row[0] = normalize(row[0])
>>
>> // Compute XY shear factor and make 2nd row orthogonal to 1st.
>> skew[0] = dot(row[0], row[1])
>> row[1] = combine(row[1], row[0], 1.0, -skew[0])
>>
>> // Now, compute Y scale and normalize 2nd row.
>> scale[1] = length(row[1])
>> row[1] = normalize(row[1])
>> skew[0] /= scale[1];
>>
>> // Compute XZ and YZ shears, orthogonalize 3rd row
>> skew[1] = dot(row[0], row[2])
>> row[2] = combine(row[2], row[0], 1.0, -skew[1])
>> skew[2] = dot(row[1], row[2])
>> row[2] = combine(row[2], row[1], 1.0, -skew[2])
>>
>> // Next, get Z scale and normalize 3rd row.
>> scale[2] = length(row[2])
>> row[2] = normalize(row[2])
>> skew[1] /= scale[2]
>> skew[2] /= scale[2]
>>
>> // At this point, the matrix (in rows) is orthonormal.
>> // Check for a coordinate system flip.  If the determinant
>> // is -1, then negate the matrix and the scaling factors.
>> pdum3 = cross(row[1], row[2])
>> if (dot(row[0], pdum3) < 0)
>>     for (i = 0; i < 3; i++)
>>         scale[0] *= -1;
>>         row[i][0] *= -1
>>         row[i][1] *= -1
>>         row[i][2] *= -1
>>
>> So that's just plain old Gram-Schmidt orthonormalization. So what's being
>> called "rotation" here is just the orthonormalized rows of the original
>> matrix, and what's being called "scale" here is just the lengths of the
>> rows of the original matrix. In other words, this decomposition is a QR
>> decomposition.
>>
>> Sure enough, if the input matrix is of a very special form, like
>>
>>     DiagonalMatrix * Rotation
>>
>> Then this decomposition will recover the expected scaling and rotation.
>> But that's about it (it works in slightly more generality thanks to the
>> skew factors, but not much more generality).
>>
>> For an arbitrary matrix, this decomposition will return "scaling" and
>> "rotation" components that aren't what one would naturally expect.
>>
>
> The point is that it matches the internal logic.
>

No, it doesn't. The internal logic is a Gram-Schmidt orthonormalization
process. I've never thought of Gram-Schmidt as decomposing a matrix into
"rotation" "scaling" and "skew". Alternatively this is a QR decomposition
and sure enough, Q is a rotation but I've never seen it considered as "the
rotation part of the original matrix". It's rather, "the rotation you can
obtain by applying row operations on the original matrix". Likewise R is,
in my book, just the triangular matrix encoding this series of row
operations, I've never seen its diagonal coeffs seen as "the scaling part
of the matrix" and it's off-diagonal part as "the skew part". That "skew"
part doesn't make any geometric sense by the way, it's purely an artifact
of the coordinate system.


>
>
>>
>> Here's an example. Let's reason with 2D transforms for simplicity. Take
>> the following 2x2 matrix:
>>
>>   1  3
>>   3  1
>>
>> It's a non-uniform scaling along orthogonal axes: it scales by a factor
>> of 4 along the axis generated by the (1,1) vector, and it scales by a
>> factor of -2 along the axis generated by the (1,-1) vector.
>>
>
>> So if a decomposition method is going to claim to be able to recover
>> "scaling" and "rotation" and "skew" components from this, it should be able
>> to find scaling factors of (4, -2), the rotation by 45 degrees bringing the
>> above (1, 1), (1, -1) axes onto the X and Y axes, and no skew.
>>
>> But if you apply the spec's decompose() to this matrix, you won't find
>> that. The scaling factors, being computed as the lengths of the rows, will
>> be claimed to be equal to sqrt(1*1+3*3) == sqrt(10) and -sqrt(10). That's
>> not useful. That sqrt(10) number is not an interesting thing associated
>> with this geometric transformation, it's just incidental to how this
>> transformation is represented in this particular basis. Moreover, giving
>> both scaling factors as equal to sqrt(10) in absolute value misses the fact
>> that this is a non-uniform scaling (the geometric scaling factors here are
>> -2 and 4). The resulting "rotation" matrix will be equally useless from a
>> geometric perspective.
>>
>> I'm not saying that the computation performed by decompose() is generally
>> useless. It's Gram-Schmidt orthonormalization, and it's one of the most
>> basic, and most useful algorithms in linear algebra. But it's not giving
>> what is implied by the names of "rotation" and "scaling", and as a matrix
>> decomposition, it amounts to a QR decomposition, which is useful for linear
>> solving but is not revealing geometric components of the transformation at
>> hand here.
>>
>> What this should all have been, is a polar decomposition
>>
>> http://en.wikipedia.org/wiki/Polar_decomposition
>>
>> whereby an arbitrary matrix (even singular) can be decomposed as
>>
>>     rotation * scaling
>>
>> or
>>
>>     scaling * rotation
>>
>> provided that the scaling is allowed to be along arbitrary orthogonal
>> axes, not just the X/Y/Z axes. That is the right approach because it has
>> meaning at the level of the geometric transformation itself, not just for
>> its matrix in a particular basis. In other words, you can use another basis
>> and still get the same rotation and scaling.
>>
>> The skew factors are a poor way to compensate for its inability to
>> account for other axes than X/Y/Z.
>>
>> The requirement that the top-left 3x3 block be non-singular is another
>> sad aspect. It's an artifact of using non-pivoting QR here. It could be
>> solved by pivoting or, better, by abandoning QR completely and switching to
>> a polar decomposition, which doesn't have this limitation. Anyway, in its
>> current form, the algorithm suffers from the same issue as I outlined in my
>> first email for inverse(): it's really bad to suddenly stop working on
>> "singular" matrices. (See first email).
>>
>
> I agree that the chosen decomposition method was not the most optimal but
> it's not an unreasonable. (Flash for instance offers the same decomposition
> logic)
>

As I tried to explain above, I think that the strongest sense in which you
could say it's "not unreasonable" is it may work well enough on particular
cases.



>
>
>>
>> At this point I have NO confidence in the W3C working groups to spec
>> anything doing nontrivial matrix computations. This decomposition should
>> have been turned down.
>>
>
> We did not 'choose' it. We simply documented what's implemented by the
> browsers.
> You are free to choose another algorithm but it won't match what browsers
> do.
>

Not sure what you mean. I understand that this decomposition preexists at
least in CSS transforms. I'm saying that CSS transforms should have been
turned down in their present form because of that.



>
>
>> I had a suspiscion that this working group was outstretching its domain
>> of competence by speccing matrix APIs, now I'm convinced of it.
>>
>
> This is just a proposal, not an edict. What would you like to see that
> would put you at ease?
>

Again, I think the wind has turned and there is no basis anymore to expect
that a browser-supported matrix library would outperform JS ones in the
long run. Given that I believe that we should consider any existing
Web-facing matrix APIs purely legacy and from now on focus on letting users
use their own JS matrices and let new Web APIs just communicate matrices as
typed arrays. Or if for whatever reason a matrix interface is deemed to be
still useful then at least let it focus solely on data exchange and not on
computation.


>
> Thinking about this some more, I agree that we should redesign the
> interface.
> The quaternion/QR decomposition is useful for smooth transitions between 2
> different matrices, but it does not produce intuitive output for a
> decomposition.
>
> Maybe it should be replaced with an 'interpolate' method:
>
> Matrix interpolate(Matrix endMatrix, float progress)
>
> This will give you an interpolated matrix calculated with the pseudocode
> in the spec.


With such a signature, this method would be very slow because it would have
to redecompose matrices everytime it's called. You'd have to introduce an
object interface to store your internal decompositions to avoid recomputing
them on every interpolation step.

But people can already do that very well in JS and while that would give
significantly slower code with current JS engines, saving the cost of a DOM
API call on every interpolation step might well be enough already to offset
the performance difference. In the long run, I don't expect that a native
implementation could keep compete performance-wise with a JS one, except if
it gets special shortcuts to avoid the cost of DOM API calls, but as a
browser developer that's what I don't want to happen.


>
> If we still want a reasonable decomposition method, could you provide the
> pseudo code for that?
>

The one I mentioned above is best implemented by doing a SVD, so you can
use any pseudo-code for that --- once you have the SVD you are 99% done as
I explain below. For simple 3x3 SVD (as we need here) you could find quick
closed-form implementations though I don't have one ready for you here.
That SVD will give you

   matrix = U * diagonal * V^T

where U and V are rotations, ^T means transpose. Then letting:

  rotation1 = U * V^T
  rotation2 = V

gives you the form I discussed above.

So the polar decomposition form discussed above is matrix = rotation1 *
scaling where scaling is V * diagonal * V^T.

But again I'm not advocating that we add browser-level support for doing
that.



>
>
>> It was a request from SVG WG members to provide one decomposing
>>> operation. To use the same as CSS Transforms is a logical way. The
>>> composing and decomposing are described by the pseudo code at the end of
>>> the specification (will be removed to avoid duplication with CSS
>>> Transforms). I understand your concerns about additional decomposing
>>> algorithms, but this is the reasoning for this specific algorithm.
>>>
>>> To point 6. This is not a matrix library. The spec provides a simple set
>>> of functions to do basic operations. It does not aim to allow full linear
>>> algebra.
>>
>>
>> As we just discussed, this offers a QR decomposition method (part of
>> decompose()) even if it's hidden under misleading geometric names.
>>
>
> Yes. I think there are errors there. Let's fix them!
>
>
>> This also offers matrix products, and various geometric transformation
>> helpers. In my book, this _is_ a matrix library; regardless of naming, this
>> is plenty complex enough to be very hard to optimize fully.
>>
>
> I agree. It is a library.
>
>
>>
>> Even an API offering only, say, translate() and scale() and skew() and
>> transpose() would already have hard problems to solve. First, as these are
>> cheap operations, the overhead of a DOM API call would be dominant, so
>> browser developers would be scratching their heads about whether to add
>> special JS-engine-level shortcuts to avoid the overhead of DOM calls there.
>> That may sound overengineering until you realize that if a benchmark
>> stresses these operations, having such shortcuts will allow to get faster
>> by easily TWO orders of magnitude there. Now suppose that a browser engine
>> has such shortcuts. The next problem as I mentioned in my first email is
>> temporaries removal. Indeed if a benchmark (or a real application, for
>> that's a real use case) does .translate().scale().skew()... then avoiding
>> copying the intermediate results to/from temporary matrices will allow > 2x
>> speedups. In short, as soon as you have_any_ computational feature in a
>> matrix library, it's a tough job to optimize and maintain.
>>
>
> I'm unsure if I agree. Once you have the matrix object, all the variables
> live on the C++ side. JS will just have 1 object that it has to manage.
>

The problem is most 4x4 matrix operations are cheap enough that the cost of
DOM API calls would be dominant.


> There are also calls to modify the matrix in-place that can be used to
> avoid temporaries.
>

That's true but it only solves part of the problem. First, users would
still use the non-in-place calls, so we'd still have to optimize them.
Second, even so, that only brings part of the advantages of removing
temporaries. Another big advantage, that that doesn't give us, is
minimizing the number of traversals of arrays. If you do e.g.

matrix.translateInPlace(...);
matrix.scaleInPlace(...);

you've traversed the matrix twice while if you had done this with a JS
library on a good JS compiler it could have reordered instructions to
traverse the array only once, generating smaller (hence better) code.

Benoit


>
> I believe WebKit already has this matrix internally to speed things up...
>
>
>>
>>
>>
>>> It just specifies what is necessary to fulfill the goal as an common
>>> exchange format for transformation matrices. You are mentioning benchmarks
>>> for browsers. I actually hope that browsers will optimize for performance
>>> as well. This brings the question of precision over performance. Either you
>>> make a compromise or decide for one or the other. Again, for me this is not
>>> the priority. I can live with one or the other direction.
>>>
>>> I hope this answers some of your questions.
>>>
>>
>> Unfortunately, it doesn't.
>>
>> Benoit
>>
>>
>>>
>>> Greetings,
>>> Dirk
>>>
>>> >
>>> > Benoit
>>>
>>>
>>
>
Received on Wednesday, 20 March 2013 19:11:00 UTC