# Rotation Matrices

My group theory teacher, Prof. Daniel Freedman, had some interesting professorial habits. When invoking some bit of background knowledge with which we were all supposed to have been familiar, he would say, “As you learned in high school. . . .” Typically, this would make a lecture sound a bit like the following:

“To finish the proof, note that we’re taking the trace of a product of matrices. As you learned in high school, the trace is invariant under cyclic permutations. . . .”

Prof. Freedman also said “seventeen” for “zero” from time to time. After working out a long series of mathematical expressions on the blackboard, showing that this and that cancel so that the overall result should be nothing, with the students alternating their glances between the board and their notes, he would complete the equation and proclaim, “Equals seventeen!” At which point, all the students would look up and wonder, momentarily, what they had just missed.

“Here, we’re summing over the indices of an antisymmetric tensor, so by exchanging i and j here and relabeling there, we can show that the quantity has to equal the negative of itself. The contraction of the tensor is therefore, as you learned in high school — seventeen!”

One day, I managed to best his line. I realized that the formula currently on the board had to work out to one, not zero, so when he wrote the equals sign, paused and turned to the class with an inquiring eye, I quickly raised my hand and said, “Eighteen!”

Incidentally, truly simple topics like Euler’s formula and trigonometric identities were supposed to have been learned in middle or elementary school.

Today, we’ll talk about one of the things Prof. Freedman said we should have covered in high school: the rotation matrices for two- and three-dimensional rotations. This will give us the quantitative, symbolic tools necessary to talk about commutativity and non-commutativity, the topic we explored in an earlier post.

We start with the addition formulas for sines and cosines. These, you’ll recall, give the cosine or the sine of the sum of two angles in terms of the cosines and sines of the angles themselves. For the cosine, we derived earlier that
$\cos(A + B) = \cos A \cos B - \sin A \sin B,$
while for the sine function, we have the result that
$\sin(A + B) = \sin A \cos B + \sin B \cos A.$
We’re going to use these formulas to see what happens when we rotate a point around an axis by a given angle. To start with, we’ll consider points in a two-dimensional plane. We can pick out points in a plane in many ways, of which the two most familiar are Cartesian coordinates and polar coordinates. The former scheme labels points by their distance from two perpendicular axes, say north-south and east-west. It’s much like locating a restaurant on the corner of 3rd Avenue and 7th Street (except that now we’re considering all the points in between intersections, too — we can, if we want, meet for lunch at 1.5th Avenue and π Street). The polar-coordinate scheme, by contrast, locates positions by their distance from the origin and their angle from some reference direction. This is the coordinate system you use when calling out a bogie at two o’clock, three kilometers distance.

With trigonometry, we can relate the two descriptions. Let’s say that x is the coordinate representing east-west displacement (left-right on a map), and y is the north-south distance (up-down on a wall map). Call the distance to 0th Avenue and 0th Street r, for “radius,” and let’s pick the horizontal for our reference direction. Angles will, we decide, be measured from the positive x-axis, which is by convention rightwards and eastwards, and we’ll write the angle with the Greek letter theta, or θ.

This is where we break out the trig functions:
$x = r\cos\theta,\ y = r\sin\theta.$
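If you'd like to play with the two coordinate systems numerically, here is a minimal Python sketch. The function names are my own invention, and the reverse conversion (using `math.hypot` and `math.atan2`) is an addition not spelled out in the text above, so treat it as an illustration rather than part of the derivation.

```python
import math

def polar_to_cartesian(r, theta):
    """Convert distance r and angle theta (radians, measured from the
    positive x-axis) into Cartesian coordinates (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def cartesian_to_polar(x, y):
    """Inverse conversion (an assumption of this sketch): recover r and theta."""
    return math.hypot(x, y), math.atan2(y, x)

# A point two units away, straight "north" of the origin.
x, y = polar_to_cartesian(2.0, math.pi / 2)

# Round trip: Cartesian -> polar -> Cartesian should give back the same point.
r, theta = cartesian_to_polar(3.0, 4.0)
x_back, y_back = polar_to_cartesian(r, theta)
```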
Now, what happens when we take the point (x, y) and rotate it around the origin by some fixed amount? The distance to the origin doesn’t change, but the angle certainly does: the angle shifts by the amount we turn. If the original angle was θ and we call the size of the turning φ, then the new point will be located at the position labeled by r and θ + φ.

Where is this in Cartesian coordinates? Adopting the common practice of using apostrophes or “primes” to denote transformed variables, the previous formula gives us the values of “x prime” and “y prime”:
$x^\prime = r\cos(\theta + \phi),\ y^\prime = r\sin(\theta + \phi).$
We can easily work out what these expressions are in terms of the sines and cosines of θ and φ, just using the addition formulas. First, we try for x’:
$x^\prime = r\cos\theta \cos\phi - r\sin\theta \sin\phi.$
Remember that r times cosθ is just another name for x, and r times sinθ is y. So, we can write the new horizontal coordinate in terms of the old horizontal and vertical coordinates, mixed together:
$\boxed{x^\prime = x\cos\phi - y\sin\phi.}$
I put this in a box to indicate that it’s important.

Finding the new vertical coordinate is just as easy. Applying the addition formula, we find that
$y^\prime = r\sin\theta \cos\phi + r\sin\phi \cos\theta,$
which when we recall the polar forms of x and y becomes
$\boxed{y^\prime = x\sin\phi + y\cos\phi.}$
Note that both the new x and the new y involve a mixture of the old x and the old y, and the extent of the mixing is controlled by the trig functions of the rotation angle.
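The two boxed formulas translate almost word for word into code. Here is a short Python sketch (the name `rotate2d` is just a label I picked) that applies them and lets us check two things claimed above: a quarter turn carries a point on the positive x-axis onto the positive y-axis, and rotation leaves the distance to the origin alone.

```python
import math

def rotate2d(x, y, phi):
    """Rotate the point (x, y) about the origin by phi radians,
    using the two boxed formulas."""
    x_new = x * math.cos(phi) - y * math.sin(phi)
    y_new = x * math.sin(phi) + y * math.cos(phi)
    return x_new, y_new

# Quarter turn of the point (1, 0).
x_new, y_new = rotate2d(1.0, 0.0, math.pi / 2)

# Distance to the origin before and after an arbitrary rotation.
r_before = math.hypot(3.0, 4.0)
r_after = math.hypot(*rotate2d(3.0, 4.0, 0.7))
```

Because of floating-point rounding, `x_new` comes out as a tiny number rather than exactly zero, so comparisons should allow a small tolerance.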

Let’s put these two formulas together to see their similarities:
$x^\prime = x\cos\phi - y\sin\phi$
$y^\prime = x\sin\phi + y\cos\phi.$
There’s a certain pleasing regularity to the appearance of the cosines and sines. We can get at this more neatly if we separate the coordinate variables x and y from the functions which transform them. First, let’s consider x and y joined together: they’re both labels for marking the same point in space, so we should by rights consider them something of a “married couple.” We’ll write this combined quantity as a vertical arrangement enclosed in parentheses, thusly:
$\left(\begin{array}{c}x \\ y\end{array}\right).$
This is one common way of writing a vector, a quantity with both magnitude and direction. (In our case, we denoted the magnitude by r and the direction by θ.) “Seven kilometers” is not a vector, but “seven kilometers due south” is. After a transformation like a rotation, our vector will have primes on its components:
$\left(\begin{array}{c}x^\prime \\ y^\prime\end{array}\right).$
We’ll pass over any moral implications in the fact that a rotation “mixes up” one member of this “marriage” with another.

The vector before the transformation and the new vector produced by the transformation are related, and in writing the vectors in this fashion, we can “pull out” the part of the equations which is the transformation proper, separating it from the variables on which the transformation acts:
$\left(\begin{array}{c}x^\prime \\ y^\prime \end{array}\right) = \left(\begin{array}{cc}\cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{array}\right) \left(\begin{array}{c}x \\ y \end{array}\right).$
The square array of expressions is called a matrix. Here, we are multiplying a vector by a matrix to get a new vector. In fact, a vector is just a one-column matrix. (If we wanted our vectors to be written short and wide instead of tall and skinny, we could use row vectors instead of column vectors, which would just involve shuffling entries around on the page.) Matrices can be multiplied together just like numbers can, using a rule which you might be able to figure out from this example, remembering what x’ and y’ are:
$\left(\begin{array}{c}x\cos\phi - y\sin\phi \\ x\sin\phi + y\cos\phi \end{array}\right) = \left(\begin{array}{cc}\cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{array}\right) \left(\begin{array}{c}x \\ y \end{array}\right).$
To get the entry in the first row of the product vector, we multiply the first row of the rotation matrix by the first (and only) column of the original vector, item-by-item, and add them up. To get the second row of the result, we multiply the second row of the matrix by the column entries, item-by-item, and add them up. Here, we’re multiplying a 2×2 matrix by a 2×1 matrix to get another 2×1 matrix, but we could multiply bigger matrices. If we were multiplying two 40×40 matrices, the entry in “cellblock 11, 38” would come from multiplying the entries in the 11th row of the first by the corresponding entries in the 38th column of the second, item-by-item, and adding the 40 products together.
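The row-times-column rule is easier to trust once you've seen it spelled out mechanically. Below is a minimal sketch in plain Python, with matrices stored as lists of rows; `matmul` is a name of my own choosing, and a real project would more likely reach for a numerical library.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows.

    Entry (i, j) of the product is row i of `a` times column j of `b`,
    multiplied item-by-item and added up."""
    n_inner = len(b)
    if any(len(row) != n_inner for row in a):
        raise ValueError("each row of a must be as long as b has rows")
    n_cols = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(n_inner)) for j in range(n_cols)]
        for i in range(len(a))
    ]

# The 2x2 rotation matrix acting on a 2x1 column vector.
phi = math.pi / 6
rotation = [[math.cos(phi), -math.sin(phi)],
            [math.sin(phi),  math.cos(phi)]]
point = [[2.0], [1.0]]
rotated = matmul(rotation, point)

# A small all-integer example that is easy to check by hand.
product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The integer example works out to rows (19, 22) and (43, 50); for instance, 19 is 1×5 + 2×7.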

Matrix multiplication is one of those things which takes practice to get right. Fortunately, the definition for matrix addition is considerably simpler: just add the corresponding entries!

Whew! We’ve built up a fair amount of mathematical machinery, and it would be a shame if we didn’t use it. What can we use these matrices for? Last time, we realized that while rotations in two dimensions commute, rotations about different axes in three dimensions don’t: in 3D, the order of rotation operations matters. Rotation matrices provide a way of teasing out the essence of this odd behavior and presenting the geometrical phenomenon in a useful form.

First, we need to bulk up our machinery from two dimensions to three. This just means making the matrices bigger.

In 3D, we can rotate around the axis of our choice. To get started, we’ll pick three axes such that x denotes the distance to the right, y denotes the distance up and z denotes the distance forward, “out of the chalkboard.” Negative numbers imply a placement in the opposite direction.

Up until now, we’ve been discussing rotations which mix together x and y. In this picture, we can see that such an operation is a turn around the z-axis. We’ll denote it by Rz(φ), where φ is the angle through which we turn. In matrix form,
$R_z(\phi) = \left(\begin{array}{ccc}\cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1\end{array}\right).$
We’ve just added an extra row and column to go from 2D to 3D; this extra realm is filled with zeros except for the lower right corner, which is 1. This is the matrix way of saying that a rotation around the z-axis leaves z unchanged.

We can also write a matrix for rotation around the x-axis. It will also have sines and cosines of the angle φ, and a row and column containing zeros except for a solitary 1. Why? Because a rotation around the x-axis leaves the value of x unchanged. We can figure out what goes in the other spaces by observing that a rotation of π/2 radians (90 degrees) around the x-axis will take a point on the y-axis onto the z-axis. This means that the entry in the second column, third row must be the sine of φ (remember that the sine of π/2 is 1). The same rotation will take a point on the z-axis onto the negative y-axis, so the entry in the third column, second row must be -sinφ.
$R_x(\phi) = \left(\begin{array}{ccc}1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi\end{array}\right).$
We can figure out a matrix for rotations about the y-axis in much the same way. Skipping ahead to the punchline,
$R_y(\phi) = \left(\begin{array}{ccc}\cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi\end{array}\right).$
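Here is a Python sketch of the three matrices, handy for checking the reasoning we used to fill in the entries. The helper names (`rot_x`, `rot_y`, `rot_z`, `apply`) are my own, and vectors are plain three-element lists. The last two lines also try out the claim that order matters in 3D: a quarter turn about x followed by one about y, versus the other way around.

```python
import math

def rot_x(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(matrix, vector):
    """Multiply a 3x3 matrix by a 3-component vector (row times column)."""
    return [sum(matrix[i][k] * vector[k] for k in range(3)) for i in range(3)]

quarter_turn = math.pi / 2

# A quarter turn about x should carry the y-axis onto the z-axis,
# and the z-axis onto the negative y-axis.
y_to_z = apply(rot_x(quarter_turn), [0.0, 1.0, 0.0])
z_to_minus_y = apply(rot_x(quarter_turn), [0.0, 0.0, 1.0])

# Same two quarter turns, opposite orders, starting from a point on the z-axis.
start = [0.0, 0.0, 1.0]
x_then_y = apply(rot_y(quarter_turn), apply(rot_x(quarter_turn), start))
y_then_x = apply(rot_x(quarter_turn), apply(rot_y(quarter_turn), start))
```

Up to rounding, `x_then_y` lands on the negative y-axis while `y_then_x` lands on the positive x-axis, so the two orders really do give different answers.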

Note what happens to each of these matrices when we rotate by zero radians. Because the sine of zero is zero and the cosine of zero is one, each of the rotation matrices reduces to the same thing, a 3×3 matrix whose entries are all zero, except along the diagonal, where they are all identically 1. This “do nothing” operation is called the identity, and the identity matrix is
$\mathbb{I} = \left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{array}\right).$
Multiplying any matrix by the identity gives back the original matrix; it’s the equivalent of the number 1 on the familiar number line. (Identity matrices exist for all sizes n×n, but here we’ll only need the 3×3 version.)

Now that we’ve graduated high school, it’s time to go to college!

We’re going to take “baby steps” and consider infinitesimal rotations. That is, we’re going to look at the form these matrices take when the angle φ is very small indeed. Recall that for small angles, the sine of the angle is approximately the angle itself (when you’re working in radians). Cosines of such small angles are approximately 1. This means that each of the rotation matrices becomes a matrix containing only ones, zeros and two instances of the small angle, one of them carrying a minus sign. Furthermore, each entry on the diagonal of a rotation matrix is either 1 or a cosine, but in the small-angle regime, the cosines become 1, so the diagonal is all ones, just like the identity matrix.

We’re going to take advantage of this fact and write the rotation matrices for small angles as the identity matrix plus another matrix. We can be even sneakier if we realize that the entries in this other matrix are all either 0 or plus or minus the small angle, which by convention we’ll call epsilon, ε. We can factor out the number ε to give a matrix which contains just zeros, ones and minus ones. (Multiplying a number times a matrix just multiplies each element in the matrix by that number.) Thus we say,
$R_j(\epsilon) = \mathbb{I} + \epsilon I_j.$
Here, the letter j denotes any one of the axes x, y or z — we mean that this equation is true for all of them, and we don’t want to write the same thing three times. I get the following for the matrix Ix:
$I_x = \left(\begin{array}{ccc}0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0\end{array}\right).$
We can turn around and substitute this into the previous equation: multiply each entry by ε, add 1 to each diagonal entry, and we get back the small-angle version of Rx.
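Written out in full, that check might look like the following, where the only ingredients are the approximations sin ε ≈ ε and cos ε ≈ 1 applied to the matrix for Rx given earlier:
$R_x(\epsilon) \approx \left(\begin{array}{ccc}1 & 0 & 0 \\ 0 & 1 & -\epsilon \\ 0 & \epsilon & 1\end{array}\right) = \left(\begin{array}{ccc}1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{array}\right) + \epsilon\left(\begin{array}{ccc}0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0\end{array}\right) = \mathbb{I} + \epsilon I_x.$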

(If you’ve had a little calculus, this might remind you of “linear approximations” and Taylor series.)

We can play the same game with the y-axis rotation matrix to get Iy:
$I_y = \left(\begin{array}{ccc}0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0\end{array}\right).$
Again, multiply by ε and add the identity matrix to check. Finally,
$I_z = \left(\begin{array}{ccc}0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0\end{array}\right).$
The three matrices Ij are called the generators of the rotation transformations.
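How good is the small-angle form? Here is a quick numerical sketch in Python, using the z-axis as the example. The helper names are made up for this illustration, and the choice ε = 0.001 is arbitrary. It compares the exact rotation matrix with the identity plus ε times the generator and records the largest entry-by-entry discrepancy.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
I_Z = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

def rot_z(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def small_angle_rotation(generator, eps):
    """Identity plus eps times the generator: the small-angle form."""
    return [[IDENTITY[i][j] + eps * generator[i][j] for j in range(3)]
            for i in range(3)]

def max_abs_difference(a, b):
    """Largest entry-by-entry discrepancy between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

eps = 1e-3
error = max_abs_difference(rot_z(eps), small_angle_rotation(I_Z, eps))
```

The discrepancy comes out smaller than ε squared, which is what we'd expect if the pieces we threw away start at second order in the angle.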

Remember, we’re trying to study what happens when we perform successive rotations around different axes. So, let’s multiply the generators of x- and y-axis rotations. The matrix multiplication isn’t so bad, because every entry is 0, 1 or -1, and the result is the following:
$I_x I_y = \left(\begin{array}{ccc}0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0\end{array}\right).$
But, oh la, what happens when we multiply the same matrices in the other order? As it turns out, the procedure for matrix multiplication I sketched earlier means that matrix multiplication is not commutative.
$I_y I_x = \left(\begin{array}{ccc}0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0\end{array}\right).$
The two products, taken in opposite orders, are different! By how much do they differ? Well, we can just subtract one from the other:
$I_x I_y - I_y I_x = \left(\begin{array}{ccc}0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0\end{array}\right) = I_z.$
Surprise! The difference is just the third generator, Iz.

That’s an odd enough coincidence that we should be tempted to press further. What about the generators of x- and z-axis rotations? First, try one order:
$I_x I_z = \left(\begin{array}{ccc}0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0\end{array}\right).$
Then, try the other:
$I_z I_x = \left(\begin{array}{ccc}0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0\end{array}\right).$
Again, the answers differ, and again, we subtract to find how much they differ:
$I_z I_x - I_x I_z = \left(\begin{array}{ccc}0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0\end{array}\right) = I_y.$
Double whammy! The “mismatch” between the z and x generators is just the y generator.

Well, now there’s nothing stopping us from trying the next pair:
$I_y I_z = \left(\begin{array}{ccc}0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0\end{array}\right).$
Again, we check the multiplication in the other order:
$I_z I_y = \left(\begin{array}{ccc}0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0\end{array}\right).$
The matrices differ by the amount. . .
$I_y I_z - I_z I_y = \left(\begin{array}{ccc}0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0\end{array}\right) = I_x.$
All of the generators are tied up, somehow, such that combinations of two give back the third.

Here, we introduce a more compact notation, since we’re going to be dealing with a good many quantities which fail to commute. For any two thingamajigs A and B,
$\{A,B\} = AB - BA.$
In this notation, the results we worked out a moment ago take on an interesting form:
$\{I_x,I_y\} = I_z,\ \{I_z,I_x\} = I_y,\ \{I_y,I_z\} = I_x.$
Notice the pattern of the subscripts. Each equation lists the generators in the order xyz, but sometimes “wrapped around.” If we wrote the letters x, y and z clockwise around a circle, we could get these three orderings by picking a starting letter and reading around, clockwise. Such rearrangements are called cyclic permutations.
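Since sign slips are easy to make in these products, it's worth letting a machine redo the arithmetic. The following plain-Python sketch (the names `matmul` and `bracket` are mine) computes AB - BA for each cyclic pair of generators. Every entry is an integer, so the comparisons are exact.

```python
def matmul(a, b):
    """Row-times-column product of two square matrices (lists of rows)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def bracket(a, b):
    """The mismatch AB - BA between the two orders of multiplication."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

# The three generators, copied from the matrices above.
I_x = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
I_y = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
I_z = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

# Each cyclic pairing should reproduce the third generator.
xy_gives_z = bracket(I_x, I_y) == I_z
zx_gives_y = bracket(I_z, I_x) == I_y
yz_gives_x = bracket(I_y, I_z) == I_x
```

All three flags come out `True`. Swapping the order inside any bracket flips the sign of the result, which follows directly from the definition AB - BA.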

This is, almost, the form of the rotation operations we’ll use to explore quantum mechanics. The next step, which should be a brief one to cover, will be to roll these three equations into one, taking advantage of their cyclic character. After that, we’ll see what we can do to quantum systems which exhibit rotational symmetry.

## 3 thoughts on “Rotation Matrices”

1. This is just your penance for having the audacity, the audacity, sir, to miss one of my Behe posts :P

TAG!

2. Eric says:

Looks like you’ve got a sign error in the calculation of [Iz,Ix].
[Iz,Ix]=Iy, but the matrix you’ve written is -Iy. Nice post though.

3. Gah!

I had originally calculated all these matrices and things for rotations of the coordinate axes, which is the same as rotating the points by the opposite angle, so I’m not surprised I had some stray minus signs drifting about.