Intermezzo: The Dirac Equation

After you’ve been Pharyngulated a couple times, you develop a protective strategy to deal with the aftermath. “How,” you ask yourself, “can I get rid of the extra readers whom I’ve probably picked up?” The answer, for me at least, is clear:



Science After Sunclipse has been presenting an introduction to supersymmetric quantum mechanics. This area of inquiry stemmed from attempts to understand the complicated implications of supersymmetry in a simpler setting than quantum field theory; just as supersymmetry began in string theory and developed into its own “thing,” so too has this offshoot become interesting in its own right. In a five-part series, we’ve seen how the ideas of “SUSY QM” can be applied to practical ends, such as understanding the quantum properties of the hydrogen atom. I have attempted to make these essays accessible to undergraduate physics students in their first or possibly second term of quantum theory. Having undergraduates solve the hydrogen atom in this fashion is rather unorthodox, but this is a safe kind of iconoclasm, as it was endorsed by three of my professors.

The posts in this series to date are as follows:

Having solved the “Coulomb problem,” we have attained a plateau and can move in several directions. The solution technique of shape-invariant partner potentials is broadly applicable; virtually all potentials for which introductory quantum classes solve the Schrödinger Equation can be brought into this framework. We can also move into new conceptual territory, connecting these ideas from quantum physics to statistical mechanics, for example, or moving from the non-relativistic regime we’ve studied so far into the territory of relativity. Today, we’ll take the latter route.

We’re going to step aside for a brief interlude on the Dirac Equation. Using some intuition about special relativity, we’re going to betray our Vulcan heritage and take a guess — an inspired guess, as it happens — one sufficiently inspired that I strongly doubt I could make it myself. Fortunately, Dirac made it for us. After reliving this great moment in TwenCen physics, we’ll be in an excellent position to explore another aspect of SUSY QM.


Let’s ground ourselves with the basic principles of special relativity. (Recently, Skulls in the Stars covered the history of the subject.) First, we have that the laws of physics will appear the same in all inertial frames: if Joe and Moe are floating past each other in deep space, Joe can do experiments with springs and whirligigs and beams of light to deduce physical laws, and Moe — who Joe thinks is moving past with constant velocity — will deduce the same physical laws. Thus, neither Joe nor Moe can determine who is “really moving” and who is “really standing still.”

Second, all observers will measure the same speed of light. In terms of a space-time diagram, where time is conventionally drawn as the vertical axis and space as the horizontal, Joe and Moe will both represent the progress of a light flash as a diagonal line with the same slope. (This video has some spiffy CG renditions of the concept.) To make life easy on ourselves, we say that this line has a slope of 1, and is thus drawn at a 45-degree angle from the horizontal. This means we’re measuring distance and time in the same units, a meter of time being how long it takes light to travel one meter.

Suppose that Joe sees a burst of light leave a source and arrive at a detector a distance [tex]x[/tex] away, taking an amount of time [tex]t[/tex] to do so. We can call the event of the light being emitted [tex]A[/tex] and the event of its absorption [tex]B[/tex]. Joe writes,

[tex]x = ct.[/tex]

Moe, in motion relative to Joe, watches the same pulse travel from [tex]A[/tex] to [tex]B[/tex], but he records a different time elapsed, [tex]t^\prime[/tex], and he observes a different distance between the source and detector, [tex]x^\prime[/tex]. The speed of light being the same for Joe and Moe, we can tell that Moe writes,

[tex]x^\prime = ct^\prime.[/tex]

If Joe chooses his coordinate axes such that the source and detector aren’t perfectly along one axis, Joe will use the Pythagorean Theorem to write the distance traveled,

[tex]d = \sqrt{x^2 + y^2 + z^2},[/tex]

and Joe’s equation for the progress of light will be

[tex](ct)^2 = x^2 + y^2 + z^2.[/tex]

If Moe’s coordinate axes are similarly tilted, he’ll write

[tex](ct^\prime)^2 = x^{\prime2} + y^{\prime2} + z^{\prime2}.[/tex]

The forms of the equations are the same, but Moe’s has primes all over it. Note that if we shuffle all the terms of each equation to the same side, we can see that a quantity remains unchanged by the Lorentz transformation which takes Joe’s coordinates to Moe’s:

[tex]-(ct)^2 + x^2 + y^2 + z^2 = -(ct^\prime)^2 + x^{\prime2} + y^{\prime2} + z^{\prime2} = 0.[/tex]

A massive particle must move slower than light, and so will travel less distance than a light ray in the same amount of time. In that case,

[tex](ct)^2 > x^2 + y^2 + z^2,[/tex]

making the square of the “spacetime interval” negative:

[tex]s^2 = -(ct)^2 + x^2 + y^2 + z^2 < 0,[/tex]

so that [tex]s[/tex] is an imaginary number. The essential fact is that Joe and Moe, while they disagree on the space and time contributions to the interval, agree on the interval itself:

[tex]s^2 = -(ct)^2 + x^2 + y^2 + z^2 = -(ct^\prime)^2 + x^{\prime2} + y^{\prime2} + z^{\prime2}.[/tex]

The interval [tex]s[/tex] is a Lorentz-invariant quantity.
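To see the invariance concretely, here is a minimal Python sketch (the function names are illustrative, not from any library). It measures time in meters, [tex]x_0 = ct[/tex], applies a standard Lorentz boost along [tex]x[/tex] with speed [tex]\beta c[/tex] to two events, and compares the squared interval before and after.

```python
import math

def boost_x(event, beta):
    """Boost an event (x0, x, y, z), where x0 = ct, along x with speed beta * c."""
    x0, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return (gamma * (x0 - beta * x), gamma * (x - beta * x0), y, z)

def interval_sq(event):
    """Squared interval from the origin: s^2 = -x0^2 + x^2 + y^2 + z^2."""
    x0, x, y, z = event
    return -x0 * x0 + x * x + y * y + z * z

# Light flash: 5 m of light travel time covers 5 m of space (3 m along x, 4 m along y).
light_flash = (5.0, 3.0, 4.0, 0.0)
# Same displacement but twice the time: something moving slower than light.
massive = (10.0, 3.0, 4.0, 0.0)

for event in (light_flash, massive):
    moved = boost_x(event, beta=0.6)
    print(f"before: {interval_sq(event):+.6f}   after boost: {interval_sq(moved):+.6f}")
```

Moe's time and distance differ from Joe's (for the second event, 10.25 m of time and -3.75 m along [tex]x[/tex] instead of 10 and 3), yet the squared interval stays at zero for the light flash and at -75 m² for the slower object, up to floating-point rounding.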


We could compress that last equation a little by using summation symbols:

[tex]s^2 = -(ct)^2 + \sum_{i = 1}^3 x_i^2.[/tex]

That [tex]ct[/tex] term out in front is irritating. If relativity is telling us that Joe’s time is partially Moe’s space, then we’ve no business writing space and time with different units, and having that [tex]c[/tex] cropping up everywhere is just a hangover from a backward way of life. So, let’s write,

[tex]x_0 = ct,[/tex]

so we can say,

[tex]s^2 = -x_0^2 + \sum_{i = 1}^3 x_i^2.[/tex]

Earlier, we said that when we had an expression like [tex]x_i x_i[/tex], most of the time we’d be summing over the repeated index, so that by the “Einstein summation convention” we wouldn’t have to write the capital sigma. We’d like to do that here, but we have that time component out in front, with the extra minus sign.

This is where we get sneaky. Quantities will be written as four-component objects called four-vectors, in which the components are labeled by an index taking the values 0, 1, 2 and 3. Traditionally, these indices are written with lowercase Greek letters; the position of an index is also significant. The two four-vectors [tex]x_\mu[/tex] and [tex]x^\mu[/tex] exemplify this difference:

[tex]x_\mu = (-ct, x_1, x_2, x_3),\ x^\mu = (ct, x_1, x_2, x_3). [/tex]

The only difference is the sign of the zeroth component. See the trick? If we take the product of these four-vectors, with one index up and the other down, we get what we were looking for:

[tex]x_\mu x^\mu = -(ct)^2 + x_1^2 + x_2^2 + x_3^2 = s^2.[/tex]

Summing over terms with one index up and the other index down yielded us a Lorentz-invariant quantity! In fact, one can define the Lorentz transformations as those operations which leave a product of the form [tex]x_\mu x^\mu[/tex] unchanged. As we develop our relativistic quantum mechanics, we’ll want to be sure that our equations obey the basic rules of relativity, which means that they’ll have to take the same form for Joe and Moe. That is, we’ll want to write all our equations in such a way that we can see their form remains unchanged by a Lorentz transformation, so we’ll always be talking about products of four-vectors, with one index up and the other index down.
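The index bookkeeping can be checked with a few lines of NumPy. In this sketch (assuming NumPy is available; the helper names are made up for illustration), the diagonal matrix [tex]\mathrm{diag}(-1, 1, 1, 1)[/tex] lowers an index by flipping the sign of the zeroth component. A boost and an ordinary rotation both satisfy [tex]\Lambda^T \eta \Lambda = \eta[/tex], which is the matrix form of the statement that they leave [tex]x_\mu x^\mu[/tex] unchanged.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # metric with signature (-, +, +, +)

def lower(v_up):
    """x_mu = eta_{mu nu} x^nu: flips the sign of the zeroth component."""
    return ETA @ v_up

def boost_x_matrix(beta):
    """Lambda^mu_nu for a boost along x with speed beta (units with c = 1)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    lam = np.eye(4)
    lam[0, 0] = lam[1, 1] = gamma
    lam[0, 1] = lam[1, 0] = -gamma * beta
    return lam

def rotation_z(theta):
    """An ordinary spatial rotation about z, embedded in a 4x4 matrix."""
    lam = np.eye(4)
    c, s = np.cos(theta), np.sin(theta)
    lam[1:3, 1:3] = [[c, -s], [s, c]]
    return lam

x_up = np.array([10.0, 3.0, 4.0, 0.0])  # x^mu = (ct, x, y, z)
print("x_mu      =", lower(x_up))       # (-ct, x, y, z)
print("x_mu x^mu =", lower(x_up) @ x_up)

for name, lam in (("boost", boost_x_matrix(0.6)), ("rotation", rotation_z(0.3))):
    x_new = lam @ x_up
    preserves_metric = np.allclose(lam.T @ ETA @ lam, ETA)
    same_interval = np.isclose(lower(x_new) @ x_new, lower(x_up) @ x_up)
    print(name, preserves_metric, same_interval)
```

Rotations show up here as a special case: they never touch the time component, so they trivially preserve the interval, while the boost mixes [tex]ct[/tex] and [tex]x[/tex] in exactly the way that keeps [tex]x_\mu x^\mu[/tex] fixed.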

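Another common four-vector is the four-momentum, whose zeroth component is the energy divided by [tex]c[/tex]:

[tex]p_\mu = \left(-\frac{E}{c}, p_1, p_2, p_3\right),\ p^\mu = \left(\frac{E}{c}, p_1, p_2, p_3\right).[/tex]

Its invariant product is [tex]p_\mu p^\mu = -E^2/c^2 + |\vec{p}|^2 = -m^2c^2[/tex], which is Einstein’s relation [tex]E^2 = m^2c^4 + |\vec{p}|^2c^2[/tex] in disguise; for a particle at rest, the zeroth component reduces to [tex]mc[/tex].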

And we’ll also be using a four-vector version of the vector potential, which incorporates the information for the electric and magnetic fields:

[tex]A_\mu = \left(-\frac{\phi}{c}, A_1, A_2, A_3\right),[/tex]
[tex]A^\mu = \left(\frac{\phi}{c}, A_1, A_2, A_3\right).[/tex]

Recall that in high school, as professors like to say, we learned that a static electric field can be written as minus the gradient of a scalar potential,

[tex]\vec{E} = -\nabla \phi,[/tex]

while the magnetic field is the curl of the vector potential:

[tex]\vec{B} = \nabla \times \vec{A}.[/tex]

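When the fields change in time, the electric field picks up a second term, [tex]\vec{E} = -\nabla\phi - \partial_t\vec{A}[/tex]. In quantum mechanics, it often turns out more convenient to work with these potentials than with the fields we studied in classical electromagnetism.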
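A small numerical illustration of the static relations, as a sketch: it takes central finite differences (a choice made here for simplicity) of the potential [tex]\phi = -E_0 z[/tex] of a uniform electric field along [tex]z[/tex], and of one possible vector potential for a uniform magnetic field along [tex]z[/tex], [tex]\vec{A} = \frac{B_0}{2}(-y, x, 0)[/tex]. The values of [tex]E_0[/tex], [tex]B_0[/tex] and the sample point are arbitrary, and the helper names are illustrative.

```python
import numpy as np

STEP = 1e-5  # finite-difference step size

def grad(f, r):
    """Central-difference gradient of a scalar field f at the point r."""
    r = np.asarray(r, dtype=float)
    g = np.zeros(3)
    for i in range(3):
        dr = np.zeros(3)
        dr[i] = STEP
        g[i] = (f(r + dr) - f(r - dr)) / (2 * STEP)
    return g

def curl(field, r):
    """Central-difference curl of a vector field at the point r."""
    r = np.asarray(r, dtype=float)
    jac = np.zeros((3, 3))  # jac[i, j] = d field_i / d x_j
    for j in range(3):
        dr = np.zeros(3)
        dr[j] = STEP
        jac[:, j] = (field(r + dr) - field(r - dr)) / (2 * STEP)
    return np.array([jac[2, 1] - jac[1, 2],
                     jac[0, 2] - jac[2, 0],
                     jac[1, 0] - jac[0, 1]])

E0, B0 = 2.0, 3.0

def phi(r):
    """Scalar potential of a uniform electric field E0 along z."""
    return -E0 * r[2]

def vector_potential(r):
    """One vector potential whose curl is a uniform field B0 along z."""
    return 0.5 * B0 * np.array([-r[1], r[0], 0.0])

point = [0.3, -1.2, 0.7]
print("E = -grad(phi) ~", -grad(phi, point))              # close to (0, 0, E0)
print("B = curl(A)    ~", curl(vector_potential, point))  # close to (0, 0, B0)
```

The vector potential here varies from point to point even though the field it describes is perfectly uniform, which is a first hint that the potentials carry the same physics in a different, more flexible form.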


From here on out, I’ll be assuming a moderate background in basic quantum mechanics. The first step is to find an equation which does Schrödinger’s job in the relativistic regime. As indicated, we’ll be focusing on quantities which remain unchanged under Lorentz transformations, i.e., quantities on which all observers resting in inertial frames will agree. We would like to write an equation which not only describes a particle in relativistic motion, but which contains only expressions which all Lorentz observers agree upon.

We can begin by studying the Schrödinger Equation for the time evolution of a state:

[tex]H\ket{\psi(t)} = i\hbar \partial_t \ket{\psi(t)}.[/tex]

We would like to write a Hamiltonian which is relativistically invariant. The first guess might be to use Einstein’s expression for the energy, [tex]E^2 = m^2 c^4 + |\vec{p}|^2 c^2[/tex]. If we use this energy for our Hamiltonian, we get that the time evolution obeys

[tex]\sqrt{(mc^2)^2 + \sum_{j=1}^3(p_j c)^2} \ket{\psi(t)} = i\hbar \partial_t\ket{\psi(t)}.[/tex]

This cannot, however, be the correct equation, because it treats time and space too differently. Elementary relativity tells us that one observer may see two events as happening simultaneously in time (but separated in space), while an observer in a different Lorentz frame sees them separated in time as well. One observer’s time is partly another’s space, so singling out a temporal or spatial dependence cannot work in a relativistic equation. (This is a deep point, well worth considering for more time — or space — than I can devote to it.)

Dirac’s solution was to suppose that since the right-hand side of the time-evolution equation contains a first-order time derivative, the left-hand side should contain corresponding first-order derivatives in space. Dirac showed that this could be done by writing the quantity under the radical as the square of an expression linear in [tex]p_j[/tex]. (Recall that in quantum mechanics, [tex]p_j = -i\hbar\partial_j[/tex].) Taking the square root of a perfect square is easy, so Dirac’s proposal is algebraically convenient. Writing Dirac’s idea out in symbols, we get

[tex](mc^2)^2 + \sum_{j=1}^3(p_j c)^2 = \left(\alpha_0 mc^2 + \sum_{j=1}^3 \alpha_j p_j c\right)^2.[/tex]

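Letting the indices [tex]\mu,\nu[/tex] run from 0 to 3, expand the square on the right and compare it with the left-hand side: each [tex]\alpha_\mu[/tex] must square to one, and every cross term must cancel, [tex]\alpha_\mu\alpha_\nu + \alpha_\nu\alpha_\mu = 0[/tex] for [tex]\mu \neq \nu[/tex]. These conditions cannot be satisfied if the [tex]\alpha[/tex]s are ordinary numbers, but it is easy to find matrices which do the job. We need a total of four, and the smallest matrices that work are [tex]4\times4[/tex]. Writing [tex]\{A, B\} = AB + BA[/tex] for the anticommutator, the conditions on our [tex]\alpha[/tex]s can be written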

[tex]\{\alpha_\mu,\alpha_\nu\} = 2\delta_{\mu\nu}\idmat.[/tex]

Dirac’s choice for solving this equation was the matrices

[tex]\alpha_0 = \left(\begin{array}{cc} 1 & 0 \\ 0 & -1 \\ \end{array}\right), [/tex]

[tex]\alpha_j = \left(\begin{array}{cc} 0 & \sigma_j \\ \sigma_j & 0 \\ \end{array}\right).[/tex]

Many sets of matrices satisfy this anticommutation relation; they all lead to equivalent physics, and Dirac’s choice is a convenient one. Here, [tex]1[/tex] signifies the [tex]2\times2[/tex] identity matrix, [tex]\sigma_j[/tex] are the Pauli matrices, and 0 stands for a [tex]2\times2[/tex] matrix full of zeroes. Now, it is a straightforward matter to write the equation we need:

[tex]H_D\ket{\psi(t)} = i\hbar \partial_t \ket{\psi(t)},[/tex]

where

[tex] H_D = \alpha_0 mc^2 + \sum_{j=1}^3 \alpha_j p_j c.[/tex]


The relationship defined by our previous two equations is just a guess at how the world works. To see if it has any value, we have to work out the consequences of that guess, and then compare those implications to the physical world.
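One ingredient of the guess can be checked right away: whether Dirac's matrices really satisfy [tex]\{\alpha_\mu,\alpha_\nu\} = 2\delta_{\mu\nu}\idmat[/tex]. The NumPy sketch below assumes the standard Dirac representation, in which [tex]\alpha_0[/tex] has the [tex]2\times2[/tex] identity in its upper-left block and minus the identity in its lower-right block, and [tex]\alpha_j[/tex] has [tex]\sigma_j[/tex] in its off-diagonal blocks. It also shows that the [tex]4\times4[/tex] identity would not work as [tex]\alpha_0[/tex], since it commutes with every [tex]\alpha_j[/tex] instead of anticommuting.

```python
import numpy as np

I2 = np.eye(2)
Z2 = np.zeros((2, 2))
PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

# Dirac representation: alpha_0 = diag(1, -1) in 2x2 blocks, alpha_j off-diagonal.
alpha = [np.block([[I2, Z2], [Z2, -I2]])] + [np.block([[Z2, s], [s, Z2]]) for s in PAULI]

def anticomm(a, b):
    """Anticommutator {a, b} = ab + ba."""
    return a @ b + b @ a

clifford_ok = all(
    np.allclose(anticomm(alpha[mu], alpha[nu]), 2.0 * (mu == nu) * np.eye(4))
    for mu in range(4) for nu in range(4)
)
print("{alpha_mu, alpha_nu} = 2 delta_mu_nu:", clifford_ok)

# The 4x4 identity commutes with alpha_1, so {1, alpha_1} = 2 alpha_1, not 0.
print("identity works as alpha_0:", np.allclose(anticomm(np.eye(4), alpha[1]), 0))
```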

With the Dirac Equation in hand, then, we can make a few observations. First, the state [tex]\ket{\psi}[/tex] is now a four-component object, which we call a spinor. To understand what this means, we look at the Dirac Hamiltonian’s energy eigenvalues. Write the time-independent version of our equation as [tex]H_D\ket{\psi} = E\ket{\psi}[/tex], and try for a plane-wave solution. For convenience, assume that the wave is propagating in the [tex]z[/tex]-direction, giving [tex]\psi(z) = \chi e^{ipz/\hbar}[/tex], where [tex]p[/tex] is the momentum and [tex]\chi[/tex] is a constant spinor. Substituting this ansatz into the Dirac Equation turns [tex]H_D[/tex] into the [tex]4\times4[/tex] matrix [tex]\alpha_0 mc^2 + \alpha_3 pc[/tex], and shows that each value of [tex]p[/tex] has two associated eigenspaces, each two-dimensional: one containing states of positive energy and the other states of negative energy:

[tex]E_\pm(p) = \pm \sqrt{(mc^2)^2 + (pc)^2}.[/tex]

Dirac interpreted the negative-energy solutions by picturing the vacuum as a “sea” in which every negative-energy state is already occupied by an electron; a vacancy, or “hole,” in this sea would then behave like a particle of positive energy and positive charge. (This image is analogous to the solid-state physics concept of holes in a solid’s valence band.) In modern terms, we associate the negative-energy solutions with the electron’s antiparticle, the positron: in Dirac’s picture, a positron is the absence of an electron from a negative-energy state, which shows up as a positive-energy hole. In any case, if we look at the non-relativistic limit [tex]|E| - mc^2 \approx p^2/2m \ll pc[/tex], the eigenstates which span the two eigenspaces are associated with amplitudes for being in spin-up or spin-down states, with either positive or negative energy. This explains why we need a four-component object to hold all the information associated with a single electron.
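The eigenvalue claim is easy to check numerically. For a plane wave along [tex]z[/tex], the momentum operator acting on [tex]\chi e^{ipz/\hbar}[/tex] just gives the number [tex]p[/tex], so [tex]H_D[/tex] becomes the [tex]4\times4[/tex] matrix [tex]\alpha_0 mc^2 + \alpha_3 pc[/tex]. The NumPy sketch below assumes the Dirac representation and measures energies in units of the rest energy ([tex]mc^2 = 1[/tex]); it diagonalizes the matrix for a few momenta and compares [tex]E_+ - mc^2[/tex] with [tex]p^2/2m[/tex] at small momentum.

```python
import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
SIGMA_Z = np.array([[1.0, 0.0], [0.0, -1.0]])
ALPHA_0 = np.block([[I2, Z2], [Z2, -I2]])
ALPHA_3 = np.block([[Z2, SIGMA_Z], [SIGMA_Z, Z2]])

def dirac_matrix_z(mc2, pc):
    """H_D acting on chi * exp(ipz/hbar): alpha_0 mc^2 + alpha_3 pc."""
    return ALPHA_0 * mc2 + ALPHA_3 * pc

MC2 = 1.0  # energies in units of the rest energy
for pc in (0.0, 0.5, 3.0):
    energies = np.linalg.eigvalsh(dirac_matrix_z(MC2, pc))  # ascending order
    print(f"pc = {pc}: eigenvalues {energies}, sqrt((mc^2)^2 + (pc)^2) = {np.hypot(MC2, pc):.6f}")

# Non-relativistic check: for pc << mc^2, E_+ - mc^2 should approach p^2/2m = (pc)^2 / (2 mc^2).
pc_small = 0.01
e_plus = np.linalg.eigvalsh(dirac_matrix_z(MC2, pc_small))[-1]
print("E_+ - mc^2 =", e_plus - MC2, "  p^2/2m =", pc_small**2 / (2 * MC2))
```

Each energy appears twice, once for each spin orientation, which is the doubling that the four-component spinor has room for.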

By guessing that the relativistic motion of the electron should have a certain form, we’ve found that the electron has to have a “twin” — somewhere, there’s an electron with a beard! Four years after Dirac came out with his equation, Carl Anderson published a study of 1300 cosmic-ray tracks which revealed a new, positively charged particle which couldn’t be as massive as the proton:

From an examination of the energy-loss and ionization produced it is concluded that the charge is less than twice, and is probably exactly equal to, that of the proton. If these particles carry unit positive charge the curvatures and ionizations produced require the mass to be less than twenty times the electron mass.

The rest, as they say, is history.


The above analysis applies to an electron propagating in free space. We can modify the Dirac Hamiltonian [tex]H_D[/tex] to include electromagnetic interactions fairly easily: shift each momentum component by a term proportional to the vector potential, and add the electrostatic potential energy [tex]e\phi[/tex]. In the presence of a scalar potential and a vector potential, the Hamiltonian becomes the following:

[tex]H_D = \alpha_0 mc^2 + \sum_j \alpha_j\left[p_j - eA_j\right]c + e\phi.[/tex]

Both [tex]\vec{A}[/tex] and the scalar potential [tex]\phi[/tex] can, of course, be functions of [tex]\vec{x}[/tex] and [tex]t[/tex]. (This is a semiclassical approximation, because it treats the electromagnetic field as a classical field obeying Maxwell’s equations. In truth, the EM field is quantized, its quanta being photons, but for our purposes such quantization will not be necessary.)
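As a sketch of how the potentials enter, take them to be constant and work in units with [tex]c = 1[/tex] (an assumption made only to keep the numbers simple). For a plane wave along [tex]z[/tex], the Hamiltonian is again a [tex]4\times4[/tex] matrix: a constant [tex]\phi[/tex] adds [tex]e\phi[/tex] times the identity and so shifts every energy level by [tex]e\phi[/tex], while a constant [tex]A_z[/tex] enters only through the combination [tex]p - eA_z[/tex]. The function name and the numbers (including a charge [tex]e = -1[/tex]) are illustrative.

```python
import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
SIGMA_Z = np.array([[1.0, 0.0], [0.0, -1.0]])
ALPHA_0 = np.block([[I2, Z2], [Z2, -I2]])
ALPHA_3 = np.block([[Z2, SIGMA_Z], [SIGMA_Z, Z2]])

def dirac_matrix_em(m, p, e, phi, a_z):
    """alpha_0 m + alpha_3 (p - e A_z) + e phi, for constant potentials and c = 1."""
    return ALPHA_0 * m + ALPHA_3 * (p - e * a_z) + e * phi * np.eye(4)

m, p, e = 1.0, 0.8, -1.0
free = np.linalg.eigvalsh(dirac_matrix_em(m, p, e, phi=0.0, a_z=0.0))
with_phi = np.linalg.eigvalsh(dirac_matrix_em(m, p, e, phi=0.25, a_z=0.0))
with_a = np.linalg.eigvalsh(dirac_matrix_em(m, p, e, phi=0.0, a_z=0.3))

print("shift from phi:", with_phi - free)           # each level moves by e * phi
print("levels with A_z:", with_a)
print("expected +/-:", np.hypot(m, p - e * 0.3))    # sqrt(m^2 + (p - e A_z)^2)
```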


From now on, keeping factors of [tex]c[/tex] will be more tiresome than useful, so take [tex]c = 1[/tex]; the rest energy [tex]mc^2[/tex] is then simply [tex]m[/tex]. We will also set [tex]\hbar = 1[/tex], so that [tex]p_j = -i\partial_j[/tex]. The Einstein summation convention, combined with four-vectors, allows us to give the Dirac Hamiltonian a compact form,

[tex]H_D = \alpha_0 m + \alpha_j(p_j - eA_j) + e\phi,[/tex]

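where the repeated index [tex]j[/tex] is summed from 1 to 3. To fold the time derivative in as well, move everything in [tex]H_D\psi = i\partial_t\psi[/tex] to one side. With [tex]p_\mu = -i\partial_\mu[/tex], the zeroth component is [tex]p_0 = -i\partial_t[/tex] (the energy operator, with its sign flipped by the lowered index), and with [tex]A_0 = -\phi[/tex], the Dirac Equation becomes

[tex]\left[\alpha^\mu(p_\mu - eA_\mu) + \alpha_0 m\right]\psi = 0,[/tex]

where

[tex]\alpha^\mu = (\idmat,\alpha_1,\alpha_2,\alpha_3).[/tex]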

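In fact, [tex]\partial_t[/tex] can also be written [tex]\partial_0[/tex], since (up to an irrelevant factor [tex]c[/tex]) the quantity [tex]x^0[/tex] is just the time observed in a particular Lorentz frame. The derivatives [tex]\partial_\mu = \partial/\partial x^\mu = (\partial_0, \partial_1, \partial_2, \partial_3)[/tex] therefore form a four-vector with a lower index, and raising the index flips the sign of the time component, giving yet another four-vector made entirely of derivatives: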

[tex]\partial^\mu = (-\partial_0, \partial_1, \partial_2, \partial_3).[/tex]

Another useful object with indices is the flat-spacetime (Minkowski) metric, denoted [tex]\eta_{\mu\nu}[/tex]. It is a matrix whose entries are all zero except along the diagonal, where they are given by [tex](-1, 1, 1, 1)[/tex]. The [tex]-1[/tex] in the first (or the zeroth, depending on how literally you take matrix notation!) entry reflects the special status the time dimension has in Einstein’s theory, and it is exactly what lowers an index: [tex]x_\mu = \eta_{\mu\nu}x^\nu[/tex] flips the sign of the time component. To put our equations in the form most convenient for later work, we define new matrices by

[tex]\gamma^0 = \alpha_0,\ \gamma^j = \alpha_0 \alpha_j,[/tex]

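which obey the anticommutation relation

[tex]\{\gamma^\mu,\gamma^\nu\} = -2\eta^{\mu\nu}\idmat,[/tex]

where [tex]\eta^{\mu\nu}[/tex] has the same diagonal entries [tex](-1, 1, 1, 1)[/tex] as [tex]\eta_{\mu\nu}[/tex]. (The overall sign is tied to the choice of metric; textbooks that use the diagonal [tex](1, -1, -1, -1)[/tex] write the same relation as [tex]+2\eta^{\mu\nu}[/tex].)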
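Signs here depend on conventions, so a numerical check is reassuring. The NumPy sketch below builds [tex]\gamma^0 = \alpha_0[/tex] and [tex]\gamma^j = \alpha_0\alpha_j[/tex] in the Dirac representation ([tex]\alpha_0 = \mathrm{diag}(1, 1, -1, -1)[/tex], [tex]\alpha_j[/tex] with [tex]\sigma_j[/tex] in the off-diagonal blocks) and computes every anticommutator. With the metric [tex]\mathrm{diag}(-1, 1, 1, 1)[/tex], each one comes out as [tex]-2\eta^{\mu\nu}[/tex] times the identity: [tex](\gamma^0)^2 = 1[/tex] and [tex](\gamma^j)^2 = -1[/tex].

```python
import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
ALPHA_0 = np.block([[I2, Z2], [Z2, -I2]])
ALPHAS = [np.block([[Z2, s], [s, Z2]]) for s in PAULI]
ETA = np.diag([-1.0, 1.0, 1.0, 1.0])  # eta^{mu nu}: same entries as eta_{mu nu}

# gamma^0 = alpha_0, gamma^j = alpha_0 alpha_j
gammas = [ALPHA_0] + [ALPHA_0 @ a for a in ALPHAS]

# Each anticommutator should be a multiple of the identity; read off the multiple via the trace.
coeffs = np.array([[np.trace(gammas[mu] @ gammas[nu] + gammas[nu] @ gammas[mu]).real / 4
                    for nu in range(4)] for mu in range(4)])
print(coeffs)  # expected to equal -2 * ETA = diag(2, -2, -2, -2)
print(all(np.allclose(gammas[mu] @ gammas[nu] + gammas[nu] @ gammas[mu],
                      -2 * ETA[mu, nu] * np.eye(4))
          for mu in range(4) for nu in range(4)))
```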

Using the gamma matrices, whose anticommutation relation defines a “Clifford algebra,” the Dirac Equation can be recast in the manifestly covariant form

[tex]\boxed{\left[i\gamma^\mu(\partial_\mu - ieA_\mu) - m\right] \psi = 0.}[/tex]

One can verify this by multiplying the original Dirac Equation, [tex]H_D \psi = i \partial_0\psi[/tex], on the left by [tex]\alpha_0[/tex], substituting the definitions of the gamma matrices given above, and using [tex]A_0 = -\phi[/tex] to gather the scalar potential into the [tex]\mu = 0[/tex] term.
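For the free case (no potentials, [tex]\hbar = c = 1[/tex]), the equivalence can be spot-checked on plane waves [tex]\psi = u\,e^{-iEt + i\vec{p}\cdot\vec{x}}[/tex]. There, [tex]i\partial_0[/tex] becomes [tex]E[/tex] and [tex]i\partial_j[/tex] becomes [tex]-p_j[/tex], so [tex]i\gamma^\mu\partial_\mu - m[/tex] turns into the matrix [tex]\gamma^0 E - \gamma^j p_j - m[/tex], and multiplying [tex]E - H_D[/tex] on the left by [tex]\alpha_0[/tex] should reproduce exactly that matrix for any [tex]E[/tex] and [tex]\vec{p}[/tex]. Because [tex]\alpha_0[/tex] is invertible, the two equations then have the same solutions. The sketch below (NumPy, Dirac representation, random test values, illustrative helper names) is a spot check, not a proof.

```python
import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
ALPHA_0 = np.block([[I2, Z2], [Z2, -I2]])
ALPHAS = [np.block([[Z2, s], [s, Z2]]) for s in PAULI]
GAMMA_0 = ALPHA_0                       # gamma^0 = alpha_0
GAMMAS = [ALPHA_0 @ a for a in ALPHAS]  # gamma^j = alpha_0 alpha_j
I4 = np.eye(4)

def hamiltonian(m, p):
    """Free Dirac Hamiltonian alpha_0 m + alpha_j p_j (hbar = c = 1)."""
    return ALPHA_0 * m + sum(a * pj for a, pj in zip(ALPHAS, p))

def covariant_matrix(m, energy, p):
    """i gamma^mu d_mu - m acting on exp(-iEt + i p.x)."""
    return GAMMA_0 * energy - sum(g * pj for g, pj in zip(GAMMAS, p)) - m * I4

rng = np.random.default_rng(seed=1)
m = 1.0
for _ in range(3):
    p = rng.normal(size=3)
    energy = rng.normal()
    lhs = ALPHA_0 @ (energy * I4 - hamiltonian(m, p))
    print(np.allclose(lhs, covariant_matrix(m, energy, p)))
```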

Richard Feynman introduced a compact notation for contracting a four-vector with the gamma matrices. For any four-vector [tex]a_\mu[/tex],

[tex]\displaystyle{\not} a = \gamma^\mu a_\mu.[/tex]

In this notation, the Dirac Equation without a vector potential is

[tex](i\ \displaystyle{\not} \partial – m)\psi = 0.[/tex]
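The slash notation packs a useful identity. Because the gamma matrices anticommute, [tex]\displaystyle{\not} a\, \displaystyle{\not} a = \frac{1}{2}\{\gamma^\mu,\gamma^\nu\}a_\mu a_\nu[/tex], which, for matrices whose anticommutator is [tex]-2\eta^{\mu\nu}[/tex] with [tex]\eta = \mathrm{diag}(-1, 1, 1, 1)[/tex], equals [tex]-a_\mu a^\mu[/tex] times the identity. For an on-shell momentum, [tex]p_\mu p^\mu = -m^2[/tex], so [tex]\displaystyle{\not} p\, \displaystyle{\not} p = m^2[/tex]. The NumPy sketch below checks this with [tex]\gamma^0 = \alpha_0[/tex], [tex]\gamma^j = \alpha_0\alpha_j[/tex] in the Dirac representation; the helper `slash` and the sample momentum are illustrative.

```python
import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
ALPHA_0 = np.block([[I2, Z2], [Z2, -I2]])
ALPHAS = [np.block([[Z2, s], [s, Z2]]) for s in PAULI]
GAMMAS = [ALPHA_0] + [ALPHA_0 @ a for a in ALPHAS]  # gamma^0, gamma^1, gamma^2, gamma^3
ETA = np.diag([-1.0, 1.0, 1.0, 1.0])

def slash(a_up):
    """Feynman slash gamma^mu a_mu, lowering the index of a^mu with eta first."""
    a_down = ETA @ np.asarray(a_up, dtype=float)
    return sum(g * a for g, a in zip(GAMMAS, a_down))

m = 1.5
p3 = np.array([0.2, -0.7, 1.1])
energy = np.sqrt(m**2 + p3 @ p3)       # on shell: E^2 = m^2 + |p|^2
p_up = np.concatenate(([energy], p3))  # p^mu = (E, p_1, p_2, p_3)
print(np.allclose(slash(p_up) @ slash(p_up), m**2 * np.eye(4)))
```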

It’s said that after the historic 1948 Pocono conference, Gregor Wentzel, Edward Teller and Enrico Fermi spent six weeks poring over the work of Julian Schwinger, but all anyone could remember of Feynman’s talk was his notation: letters with funny little slashes through them.


My path to the Dirac Equation mimics that in Shankar’s textbook, Principles of Quantum Mechanics (1994), more or less; a little also came out of A. Zee’s Quantum Field Theory in a Nutshell (2003), Chapter II.1. A gentle introduction to Lorentz invariance and spacetime intervals can be found in chapter two of Zwiebach’s First Course in String Theory (2004), which also covers four-vector potentials in chapter three.

4 thoughts on “Intermezzo: The Dirac Equation”

  1. “How,” you ask yourself, “can I get rid of the extra readers whom I’ve probably picked up?” The answer, for me at least, is clear:



  2. Having once memorized a simplified version of the Lorentz transformation-based logic Uncle Albert used to derive special relativity, I was actually able to follow all of this until the Schrodinger equation. After that, I was troubled, since I’ve only had first-year calculus, no Hamiltonians, and precious little matrix algebra. Though, on reflection, I’m hazarding the guess that there is a deep connection between the usefulness of the four-vector jobbies and the Pauli exclusion principle. Yes? (gulp)

  3. We are a hop and a skip from the Pauli exclusion principle, conceptually; I could probably cover the hop in one post and the skip in another. However, the link isn’t in the direction that you’re suggesting, if I read you aright.

    Four-vectors themselves are just our way of writing equations which are sure to obey the basic principles of relativity. They can describe fermions, which obey the exclusion principle, or bosons, which don’t; they can also appear in a classical theory which doesn’t even incorporate quantum mechanics. In this post, I mentioned the four-vector generalization of the vector potential as a way of describing the electromagnetic field; you’ve probably heard that the electromagnetic force is carried by photons and that photons are bosons, so it’s not the use of four-vectors which gives us Pauli exclusion.

    The key point to remember is that the Dirac Equation, as I’ve built it up here, is a single-particle equation: it describes the behavior of a single particle in relativistic motion. The exclusion principle, and that whole difference between “Bose-Einstein statistics” and “Fermi-Dirac statistics,” talks about multiple particles, and what they do when you try to lump them together. To work with that, we really need a formalism which includes states that can represent multiple particles in combination. This requires promoting our quantum mechanics to a quantum field theory.

    Back when they were first figuring this stuff out, Dirac stuck the exclusion principle into his math to make the picture of that negative-energy “electron sea” work out correctly. Pauli never liked the “Dirac sea,” and he did a whole lot of work to try and attack it — work which led right up to the “spin-statistics theorem” as we know it today. This is the rule that particles of integer spin obey Bose-Einstein statistics, while those of half-integer spin obey Fermi-Dirac statistics.

    Nowadays, we treat the “Dirac sea” as a heuristic device, and we just go ahead and develop the spin-statistics connection in the full apparatus of quantum field theory, since it’s naturally set up to incorporate states of multiple particles.

    By the way, I’ve heard of students braving their way through Zwiebach’s textbook having had only first-year physics. If you want to level up in Hamiltonian mechanics and other such stuff, it might be an option worth considering.
