Philosophy of Statistics

Jakob and Todd discuss the philosophy of statistics. Frequentist and Bayesian approaches. Fisher, Neyman, and Pearson and statistical methods for evaluating hypotheses. Deborah Mayo and statistical inference as severe testing. Proper and improper uses of p-values. The pitfalls of data dredging and p-hacking. Conditions under which prior probabilities make Bayesian approaches particularly useful. The utility of Bayesian concepts like priors, posteriors, updating, and loss functions in machine learning. Bayes’ Theorem versus Bayesianism as a statistical philosophy. An algorithmic ‘method of methods’ for when to apply various statistical tools as an AI-complete problem. Important test cases in statistics like the Higgs Boson observation, the Eddington experiment for General Relativity, and the causal link between smoking and cancer. The problem of induction. Inferring the direction of causation for correlated variables. Karl Popper, falsification, and the impossibility of confirmation. What counts as evidence. Randomness as a limitation on knowledge and as a feature of reality itself. The ontological status and nature of a probability distribution, of classical values and as a quantum property.

Causal and Emergent Models

Models are critical tools that enable us to think about, qualify, and quantify features of many processes. And as with any kind of tool, different kinds of models are better suited to different circumstances. Here we look at two kinds of models for understanding transport phenomena: causal and emergent models. In a causal process there is some kind of distinct, sequential, goal-oriented event with an observable beginning and end. In an emergent process there are uniform, parallel, independent events with no beginning or end but in which observable patterns eventually emerge.

For the video version of this episode, which includes some visual aids, see it on YouTube.

Since my university studies I’ve been fascinated by the ways we use models to understand and even make quantitative descriptions and predictions about the world. I don’t remember when exactly, but at some point I really began to appreciate how the pictures of chemical and physical processes I had in my head were not the way things “really” were (exactly) but were useful models for thinking about things and solving problems.

Conceptual models in science, engineering, economics, etc. are similar to toy models like model cars or model airplanes in that they aren’t the things themselves but have enough in common with the things they are modeling to still perform in similar ways. As long as a model enables you to get the information and understanding you need it is useful, at least for the scale and circumstances you’re interested in. Models are ubiquitous in the sciences and one of the major activities in the sciences is to improve models, generate new models, and create more models to apply to more conditions.

Something to bear in mind when working with a model is the set of conditions in which it works well. That’s important because a model may work very well under a certain set of conditions but then break down outside those conditions. Outside those conditions it may give less accurate results or just not describe well qualitatively what’s going on in the system we’re trying to understand. This could be something like being outside a temperature or pressure range, extremes in velocity or gravitational field strength, etc. And often it’s a matter of geometric scale, like whether we’re dealing in meters or nanometers. The world looks different at the microscopic and molecular scale than at the macroscopic scale of daily life.

I’m really a pluralist when it comes to models. I’m in favor of several types to meet the tasks at hand. Is a classical, Newtonian model for gravity superior to a relativistic model for gravity? I don’t think so. Yeah, a Newtonian model breaks down under certain conditions. But it’s much easier and more intuitive to work with under most conditions. It doesn’t make sense to just throw away a Newtonian model after relativity. And we don’t. We can’t. It would be absurdly impractical. And practicality is a major virtue of models. That’s not to say there’s no such thing as better or worse models. A Newtonian model of planetary motion is better than a Ptolemaic one because it’s both more accurate and simpler to understand. So I don’t embrace pluralism without standards of evaluation. I suppose there’d be an infinite number of really bad models in the set of all possible models. Even so, there are still multiple that do work well, that overlap and cover similar systems.

I studied chemical engineering in the university and one of my textbooks was Transport Phenomena by Bird, Stewart, and Lightfoot, sort of a holy trinity of the discipline. Transport phenomena covers fluids, heat, and diffusion, which all share many features and whose models share a very similar structure. One of the ideas I liked in that book is its systematic study of processes at three scales: macroscopic, microscopic, and molecular. I’ll quote from the book for their explanations of these different scales.

“At the macroscopic level we write down a set of equations called the ‘macroscopic balances,’ which describe how the mass, momentum, energy, and angular momentum in the system change because of the introduction and removal of these entities via the entering and leaving streams, and because of various other inputs to the system from the surroundings. No attempt is made to understand all the details of the system.”

“At the microscopic level we examine what is happening to the fluid mixture in a small region within the equipment. We write down a set of equations called the ‘equations of change,’ which describe how the mass, momentum, energy, and angular momentum change within this small region. The aim here is to get information about velocity, temperature, pressure, and concentration profiles within the system. This more detailed information may be required for the understanding of some processes.”

“At the molecular level we seek a fundamental understanding of the mechanisms of mass, momentum, energy, and angular momentum transport in terms of molecular structure and intermolecular forces. Generally this is the realm of the theoretical physicist or physical chemist, but occasionally engineers and applied scientists have to get involved at this level.”

I came across an interesting paper recently from a 2002 engineering education conference titled How Chemical Engineering Seniors Think about Mechanisms of Momentum Transport by Ronald L. Miller, Ruth A. Streveler, and Barbara M. Olds. It caught my attention since I’ve been a chemical engineering senior so I wanted to see how it compared to my experience. And it tracked it pretty well actually. Their idea is that one of the things that starts to click for seniors in their studies, something that often hadn’t clicked before, is a conceptual understanding of many fundamental molecular-level and atomic-level phenomena including heat, light, diffusion, chemical reactions, and electricity. I’ll refer mostly to the examples from this paper by Miller, Streveler, and Olds but I’ll mention that they base much of their presentation on the work of Michelene Chi, who is a cognitive and learning scientist. In particular they refer to her work on causal versus emergent conceptual models for these physical processes. Her paper on this is titled Misconceived Causal Explanations for Emergent Processes. Miller, Streveler, and Olds propose that chemical engineering students start out using causal models to understand many of these processes but then move to more advanced, emergent models later in their studies.

In a causal process there is some kind of distinct, sequential, goal-oriented event with an observable beginning and end. In an elastic collision for instance, a moving object collides with a previously stationary object and transfers its momentum to it. In an emergent process there are uniform, parallel, independent events with no beginning or end but in which observable patterns eventually emerge. Electricity, fluid flow, heat transfer and molecular equilibrium are examples of emergent processes. Miller, Streveler, and Olds correlate causal and emergent explanations with macroscopic and molecular models respectively. As Bird, Stewart, and Lightfoot had said in their descriptions of their three scales, it’s at the molecular level that “we seek a fundamental understanding of the mechanisms.” But at the macroscopic scales we aren’t looking at so fundamental an explanation.      

Miller, Streveler, and Olds use diffusion, i.e. mass transport, as an example to show the difference between causal and emergent explanations. Say we have a glass of water and we add a drop of color dye to it. The water is a solvent and the color dye is a solute. This color dye solute starts to diffuse, or spread, into the water solvent and we can explain this diffusion process in both causal and emergent ways; or we could also say in macroscopic and molecular ways.

First, a quick overview of diffusion. The mathematical model for diffusion is Fick’s Law of Diffusion. The equation for this is:       

J = -D(dC/dx)

J is the diffusion flux
C is concentration
x is position
D is diffusivity, the applicable constant of proportionality in this case

The basic logic of this equation is that the diffusion of a solute is proportional to the gradient of the concentration of that solute in a solvent. If the solute is evenly distributed in the solution the concentration is the same everywhere in the solution, so there is no concentration gradient and no diffusion. But there is a gradient if the solute concentration is different at different positions in the space, for example, if it is highly concentrated at one point and less concentrated as you move away from that point. The diffusion flux is proportional to the steepness of that decrease, that gradient. If a drop of dye has just been placed in a glass of water the flux of diffusion is going to be very high at the boundary between that drop and the surrounding water because there is a huge difference in the concentration of the dye there.
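To make that logic a little more concrete, here is a minimal Python sketch of Fick’s Law applied to a concentration profile sampled at a few positions. The function name fick_flux, the grid spacing, the diffusivity, and the concentration values are all made up for illustration; the gradient is just a simple finite difference between neighboring points.

```python
def fick_flux(concentrations, dx, diffusivity):
    """Estimate J = -D * dC/dx between each pair of neighboring positions.

    The gradient is approximated as (C_right - C_left) / dx, so
    -D * (C_right - C_left) / dx is written as D * (C_left - C_right) / dx.
    """
    return [
        diffusivity * (c_left - c_right) / dx
        for c_left, c_right in zip(concentrations, concentrations[1:])
    ]


# Made-up numbers: concentrated dye on the left, a sharp drop at the
# edge of the drop, then plain water on the right.
profile = [1.0, 1.0, 0.2, 0.0, 0.0]
flux = fick_flux(profile, dx=1.0, diffusivity=0.5)

for i, j in enumerate(flux):
    print(f"between position {i} and {i + 1}: J = {j:.2f}")
```

In this toy profile the flux is zero wherever the concentration is flat and largest across the sharp drop at the edge of the dye, which is the behavior described above.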

So that’s the logic of Fick’s Law of Diffusion. But why does this happen? And here we can look at the two different kinds of explanations, causal and emergent explanations.         

Here are a few examples of both:

Causal Explanation: “Dye molecules move towards water molecules.”
Emergent Explanation: “All molecules exercise Brownian motion.”

Causal Explanation: “Dye molecules flow from areas of high concentration to areas of low concentration.”
Emergent Explanation: “All molecules move at the same time.”

Causal Explanation: “Dye molecules are ‘pushed’ into the water by other dye molecules.”
Emergent Explanation: “Molecules collide independently of prior collisions. What happens to one molecule doesn’t affect interactions with other molecules.”

Causal Explanation: “Dye molecules want to mix with water molecules.”
Emergent Explanation: “The local conditions around each molecule affect where it moves and at what velocity.”

Causal Explanation: “Dye molecules stop moving when dye and water become mixed.”
Emergent Explanation: “Molecular interactions continue when equilibrium is reached.”

This gives something of a flavor of the two different kinds of explanations. Causal explanations have more of a top-down approach, looking for the big forces that make things happen, and may even speak in metaphorical terms of volition, like what a molecule “wants” to do. Emergent explanations have more of a bottom-up approach, looking at all the things going on independently in a system and how that results in the patterns we observe.

I remember Brownian motion being something that really started pushing me to think of diffusion in a more emergent way. Brownian motion is the random motion of particles suspended in a medium, like a liquid or a gas. If you just set a glass of water on a table it may look stationary, but at the molecular scale there’s still a lot of movement. The water molecules are moving around in random directions. If you add a drop of color dye to the water the molecules in the dye also have Brownian motion, with all those molecules moving in random directions. So what’s going to happen in this situation? Well, things aren’t just going to stay put. The water molecules are going to keep moving around in random directions and the dye molecules are going to keep moving around in random directions. What kind of pattern should we expect to see emerge from this?

Let’s imagine imposing a three-dimensional grid onto this space, dividing the glass up into cube volumes or voxels. Far away from the drop of dye, water molecules will still be moving around randomly between voxels but those voxels will continue to look about the same. Looking at the space around the dye, voxels in the middle of the drop will be all dye. Voxels on the boundary will have some dye molecules and some water molecules. And voxels with a lot of dye molecules will be next to voxels with few dye molecules. As water molecules and dye molecules continue their random motion we’re going to see the most state changes in the voxels that are different from each other. Dye molecules near a voxel with mostly water molecules can very likely move into one of those voxels and change its state from one with few or no dye molecules to one with some or more dye molecules. And the biggest state changes will occur in regions where voxels near to each other are most different, just because they can be so easily (albeit randomly) changed.

This is a very different way of looking at the process of diffusion. Rather than there being some rule imposed from above, telling dye molecules that they should move from areas of high concentration to areas of low concentration, all these molecules are moving around randomly. And over time areas with sharp differences tend to even out, just by random motion. From above and from a distance this even looks well-ordered and like it could be directed. The random motion of all the components results in an emergent macro-level pattern that can be modeled and predicted by a fairly simple mathematical expression. The movement of each individual molecule is random and unpredictable, but the resulting behavior of the system, the aggregate of all those random motions, is ordered and highly predictable. I just think that’s quite elegant!
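This emergent picture is easy to play with in code. Below is a minimal sketch, not a physical simulation: it assumes a one-dimensional world where every “dye molecule” takes one unit step left or right at random on each tick, with no rule telling any molecule which way to go. The particle count, step count, and function names are arbitrary choices for illustration.

```python
import random


def random_walk_step(positions, rng):
    """Move every particle one unit left or right, chosen independently at random."""
    return [x + rng.choice((-1, 1)) for x in positions]


def mean_squared_displacement(positions):
    """Average squared distance from the starting point (x = 0)."""
    return sum(x * x for x in positions) / len(positions)


rng = random.Random(0)      # fixed seed so repeated runs give the same numbers
dye = [0] * 2000            # all dye molecules start in the same spot

spread_history = [mean_squared_displacement(dye)]
for step in range(1, 201):
    dye = random_walk_step(dye, rng)
    if step % 50 == 0:
        spread_history.append(mean_squared_displacement(dye))
        print(f"step {step:3d}: mean squared displacement = {spread_history[-1]:.1f}")
```

Each individual path is unpredictable, but the spread of the whole population grows steadily, roughly in proportion to the number of steps. That is the kind of orderly aggregate pattern a simple diffusion law describes.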

Miller, Streveler, and Olds give another example that neatly illustrates different ways of understanding a physical process at the three different scales: macroscopic, microscopic, and molecular. Their second example is of momentum transport. An example of momentum transport is pumping a fluid through a pipe. As a brief overview, when a fluid like water is moved through a pipe under pressure the velocity of the fluid is highest at the center of the pipe and lowest near the walls. This is a velocity gradient, often called a “velocity profile”, where you have this cross-sectional view of a pipe showing the velocity vectors of different magnitudes at different positions along the radius of the pipe. When you have this velocity gradient there is also a transfer of momentum from areas of high momentum to areas of low momentum. So in this case momentum will transfer from the center of the pipe toward the walls of the pipe.

The model for momentum transport has a similar structure to the model for mass transport. Recall that in Fick’s Law of Diffusion, mass transport, i.e. diffusion, was proportional to the concentration gradient and the constant of proportionality was this property called diffusivity. The equation was:

J = -D(dC/dx)

The corresponding model for momentum transport is Newton’s law of viscosity (Newton had a lot of laws). The equation for that is:

τ = -μ(dv/dx)


τ is shear stress, the flux of momentum transport
v is velocity
x is position
μ is viscosity, the applicable constant of proportionality in this case
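Here is a small Python sketch of this equation applied to a velocity profile across a pipe. Several things here are my own assumptions for illustration rather than anything from the paper: the parabolic shape of the profile (the standard result for laminar pipe flow), the pipe radius, the centerline velocity, a viscosity of roughly that of water at room temperature, and the function name momentum_flux.

```python
def momentum_flux(velocities, dx, viscosity):
    """Estimate tau = -mu * dv/dx between neighboring positions.

    The gradient is approximated as (v_outer - v_inner) / dx.
    """
    return [
        -viscosity * (v_outer - v_inner) / dx
        for v_inner, v_outer in zip(velocities, velocities[1:])
    ]


# Assumed values, for illustration only.
radius = 0.01        # pipe radius in meters
v_max = 0.1          # centerline velocity in m/s
viscosity = 1.0e-3   # Pa*s, roughly water at room temperature
n_intervals = 4
dx = radius / n_intervals

# Assumed parabolic laminar profile: fastest at the center, zero at the wall.
positions = [i * dx for i in range(n_intervals + 1)]
velocities = [v_max * (1.0 - (r / radius) ** 2) for r in positions]

tau = momentum_flux(velocities, dx, viscosity)
for r, t in zip(positions, tau):
    print(f"between r = {r:.4f} m and r = {r + dx:.4f} m: tau = {t:.4f} Pa")
```

With this assumed profile the velocity gradient gets steeper toward the wall, so the momentum flux is positive (directed from the center outward) and largest next to the wall.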

So in Newton’s law of viscosity the momentum transport, i.e. shear stress, is proportional to the velocity gradient and the constant of proportionality is viscosity. You have higher momentum transport with a higher gradient, i.e. change, in velocity along the radius of the pipe. Why does that happen?

So they actually asked some students to explain this in their own words to see on what geometric scales they would make their descriptions. The prompt was: “Explain in your own words (no equations) how momentum is transferred through a fluid via viscous action.” And they evaluated the descriptions as being of one of the three scales (or a mixture of them) using a rubric. So here are examples from the rubric of explanations at each of those scales:

Macroscopic explanation: The pressure at the pipe inlet is increased (usually by pumping) which causes the fluid to move through the pipe. Friction between fluid and pipe wall results in a pressure drop in the direction of flow along the pipe length. The fluid at the wall does not move (no-slip condition) while fluid furthest away from the wall (at the pipe centerline) flows the fastest, so momentum is transferred from the center (high velocity and high momentum) to the wall (no velocity and no momentum).

Microscopic explanation: Fluid in laminar flow moves as a result of an overall pressure drop causing a velocity profile to develop (no velocity at the wall, maximum velocity at the pipe centerline). Therefore, at each pipe radius, layers of fluid flow past each other at different velocities. Faster flowing layers tend to speed up [and move] slower layers along resulting in momentum transfer from faster layers in the middle of the pipe to slower layers closer to the pipe walls.

Molecular explanation: Fluid molecules are moving in random Brownian motion until a pressure is applied at the pipe inlet causing the formation of a velocity gradient from centerline to pipe wall. Once the gradient is established, molecules that randomly migrate from an area of high momentum to low momentum will take along the momentum they possess and will transfer some of it to other molecules as they collide (increasing the momentum of the slower molecules). Molecules that randomly migrate from low to high momentum will absorb some momentum during collisions. As long as the overall velocity gradient is maintained, the net result is that momentum is transferred by molecular motion from areas of high momentum to areas of low momentum and ultimately to thermal dissipation at the pipe wall.

With these different descriptions as we move from larger to smaller scales we also move from causal to emergent explanations. At the macroscopic level we’re looking at bulk motion of fluid. At the microscopic scale it’s getting a little more refined. We’re thinking in terms of multiple layers of fluid flow. We’re seeing the gradient at a higher resolution. And we can think of these layers of fluid rubbing past each other, with faster layers dragging slower layers along, and slower layers slowing faster layers down. It’s like spreading out a deck of cards. In these explanations momentum moves along the velocity gradient because of a kind of drag along the radial direction.

But with the molecular description we leave behind that causal explanation of things being dragged along. There’s only one major top-down, causal force in this system and that’s the pressure or force that’s being applied in the direction of the length of the pipe. With a horizontal pipe we can think of this force being applied along its horizontal axis. But there’s not a top-down, external force being applied along the vertical or radial axis of the pipe. So why does momentum move from the high-momentum region in the center of the pipe to the low-momentum region near the pipe wall? It’s because there’s still random motion along the radial or vertical axis, which is perpendicular to the direction of the applied pressure. So molecules are still moving randomly between regions with different momentum. So if we think of these layers, these cylindrical sheets that are dividing up the sections of the pipe at different radii, these correspond to our cube voxels in the diffusion example. Molecules are moving randomly between these sheets. The state of each layer is characterized by the momentum of the molecules in it. As molecules move between layers and collide with other molecules they transfer momentum. As in the diffusion example the overall pattern that emerges here is the result of random motion of the individual molecular components.
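To see how random exchange alone can carry momentum outward, here is a deliberately crude toy model in Python. It is my own simplification, not something from the paper: four layers run from the pipe center to the wall, every molecule carries an integer amount of momentum along the pipe, and at each step two randomly chosen molecules in neighboring layers simply trade places. Nothing in this toy maintains the gradient the way the applied pressure does in a real pipe, so the layers gradually even out.

```python
import random

rng = random.Random(1)   # fixed seed so repeated runs give the same numbers

# Hypothetical setup: layer 0 is the pipe center (high momentum),
# layer 3 is next to the wall (low momentum). 200 molecules per layer.
layers = [[4] * 200, [3] * 200, [2] * 200, [1] * 200]
total_before = sum(sum(layer) for layer in layers)
net_outward = 0          # net momentum carried from inner layers to outer layers

for _ in range(2000):
    k = rng.randrange(len(layers) - 1)      # boundary between layer k and layer k + 1
    i = rng.randrange(len(layers[k]))       # random molecule in the inner layer
    j = rng.randrange(len(layers[k + 1]))   # random molecule in the outer layer
    inner, outer = layers[k][i], layers[k + 1][j]
    layers[k][i], layers[k + 1][j] = outer, inner   # the two molecules trade places
    net_outward += inner - outer

total_after = sum(sum(layer) for layer in layers)
means = [sum(layer) / len(layer) for layer in layers]

print("mean momentum per layer, center to wall:", means)
print("net momentum carried outward:", net_outward)
```

No molecule is ever pushed toward the wall. The swaps are blind to momentum, yet because inner layers start out holding more of it, the net flow comes out directed from the center toward the wall, while the total is conserved.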

So, does this matter? My answer to that question is usually that “it”, whatever it may be, matters when and where it matters. Miller, Streveler, and Olds say: “If the macroscopic and microscopic models are successful in describing the global behavior of simple systems, why should we care if students persist in incorrectly applying causal models to processes such as dye diffusion into water? The answer is simple – the causal models can predict some but not all important behavioral characteristics of molecular diffusional processes.” And I think that’s a good criterion for evaluation. I actually wouldn’t say, as they do, that the application of causal models is strictly “incorrect”. But I take their broader point. Certainly macroscopic and causal models have their utility. For one thing, I think they’re easier to understand starting off. But as with all models, you have to keep in mind their conditions of applicability. Some apply more broadly than others.

One thing to notice about these transport models is that they have proportionality constants. And whenever you see a constant like that in a model it’s important to consider what all might be wrapped up into it because it may involve a lot of complexity. And that is the case with both the diffusion coefficient and viscosity. Both are heavily dependent on specific properties of the system. For the value of viscosity you have to look it up for a specific substance and then also for the right temperature range. Viscosity varies widely between different substances. And even for a single substance it can still vary widely with temperature. For diffusivity you have to consider not only one substance but two, at least. If you look up a coefficient of diffusivity in a table it’s going to be for a pair of substances. And that will also depend on temperature.

At a macroscopic scale it’s not clear why the rates of mass transport and momentum transport would depend on temperature or the type of substances involved. But at a molecular scale you can appreciate how different types of molecules would have different sizes and would move around at different velocities at different temperatures and how that would all play into the random movements of particles and the interactions between particles that produce, from that molecular scale, the emergent processes of diffusion and momentum transport that we observe at the macroscopic scale.

Once you open up that box, to see what is going on behind these proportionality constants, it opens up a whole new field of scientific work to develop – you guessed it – more and better models to qualify and quantify these phenomena.

Quantum Properties

Should we understand quantum systems to have definite properties? In quantum interpretations values are usually taken to be the eigenvalues directly revealed in experiments and quantum systems generally have no definite eigenvalues. However, Sunny Auyang argues that this does not mean that they don’t have definite properties. The conclusion that they don’t arises from a restricted sense of what counts as a property. The conceptual structure of quantum mechanics is much richer and an expanded notion of properties facilitates an understanding of quantum properties that are more descriptive and structurally sophisticated.

One of the philosophical problems prompted by quantum mechanics is the nature of quantum properties and whether quantum systems can even be said to have properties. This is an issue addressed by Sunny Auyang in her book How is Quantum Field Theory Possible? And I will be following her treatment of the subject here.

One of the major contributors to the development of quantum mechanics, physicist Niels Bohr, whose grave I happened to visit when I was in Copenhagen, said: “Atomic systems should not even be thought of as possessing definite properties in the absence of a specific experimental setup designed to measure these properties.” Why is that? A lot of this hinges on what counts as a property, which is a matter of convention. For the kinds of things Bohr had in mind he was certainly right. But Auyang argues that it’s useful to retain the notion of properties and instead locate quantum properties in different kinds of things, in a way Bohr very easily could have agreed with.

Why are the kinds of things Bohr had in mind not good candidates as definite quantum properties? The upshot, before getting into the more technical description, is that in quantum systems properties like position don’t seem to have definite values prior to observation. As an example, in chemistry the electrons bound in atoms and molecules are understood to occupy orbitals, which are regions of space with probability densities. Rather than saying that a bound electron is at some position we say it has some probability to be at some position. If we think of a definite property as being something like position you can see why Bohr would say an atomic system doesn’t have definite properties in the absence of some experiment to measure it. Atomic and molecular orbitals don’t give us a definite property like position.

Auyang takes these kinds of failed candidates for definite properties to be what in quantum mechanics are called eigenvalues. And this will require some background. But to give an idea of where we’re going, Auyang wants to say that if we insist that properties are what are represented by eigenvalues then it is true that quantum systems do not have properties. However, she is going to argue that quantum systems do have properties, they are just not their eigenvalues; we have to look elsewhere for such properties.

In quantum mechanics the characteristics of a quantum system are summarized by a quantum state. This is represented by a state vector or wave function, usually with the letter φ. A vector is a quantity that has both magnitude and direction. Vectors can be represented by arrows on a graph. So in a two dimensional graph the arrow would go from the center origin out into what is called the vector space. In two dimensions you could express the vector in terms of the horizontal and vertical axes; and the vector space would just be the plane these sweep out or span. It’s common to represent this in two, maybe three dimensions, but it’s actually not limited to that number; a vector space can have any number of dimensions. Whatever number of dimensions it has it will have a corresponding number of axes, which are more technically referred to as basis vectors. Quantum mechanics makes use of a special kind of vector space called a Hilbert space. This is also the state space of a quantum system. So recall that the description of the quantum system is its state, and this is represented by a vector. The state space then covers all permissible states that this quantum system can have.

Let’s limit this to two dimensions for the sake of visualization. And we can refer here to the featured image for this episode, which is a figure from Auyang’s book. We have a vector |φ> in a Hilbert space with the basis vectors {|α1>, |α2>}. So for this Hilbert space |α1> and |α2> are basis vectors that serve as a coordinate system for this vector space. This is the system but it’s not what we interact with. For us to get at this system in some way we need to run experiments. And this also has a mathematical representation. What we get out of the system are observables like energy, position, and momentum, to name a few. Mathematically observables are associated with operators. An operator is a kind of linear transformation. Basically an operator transforms the state vector in some way. As a transformation, an operator usually maps one state into another state. But for certain states an operator will only result in the same state multiplied by some scaling factor. So let’s take some operator, upper case A, and have it operate on state |φ>. The result is a factor, lower case a, multiplied by the original state |φ>. We can write this as:

A|φ> = a|φ>

In this kind of equation the vector |φ> is called an eigenvector and the factor a is called an eigenvalue. The prefix eigen- is adopted from the German word eigen for “proper”, “characteristic”, “own”, in reference to the fact that the original state or eigenvector is the same on both sides of the equation. In quantum mechanics this eigenvector is also called an eigenstate.
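A tiny numerical example may help here. This is just a sketch with a made-up 2×2 symmetric matrix standing in for the operator A; the matrix, the vectors, and the helper names apply and is_scaled_copy are all my own choices for illustration. For one vector the operator returns the same vector scaled by a factor, which makes it an eigenvector, and for another it returns something pointing in a different direction.

```python
import math

# Made-up symmetric 2x2 matrix standing in for the operator A.
A = [[2.0, 1.0],
     [1.0, 2.0]]


def apply(operator, vector):
    """Matrix-vector product: the operator acting on a state vector."""
    return [sum(row[k] * vector[k] for k in range(len(vector))) for row in operator]


def is_scaled_copy(vector, result, tol=1e-12):
    """True if result points along the same line as vector (2D check)."""
    return abs(result[0] * vector[1] - result[1] * vector[0]) < tol


s = 1 / math.sqrt(2)
eigenvector = [s, s]      # chosen because A only rescales it
generic = [1.0, 0.0]      # a vector that A rotates as well as stretches

scaled = apply(A, eigenvector)
rotated = apply(A, generic)

print("A on eigenvector:", scaled, "scaled copy?", is_scaled_copy(eigenvector, scaled))
print("A on generic vector:", rotated, "scaled copy?", is_scaled_copy(generic, rotated))
print("scaling factor (the eigenvalue):", scaled[0] / eigenvector[0])
```

For this particular matrix the scaling factor for the first vector works out to 3, up to floating-point rounding.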

Now, getting back to quantum properties, I mentioned before that Auyang takes the kind of definite properties that quantum systems are understood not to have prior to observation to be eigenvalues. Eigenstates are certainly observed and corresponding eigenvalues measured in experiments. But the issue is of properties of the quantum system itself. Any given eigenvalue has only a certain probability of being measured, among the probabilities of other eigenvalues. So any single eigenvalue can’t be said to be characteristic of the whole quantum system.

Let’s go back to the two-dimensional Hilbert space with state vector |φ> and basis vectors |α1> and |α2>. The key feature of basis vectors is that every vector in the vector space can be written as a linear combination of those vectors. That’s how they act as a coordinate system. So if we take our vector |φ> we can break it down into two orthogonal (right angle) components, in this case the horizontal and vertical components, and then the values for the coefficients for those components will be some factor, ci, of the basis vectors. So for vector |φ> the components will be c1|α1> and c2|α2>. In the more generalized form with an unspecified number of dimensions we can say that the vector |φ> is equal to the sum of ci|αi> for all i.

|φ> = ∑ ci|αi>

The complex numbers ci are amplitudes, or probability amplitudes, though we should note that it’s actually the square of the absolute value of ci that is a probability. Specifically, the quantity |ci|² is the probability that the eigenvalue ai is observed in a measurement of the operator A on the state vector |φ>. This is known as the Born rule. Another way of describing this summation equation is to say that the state of the system is a linear combination, or superposition, of all the eigenstates that compose it and that these eigenstates are “weighted” by their respective probability amplitudes. Eigenstates with higher probability amplitudes are more likely to be observed. And this touches again on the idea that observations of certain eigenstates are probabilistic and that’s the reason that the eigenvalues for these eigenstates are not considered definite properties. Because they’re not definite; they’re probabilistic.
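Here is a short Python sketch of this decomposition and of the Born rule. The two-dimensional basis and the state vector are made-up numbers chosen so that the state is normalized, and the helper name inner is my own; each amplitude is computed as the inner product of a basis vector with the state.

```python
import math


def inner(u, v):
    """Inner product <u|v>, taking the complex conjugate of the first vector."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


s = 1 / math.sqrt(2)
basis = [[s, s], [s, -s]]     # assumed orthonormal basis vectors |α1> and |α2>
phi = [0.6, 0.48 + 0.64j]     # made-up normalized state vector |φ>

# Probability amplitudes ci = <αi|φ>, which are complex numbers in general.
amplitudes = [inner(b, phi) for b in basis]

# Born rule: the probability for each outcome is |ci| squared.
probabilities = [abs(c) ** 2 for c in amplitudes]

# Rebuild |φ> as the superposition: the sum of ci|αi> over all i.
rebuilt = [sum(c * b[k] for c, b in zip(amplitudes, basis)) for k in range(2)]

print("amplitudes:", amplitudes)
print("probabilities:", probabilities, "sum:", sum(probabilities))
print("rebuilt state:", rebuilt)
```

The amplitudes themselves are complex, but their squared magnitudes are ordinary probabilities that add up to 1 for a normalized state, and the weighted sum of basis vectors reproduces the original state.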

If we apply operator A to state |φ> we have a new vector A|φ>. In our Hilbert space this new vector’s components are expressible in terms of the coordinates, or basis vectors. If the basis vectors are eigenvectors of A then these components are expressible in terms of the probability amplitude ci. We could say that the application of this operator A to vector |φ> extracts ci and multiplies it by the eigenvalue ai. And this is good because remember eigenvalues are what we actually observe in experiments. So now we can express the state of the system in terms of things we can observe.

This transformed vector A|φ> is equal to the sum of products of eigenvalue aᵢ, amplitude cᵢ, and eigenvector |αᵢ>, for all i.

A|φ> = ∑ᵢ aᵢcᵢ|αᵢ>
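As a rough check on this equation, here is a small Python sketch. It assumes we are already working in the eigenbasis of A, so that A is a diagonal matrix with the eigenvalues on the diagonal. The numbers and the helper name matvec are my own illustrative choices.

```python
# Illustrative eigenvalues a_1, a_2 and amplitudes c_1, c_2 (assumed values).
eigenvalues = [2.0, -1.0]
amplitudes = [0.6 + 0j, 0.8j]


def matvec(matrix, vector):
    """Plain matrix-vector product for small lists of numbers."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


# In its own eigenbasis the operator A is diagonal.
A = [[eigenvalues[0], 0.0],
     [0.0, eigenvalues[1]]]

# Components of A|phi> computed the long way, by matrix multiplication...
transformed = matvec(A, amplitudes)

# ...and the short way, component by component: a_i * c_i.
a_amplitude = [a_i * c_i for a_i, c_i in zip(eigenvalues, amplitudes)]

print(transformed)
print(a_amplitude)
```

Both routes give the same list of components, which is the sequence of aᵢcᵢ products.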

Now we’re ready to get into what Auyang thinks we can properly consider properties of quantum systems. For some observable A and its operator, the sequence of complex numbers {aᵢcᵢ} can be called an A-amplitude and is, using the eigenvalues, expressed in terms of the probability amplitude cᵢ. And this is where Auyang locates the properties of quantum systems. She interprets the probability amplitude cᵢ or the A-amplitude as the definite property or the value of a certain quantum system in a certain state for the property type A. And she makes the point that we shouldn’t try to imagine what the amplitudes and A-amplitudes describe because they are nothing like classical features; “they are literally unimaginable”. But they are calculable. And that’s their crucial, property-type feature.

We might ask why we should locate definite properties in something that we can’t imagine. Classical properties like classical energy, position, and momentum are more easily envisioned, so these prospective, unimaginable quantum properties might seem unsatisfying. But this touches on Auyang’s general Kantian perspective on the sciences, which is that our understanding of scientific concepts relies on a complex underlying conceptual structure. And in this case that underlying conceptual structure includes things like vectors, Hilbert spaces, bases, eigenvectors, eigenvalues, and amplitudes. If that structure is required to comprehend the system it’s not unreasonable that the system’s definite properties would be expressed in terms of that structure.

With that mathematical overview let’s look at the concept of properties more closely and at our expectations of them. And here I’d like to just quote an extended passage directly from Auyang’s book because this is actually my favorite passage:

“In quantum interpretations, the ‘values’ are usually taken to be eigenvalues or spectral values, which can be directly revealed in experiments, although the revelation may involve some distortion so that the veracity postulate does not hold. It is beyond a reasonable doubt that quantum systems generally have no definite eigenvalues. However, this does not imply that they have no definite properties. The conclusion that they have none arises from the fallacious restriction of properties to classical properties, of which eigenvalues are instances. Sure, quantum systems have no classical properties. But why can’t they have quantum properties? Is it more reasonable to think that quantum mechanics is necessary because the world has properties that are not classical?”

“The no-property fallacy also stems from overlooking the fact that the conceptual structure of quantum mechanics is much richer than that of classical mechanics. In classical mechanics, the properties of a system are represented by the numerical values of functions, which assign real numbers to various states of the system. In quantum mechanics, functions are replaced by operators, which are structurally richer. A function is like a fish with only one swaying tail, its numerical value; an operator is like an octopus with many legs. Quantum mechanics employs the octopus with good reason, and we miss something important if we look only at the one leg that reminds us of the fishy tail. Quantum systems generally do not have definite eigenvalues, but they have other definite values. The stipulation that the values must be directly revealable in measurements confuses the empirical and physical meanings of properties.”

“I argue that we cannot give up the notion of objective properties. If we did, the quantum world would become a phantom and the application of quantum mechanics to practical situations sorcery. Are there predicates such that we can definitely say of a quantum system, it is such and so? Yes, the wavefunction is one. The wavefunction of a system is a definite predicate for it in the position representation. It is not the unique predicate; a predicate in the momentum representation does equally well. Quantum properties are none other than what the wavefunctions and predicates in other representations describe.”

And recall here that a wave function is another way of referring to the state of a quantum system. I think of this as moving things up a level. Or down a level depending on how you want to think of it. Regardless, at one level we have the eigenvalues that pop out with the application of an operator on a state vector. These are not definite properties of the system as a whole. In other words, the definite properties of the quantum system do not reside at this level. Rather they reside at the level prior to this, on which these outcomes depend. In the case of an atomically bound electron we could say that it is the orbital, the probability distribution of the electron’s location, that is a property of the quantum system, rather than any particular position. These sorts of properties have a lot more to them. As Auyang says, they are “structurally richer”. They’re not just values. They are amplitudes, from which we derive probabilities for various values. And what Auyang is saying is that there’s no reason not to consider that the definite property of the quantum system.

Still, it is different from our classical notion of properties. So what is it that is common to both classical and quantum properties? Auyang borrows a term from Alfred Landé, proposing that a characteristic has empirical ramification if it is observable or “kickable”:

“Something is kickable if it can be kicked and kicks back, or it can be somehow physically manipulated and the manipulation produces observable effects. Presumably the property is remote and obscure if we must resort to the indirect kickability criterion alone. Thus kickability can only work in a well-developed theory in which the property is clearly defined, for we must be able to say specifically what we are kicking and how it is supposed to kick back.”

In the case of quantum properties we are indeed in a situation where the property is “remote and obscure”. But we also have recourse to “a well-developed theory in which the property is clearly defined”. So that puts us in a good position. Because of this it doesn’t matter if properties are easily visualizable. “Quantum properties are not visualizable, but this will no longer prevent them from being physical”. The physical surpasses what we are able to visualize.

So there is a well-developed conceptual structure that connects observables to the definite properties of the quantum system prior to these observables. To review a little how this structure and cascade of connections works:

We start with the most immediate aspect: what we actually observe, which enters into the conceptual structure as eigenvalues. Eigenvalues of an observable can be regarded as labels of the eigenstates. Eigenstates serve as axes of a coordinate system in the state space. This is an important point, so I’ll repeat it in another way. As Auyang puts it: “An observable coordinates the quantum world in a particular way with its eigenstates, and formally correlates the quantum coordinate axes to classical indicators, the eigenvalues. An observable introduces a representation of the quantum state space by coordinatizing it.” So we have observations to eigenvalues, to eigenstates, to axes in a state space.

The coordinate system in the state space enables us to determine definite amplitudes. The state space is a vector space and any particular state of a quantum system is a vector in this space. We can break this vector down into components which are expressed in terms of the coordinate system or basis, i.e. the eigenstates. The coefficient of each component is cᵢ, which is a probability amplitude. This is why we’re able to determine definite amplitudes using the coordinate system. A quantum system has no definite eigenvalues but it does have definite amplitudes. When it’s broken down into its basis components a quantum state is an eigenstate expansion, a series of terms that are added up to define the vector. Each of these terms has an amplitude associated with an eigenstate that is also associated with some observable. Practically, an indicator in the form of an eigenvalue is somehow triggered in measurements and experiments. And the probability of observing any particular eigenvalue will be defined by its amplitude. Specifically, the quantity |cᵢ|² is the probability that the eigenvalue aᵢ is observed in a measurement of A on the state |φ>. But it is the probability amplitude cᵢ that is the definite property of the quantum system rather than any particular eigenvalue that happens to be observed. What’s more, this is an objective property of the quantum system even in the absence of any experiment. As Auyang puts it: “Unperformed experiments have no results, but this does not imply that the quantum system on which the experiment might be performed has no properties.” Now to show the more complete cascade of kickability: we have physical observations, to eigenvalues, to eigenstates, to axes in a state space, to a state vector, to vector components, to component coefficients, to probability amplitudes. And it’s the probability amplitudes that are the definite properties of the quantum system.

The question of whether or not quantum systems have definite properties is a philosophical question rather than a question of physics, to the extent that those can be separated. It’s not necessary to engage in the philosophy in order to engage in the physics. One can measure eigenvalues and calculate probability amplitudes without worrying about whether any of them count as properties. But it’s arguably part of the scientific experience to step back on occasion to reflect on the big picture. To ask things like, “What is the nature of the work that we’re doing here?”, “What does all this tell us about the nature of reality?”, “Or about the way we conceptualize scientific theories?” For me one of the most fascinating insights prompted by quantum mechanics is of the necessity of the elaborate conceptual structures that support our understanding of the theory. To put it in Kantian terms, these conceptual structures are “transcendental” in the sense that they constitute the conditions that are presupposed and necessary for us to be able to understand the theory in the first place. And to me that seems quite significant.

Spacetime, Individuation, and Fiber Bundles

How can entities be picked out as individual and distinct entities? Sunny Auyang presents a Kantian model of spacetime as an absolute and indispensable structural scheme we project onto the world to organize it and to pick out individual elements, or events in it. Using fiber bundles she packs together a complex structure of individuating qualitative features that she links to individual points in spacetime.

I’d like to talk again about some stuff I’ve been reading in this book by Sunny Auyang, How is Quantum Field Theory Possible? Specifically in this latest chapter I read on the nature of space or spacetime and the possibility of individuation, individuation being the identification and distinction of entities as separate entities.

Both of these issues have a long history in philosophy but Auyang focuses mostly on the work of the modern period of the last few centuries, most especially on Leibniz, Newton, and Kant. There’s a famous dichotomy or division between the models of space put forward by Leibniz and Newton. And the question there is whether space is an independently existing thing or just a way of conceptualizing the relations between actual entities, like their distances and orientations from each other. So Newton’s view was that space has an independent existence. Even if you took out all other entities in the universe space itself would still be there as its own thing. Also time. So both space and time are “absolute”. But for Leibniz these are relative or relational concepts. Lengths, areas, and volumes are relations between entities but if you take away the entities, the actual things, there’s nothing left behind, no empty space. Now I’ve read that those are actually drastic simplifications of their views, which doesn’t surprise me. But regardless of that we can at least have those views in mind to start, with the understanding that they’re traditionally associated with Newton and Leibniz. Auyang actually divides both these views further, so that we have four: two Newton-type views and two Leibniz-type views. And I’ll just introduce those so we can use the descriptive names rather than these two proper names.

On the one side we have the substantival view and the absolute view. Spacetime is substantival if it exists independent of material entities. Spacetime is absolute if its concept is presupposed by the concept of individual entities and things. These are similar but slightly different ideas. Substantivalism is ontological, meaning it actually has to do with being, what is. Absoluteness is conceptual; it pertains to the way concepts fit together and what is necessary for certain concepts to work and be intelligible. These can coincide but they don’t have to. And Auyang is going to argue for a model of spacetime that is absolute but not substantival. So in her view spacetime is not a thing that exists independent of material entities but it is a concept that is required to conceptualize material entities.

On the other side Auyang also distinguishes between the relational view and the structural view. I think this is an even more subtle distinction. The difference between these two is a matter of logical priority, looking at what comes first. So recall that with the relational view the concept of space arises from the relations between entities. Dimensions like length, area, and volume are these relations that we perceive between the entities around us. They’re already there and we perceive them. The structural view is the Kantian view, from Immanuel Kant, that space, and we can say also spacetime, are concepts that we project onto the world to organize it and bring structure to it. So we as subjects come first. I’m describing that a little differently than she does in the book but that’s the way it makes most sense for me to think about it. And I think it’s consistent with her view. And between these options Auyang is going to argue for a model of spacetime that is structural rather than relational. So it’s more the Kantian model. So bringing these two together her view of spacetime is absolute and structural. In other words, spacetime is a concept that is required for us to conceptualize material entities, and it is a structure that we project onto the world to organize it and make sense of it.

With that in place let’s get to individuation of entities. How do we say that a thing is the same thing across time, something that we can index or label? And how do we say of a thing that it is this thing and not some other thing? “An entity is an individual that can be singly picked out, referred to, and made into the subject of propositions.” Aristotle said that an entity incorporates two elements. It’s both a this and a what-it-is. These are the notions of individuality and kind. A specific entity is not only a thing but it is this thing. It’s indexed and labeled. It’s also a certain kind of thing. That doesn’t individuate the single entity from other members of that same kind but it distinguishes that class of entities as a kind. Then within that set of that kind of entity they must be further differentiated and identified as individuals. That gets very complex. Other philosophers have instead argued for the importance of a cluster-of-qualities notion. An entity is no more than the sum of its qualities. If you get specific enough about your qualities maybe that’s all you need. Every entity has a unique spatio-temporal history at least, even if indistinguishable in all other qualities. At least we may so argue. So some important concepts here are individuality, kind, and qualities. These are ways of individuating.

So we’re going to look at a model of these entities. And the first thing to address is that we’re going to look at this through the lens of quantum field theory rather than classical mechanics. So the primary form of matter, the material entities I’ve been talking about before, shifts from discrete mass points in space to continuous fields comprising discrete events. Auyang doesn’t mention this but it reminds me a little bit of Alfred North Whitehead’s process philosophy in which he replaced a substance ontology of things with a process ontology of events. Auyang’s quantum field theory is rather different from that; nevertheless, it was something that came to mind. So anyway, the basic entities we’re going to consider now are events.

A field is a continuous system. “The world of fields is full, in contrast to the mechanistic world, in which particles are separated by empty space.” Every point in a field is assigned a value. So say we have a field that we’ll call ψ, the Greek letter. For every point x in that field there will be a value ψ(x). And that field variable ψ(x) doesn’t have to be scalar, i.e. just a number. It can be a vector, tensor, or spinor as well. Actually I’m most accustomed to thinking of field variables as vectors like with a gravitational field or an electric field. So with a gravitational field for instance every point in the field around mass M has a vector oriented toward mass M. And then the magnitude of those vectors varies with the distance from mass M. And that’s just an example; the field variable could be any number of things. And that’s important for individuation because we’re going to want to account for the qualities of an individual event with which we can distinguish it. But also one key idea to keep in mind is that the field variable ψ is indexed to some point x in the field. That’s another method of individuation.
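The gravitational example can be sketched as a function that takes a point x and returns the field vector at that point. This is only a classical illustration of “a value at every point”, not anything from quantum field theory. The function name, the default mass (roughly Earth’s), and the choice of a point mass at the origin are my own assumptions.

```python
import math

G_NEWTON = 6.674e-11  # Newtonian gravitational constant, m^3 kg^-1 s^-2


def gravitational_field(x, source=(0.0, 0.0, 0.0), mass=5.972e24):
    """Field vector g(x) at point x due to a point mass located at `source`.

    The default mass is roughly that of the Earth (an illustrative assumption).
    """
    r = [xi - si for xi, si in zip(x, source)]          # displacement from the mass
    dist = math.sqrt(sum(ri * ri for ri in r))
    if dist == 0.0:
        raise ValueError("field is undefined at the location of the point mass")
    scale = -G_NEWTON * mass / dist ** 3                # minus sign: points toward the mass
    return tuple(scale * ri for ri in r)


# The field value is indexed by the point at which we ask for it.
near = gravitational_field((6.371e6, 0.0, 0.0))         # about one Earth radius away
far = gravitational_field((2 * 6.371e6, 0.0, 0.0))      # twice as far
print(near, far)
```

The vector at each point is oriented back toward the mass, and doubling the distance cuts the magnitude to a quarter.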

So let’s look at how both qualities and numerical identity get taken up in Auyang’s model. To give a bit of a road map before diving into the details, her model will include six major pieces: D, G, π, M, x, and ψ.

D is what’s called the total space.
G is a local symmetry group.
π is a map.
M is a base space.
x is a position in the base space M.
And ψ(x) is an event.

All of this will be put together in a fiber bundle structure. And we’ll get into what all that means in a minute.

First let’s talk about symmetry groups, which will be this G in her model. The concept of the this-something, the individuality of events, is incorporated in field theories through two symmetry groups. Symmetry is a key idea in physics. A related term is invariance, also a very important concept. And it’s basically what it sounds like. It’s some property that doesn’t change. More specifically, we’re interested in the very particular circumstances under which it doesn’t change, called transformations. So you have some object, you transform it in some way – say you rotate it for example – the features that don’t change in that transformation are invariants. And this can tell us important things. The big conservation laws in physics come from invariances, as we know from what is called Noether’s Theorem. For example, conservation of energy comes from time-translation invariance. Conservation of momentum comes from spatial translation invariance. Conservation of angular momentum comes from rotational invariance. Very significant. Okay, so backing up again to symmetry groups – that was the whole reason for getting into this. A symmetry group is the group of all transformations under which the object is invariant. Some objects have lots of symmetry – they’ll be invariant under many transformations – others have very little. But the key is that the group of all those transformations where it is invariant – that’s a symmetry group.
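Here is a small Python sketch of that last idea, using a square and a rectangle as the objects. It only tries eight candidate transformations (rotations by multiples of 90 degrees and four reflections), and all the names in it are my own. It’s a toy, not the kind of continuous symmetry group used in field theory.

```python
# Eight candidate transformations of the plane, acting on integer points (x, y).
TRANSFORMATIONS = {
    "identity":           lambda p: (p[0], p[1]),
    "rotate 90":          lambda p: (-p[1], p[0]),
    "rotate 180":         lambda p: (-p[0], -p[1]),
    "rotate 270":         lambda p: (p[1], -p[0]),
    "flip x-axis":        lambda p: (p[0], -p[1]),
    "flip y-axis":        lambda p: (-p[0], p[1]),
    "flip diagonal":      lambda p: (p[1], p[0]),
    "flip anti-diagonal": lambda p: (-p[1], -p[0]),
}


def symmetry_group(points):
    """Names of the candidate transformations that leave the set of points unchanged."""
    shape = frozenset(points)
    return [name for name, transform in TRANSFORMATIONS.items()
            if frozenset(transform(p) for p in shape) == shape]


square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
rectangle = [(2, 1), (-2, 1), (-2, -1), (2, -1)]

print(len(symmetry_group(square)), symmetry_group(square))
print(len(symmetry_group(rectangle)), symmetry_group(rectangle))
```

The square is invariant under all eight candidates while the rectangle is invariant under only four of them, so the square has the larger symmetry group.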

The two symmetry groups pertinent to the field theories here are the local symmetry group and the spatio-temporal symmetry group. And these embody different aspects of the individuation of entities. “The idea of kinds is embodied in the local symmetry group, which pertains not to spatio-temporal but to qualitative features. The symmetry group circumscribes a set of possible states and defines a natural kind.” So recall one of the important ideas for identification or individuation was quality. Well the state of an entity covers its qualities. But for localization and identification, its numerical identity, we need a global whole, rather than a local whole, and that is represented by a spatio-temporal symmetry group. “The identities of the events are the invariants in the spatio-temporal symmetry structure.” These two symmetries give us the quality and numerical identity of the entities.

To fit this all together Auyang presents a model for the structure of local symmetries. And she does this using fiber bundles. Fiber bundles are great mathematical tools. The most straightforward way I like to think about fiber bundles is that they are a way to relate single points in some base space to more complex structures in another space. And when I say “space” here these can be abstract spaces, though at least one of these in what follows, the base space, will in fact be a spatio-temporal space. The great thing about this is that it lets us sneak a lot of structure into a spatio-temporal position. And that’s good because we need a lot of structure for these individuating elements. A spatio-temporal position is just one of these individuating elements. We want to bring qualities in there too.

So let’s look at Auyang’s model. This is the featured image for this episode by the way if you want to look at it. The objects D, G, and M are differential manifolds, which is basically just a kind of space or surface. These manifolds can be actually spatial or spatio-temporal, which will be the case with our base space M. But they can also be, and often are, abstract, which will be the case for our total space D and our local symmetry group G in this model. The first manifold, our total space D, is a set of abstract qualities. So this is where we’re going to get the qualities for our entities from. Then she also has a local symmetry group, G, which is also a manifold. We can label the abstract qualities in D as θ, θ’, and so forth. “At this starting point, both D and G are abstract and meaningless. Our aim is to find the minimal conceptual structure in which we can recognize events as individuals”.

The symmetry group G acts on the total space D and collects subsets of elements in D that are equivalent to each other. Each of these subsets we’ll call a G orbit. The elements in a single G orbit are equivalent to each other. We can start with qualities θ and θ’ – those will go into one G orbit. Then we can pick out ξ and ξ’, which go into another. This divides D up into these G orbit subsets until all elements in D are accounted for. None of the resultant G orbits share common elements. D still has all the same elements as before but they are divided into these subsets. This is quite useful for our purposes of individuation. We have some organization here of all this information.
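The way a group action carves a set into non-overlapping orbits can be sketched in a few lines of Python. The total space here is just the integers 0 through 11 and the group action is “add 4, wrapping around at 12”. Both are toy assumptions of mine standing in for Auyang’s abstract D and G.

```python
def orbits(elements, generators):
    """Partition `elements` into orbits under the transformations in `generators`.

    Assumes each generator maps the set of elements back into itself.
    """
    remaining = set(elements)
    result = []
    while remaining:
        seed = min(remaining)            # deterministic choice of a starting element
        orbit = {seed}
        frontier = [seed]
        while frontier:                  # keep applying transformations until nothing new appears
            current = frontier.pop()
            for g in generators:
                image = g(current)
                if image not in orbit:
                    orbit.add(image)
                    frontier.append(image)
        result.append(frozenset(orbit))
        remaining -= orbit
    return result


D = range(12)                            # toy total space
G = [lambda n: (n + 4) % 12]             # toy group action, given by one generator

for orbit in orbits(D, G):
    print(sorted(orbit))
```

Every element of the toy D lands in exactly one orbit, and no two orbits share an element.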

Next we can take a G orbit and introduce a map π that sends all elements in that G orbit, θ and θ’ for example, onto a single point x. This point x is on another manifold M, a base space. There’s also an inverse map, π⁻¹, that takes each point x in M back to the G orbit in D that sits over it, so each G orbit in D is canonically assigned a unique element x in M. M is what’s called a quotient of D by the equivalence relation that G defines. It’s not given in advance but falls out from D. Every spacetime point, x, in the spatio-temporal structure, M, is associated with an event, ψ(x), in the total space D. Speaking of this in terms of set theory, D becomes a set with an indexing set M.
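To see how the base space “falls out” of the total space, here is a toy Python sketch. The total space is again the integers 0 through 11 with the group action “add 4, wrapping around at 12”, and the names orbit_of, pi, and pi_inverse are my own. A real fiber bundle has smooth manifolds rather than finite sets, so take this only as a picture of the bookkeeping.

```python
D = range(12)                                   # toy total space


def orbit_of(n):
    """The G orbit of n under the toy action n -> (n + 4k) mod 12, k = 0, 1, 2."""
    return frozenset((n + 4 * k) % 12 for k in range(3))


# The base space is not given in advance: we get one point per distinct orbit.
orbit_list = sorted({orbit_of(n) for n in D}, key=min)
M = range(len(orbit_list))                      # toy base space: x = 0, 1, 2, 3


def pi(n):
    """Projection map: send an element of D to the point x that labels its orbit."""
    return orbit_list.index(orbit_of(n))


def pi_inverse(x):
    """Inverse map: the whole G orbit in D sitting over the point x."""
    return orbit_list[x]


for x in M:
    print(x, sorted(pi_inverse(x)))
```

Each point x indexes one orbit, and every element of that orbit projects back down to the same x.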

So now we have all the pieces put together: D, G, π, M, x, and ψ. And to review, D is the total space, G is a local symmetry group, π is a map, M is a base space, x is a position in the base space M, and ψ(x) is an event. And what’s the significance of all this in the “real world”, so to speak? M is usually called spacetime and x is a point in spacetime, the spatio-temporal position of an event ψ(x). But the identity of an event ψ(x) includes more than just its spatio-temporal position, even though it’s indexed to that position. All that extra information is in the total space D. It’s divided up by the local symmetry group G. And then it’s mapped onto the spacetime base space M by the map π. The cool thing about the fiber bundle is that it allows us to cram a lot of information into a single point in spacetime, or at least link it to a lot of extra information.

The main goal that Auyang is working toward with this model is individuation. And to do that she needs enough complexity to carry the kind and quality features of individual entities, as well as spatio-temporal position. What happens in this model is that a spacetime position, x, signifies the identity of an event ψ(x). x uniquely designates ψ(x) and marks it out from others. The symmetry group, G, whose features are typical of all ψ(x), signifies a kind, since it collects those features as a group. Then the spatio-temporal structure, M, is a system for identifying individuals in that group. So this “sortal concept that individuates entities in a world involves two operations” that will mark out (1) kinds and (2) numerical identity. First the local symmetry group, G, forms identical equivalence classes of qualities for this notion of kinds. Second the projection map, π, introduces numerical identity for each of these equivalence classes. These together secure the individuality of an event, ψ(x).

One thing we can certainly say about this kind of model is that it is analyzable. Events and spacetime positions are not just given in this view. There’s a complex interplay between spacetime positions of events and all the qualities of those events. This is what we get with field theories. Even if we look at the world at the most primitive level, as Auyang says, with field theories, “to articulate even this primitive world requires minimal conceptual structure more complicated than that in many philosophies, which regard sets of entities as given.” So is this necessary, or are we just making things more complicated than they need to be? Quoting Auyang again: “Field theories have not added complications, they have made explicit the implicit assumptions taken for granted.” I’m not prepared to defend that point but I’m fine with going along with it for the time being.

To wrap things up let’s look at some ways for thinking about this spatio-temporal structure, M. The complexity of the full conceptual structure of this model (D, M, G, π) is what makes it analyzable and it enables us to examine M’s possible meaning. Auyang characteristically promotes a Kantian take on all this. This is to see M as a “scheme of individuation and identification that we project into the world via the inverse map π⁻¹ and by which we present the world to ourselves as comprising distinct entities.” Recall that in Kant’s thought the world is intelligible to us only because we apply categories of understanding to the raw sense data we bring in, and we use these categories to organize it all and make sense of it. Auyang is saying that this is what M does; this is what the spatio-temporal structure, or our concept of spacetime, does.

And this idea of space being what individuates things has a long history. For example, speaking of Kant, in Kant’s philosophy space is what makes identity and difference possible. Hermann Weyl called space the “principium individuationis”, which is really fun to say with the classical Latin pronunciation of the ‘v’. But that’s just this idea we’ve been talking about, individuation, the manner in which a thing is identified as distinguished from other things. Weyl also said space “makes the existence of numerically different things possible which are equal in every respect”. So it’s not just the qualities (non-spatial) that are important. You need space to distinguish entities that are otherwise identical. This doesn’t mean that space is substantival, some independently existing substance. But it is conceptually indispensable. So, say it is something that we bring to the scene, something we impose as an organizing tool. It’s still indispensable for the possibility of individuation. So it’s absolute in that sense.

So to review, I’ll put these in Kantian terms. We start off with what is “out there”, just this pre-conceptualized mass of stuff, our total space D. How is that intelligible? We come at it via a conceptual structure, the mental categories of space and time, or spacetime, M. Then we project these spatial and temporal conceptual categories onto the world using the inverse map π⁻¹. This inverse map is able to pick out individual entities in the total space D that are distinguishable by an organizing operation of the local symmetry group G. The local symmetry group G has divided up the total space D into G orbits of equivalent elements. Our spatial and temporal categories pick these subsets out as events ψ(x) that are mapped onto spacetime M. And that brings the whole structure together in a way that we can see everything together and pick out individual events as individual elements.